========================
SoundWire Error Handling
========================

The SoundWire PHY was designed with care: errors on the bus should be very
unlikely, and any that do occur should be limited to single-bit errors.
Examples of this design can be found in the synchronization mechanism (sync
loss after two errors) and the short CRCs used for the Bulk Register Access.

Errors can be detected with multiple mechanisms:

1. Bus clash or parity errors: This mechanism relies on low-level detectors
   that are independent of the payload and usages, and they cover both control
   and audio data. The current implementation only logs such errors.
   Improvements could be invalidating an entire programming sequence and
   restarting from a known position. When such errors occur outside of a
   control/command sequence, the SoundWire protocol provides no concealment
   or recovery for audio data. The location of the error also affects its
   audibility (errors in the most-significant bits are more audible in PCM),
   and after a number of such errors are detected the bus might be reset.
   Note that bus clashes due to programming errors (two streams using the
   same bit slots) cannot be distinguished from those due to electrical
   issues during the transmit/receive transition, although a recurring bus
   clash when audio is enabled is an indication of a bus allocation issue.
   The interrupt mechanism can also help identify Slaves which detected a
   Bus Clash or a Parity Error, but they may not be responsible for the
   errors, so resetting them individually is not a viable recovery strategy.

2. Command status: Each command is associated with a status, which only
   covers transmission of the data between devices. The ACK status indicates
   that the command was received and will be executed by the end of the
   current frame. A NAK indicates that the command was in error and will not
   be applied. In case of bad programming (command sent to a non-existent
   Slave or to a non-implemented register) or an electrical issue, the
   absence of a response signals that the command was ignored. Some Master
   implementations allow for a command to be retransmitted several times. If
   the retransmission fails, backtracking and restarting the entire
   programming sequence might be a solution. Alternatively, some
   implementations might directly issue a bus reset and re-enumerate all
   devices.

3. Timeouts: In a number of cases such as ChannelPrepare or
   ClockStopPrepare, the bus driver is supposed to poll a register field until
   it transitions to a NotFinished value of zero. The MIPI SoundWire spec 1.1
   does not define timeouts but the MIPI SoundWire DisCo document adds
   recommendations on timeouts. If such configurations do not complete, the
   driver will return -ETIMEDOUT. Such timeouts are symptoms of a faulty
   Slave device and are likely impossible to recover from.

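The command-status and timeout mechanisms above can be sketched in plain C as
follows. This is a minimal illustration and not the bus driver's code: the
names ``struct fake_slave``, ``fake_xfer()``, ``send_with_retry()`` and
``wait_until_finished()`` are hypothetical, the Slave is simulated so that the
example is self-contained, and the mapping of a NAK to -EIO and of a missing
response to -ENODEV is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical response codes for the three outcomes described above. */
enum cmd_response {
	CMD_OK,		/* ACK: command will be applied */
	CMD_FAILED,	/* NAK: command was in error, not applied */
	CMD_IGNORED,	/* no response: command was ignored */
};

/* Hypothetical simulated Slave, used only to keep the sketch runnable. */
struct fake_slave {
	int naks_left;		/* NAKs returned before the first ACK */
	int polls_until_done;	/* reads before NotFinished clears */
	int attempts;		/* transfers seen so far */
};

/* Simulated transfer: NAK while naks_left > 0, then ACK. */
static enum cmd_response fake_xfer(struct fake_slave *s)
{
	s->attempts++;
	if (s->naks_left > 0) {
		s->naks_left--;
		return CMD_FAILED;
	}
	return CMD_OK;
}

/* Simulated register read: returns 1 while NotFinished is still set. */
static int fake_read_not_finished(struct fake_slave *s)
{
	if (s->polls_until_done > 0) {
		s->polls_until_done--;
		return 1;
	}
	return 0;
}

/* Send a command, retransmitting up to max_retries times if not ACKed. */
int send_with_retry(struct fake_slave *s, int max_retries)
{
	enum cmd_response resp = CMD_IGNORED;
	int i;

	assert(max_retries >= 0);

	for (i = 0; i <= max_retries; i++) {
		resp = fake_xfer(s);
		if (resp == CMD_OK)
			return 0;
	}
	/* Caller may backtrack, or reset the bus and re-enumerate. */
	return resp == CMD_IGNORED ? -ENODEV : -EIO;
}

/* Poll NotFinished until it reads zero, giving up after max_polls reads. */
int wait_until_finished(struct fake_slave *s, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (!fake_read_not_finished(s))
			return 0;
		/* A real driver would sleep between reads. */
	}
	return -ETIMEDOUT;
}
```

With a simulated Slave that NAKs twice, ``send_with_retry(&s, 3)`` succeeds on
the third transfer. A caller that gets a negative value back would then choose
between backtracking through the programming sequence and a bus reset, as
described above.
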
Errors during global reconfiguration sequences are extremely difficult to
handle:

1. BankSwitch: An error during the last command issuing a BankSwitch is
   difficult to backtrack from. Retransmitting the BankSwitch command may be
   possible in a single-segment setup, but this can lead to synchronization
   problems when enabling multiple bus segments (a command with side effects
   such as frame reconfiguration would be handled at different times). A
   global hard reset might be the best solution.

Note that SoundWire does not provide a mechanism to detect illegal values
written in valid registers. In a number of cases the standard even mentions
that the Slave might behave in implementation-defined ways. The bus
implementation does not provide a recovery mechanism for such errors. Slave
or Master driver implementers are responsible for writing valid values in
valid registers and for implementing additional range checking if needed.
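
As an illustration of the additional range checking mentioned above, a driver
could validate a value against a table of legal ranges before issuing the
write. The sketch below rests on assumptions: ``struct reg_range``,
``valid_ranges`` and ``check_reg_write()`` are hypothetical names, and the
register addresses and limits are invented for the example rather than taken
from the SoundWire specification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: a register and its inclusive legal range. */
struct reg_range {
	uint32_t addr;
	uint8_t min;
	uint8_t max;
};

/* Invented addresses and limits, for illustration only. */
static const struct reg_range valid_ranges[] = {
	{ 0x0100, 0, 3 },
	{ 0x0104, 1, 8 },
};

/*
 * Return 0 if value may be written to addr, -ERANGE if the value is
 * outside the legal range, -EINVAL if the register is not in the table.
 */
int check_reg_write(uint32_t addr, uint8_t value)
{
	size_t i;

	for (i = 0; i < sizeof(valid_ranges) / sizeof(valid_ranges[0]); i++) {
		const struct reg_range *r = &valid_ranges[i];

		if (r->addr != addr)
			continue;
		assert(r->min <= r->max);	/* table sanity check */
		if (value < r->min || value > r->max)
			return -ERANGE;
		return 0;
	}
	return -EINVAL;
}
```

A driver following this pattern would call ``check_reg_write()`` before each
write and skip the bus transaction when the result is negative, since the bus
itself will not flag an illegal value.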