.. SPDX-License-Identifier: GPL-2.0

=======
SCSI EH
=======

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.rst for more
information regarding SCSI midlayer.

.. TABLE OF CONTENTS

   [1] How SCSI commands travel through the midlayer and to EH
       [1-1] struct scsi_cmnd
       [1-2] How do scmds get completed?
           [1-2-1] Completing a scmd w/ scsi_done
           [1-2-2] Completing a scmd w/ timeout
       [1-3] Asynchronous command aborts
       [1-4] How EH takes over
   [2] How SCSI EH works
       [2-1] EH through fine-grained callbacks
           [2-1-1] Overview
           [2-1-2] Flow of scmds through EH
           [2-1-3] Flow of control
       [2-2] EH through transportt->eh_strategy_handler()
           [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-3] Things to consider


1. How SCSI commands travel through the midlayer and to EH
==========================================================

1.1 struct scsi_cmnd
--------------------

Each SCSI command is represented with struct scsi_cmnd (== scmd). A
scmd has two list_heads to link itself into lists: scmd->list and
scmd->eh_entry. The former is used for the free list or the
per-device allocated scmd list and is not of much interest to this EH
discussion. The latter is used for completion and EH lists, and
unless otherwise stated, scmds are always linked using scmd->eh_entry
in this discussion.
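
The two-list arrangement relies on embedded list heads. The following
userspace sketch illustrates it under stated assumptions: struct
toy_scmd, list_init(), list_add_tail() and count_eh_list() are
hypothetical, simplified stand-ins for struct scsi_cmnd and the kernel's
list helpers, and container_of() is re-derived from offsetof() rather
than taken from kernel headers.

::

    #include <assert.h>
    #include <stddef.h>

    /* Minimal doubly linked list in the style of struct list_head. */
    struct list_head {
        struct list_head *next, *prev;
    };

    static void list_init(struct list_head *head)
    {
        head->next = head;
        head->prev = head;
    }

    static void list_add_tail(struct list_head *item, struct list_head *head)
    {
        item->prev = head->prev;
        item->next = head;
        head->prev->next = item;
        head->prev = item;
    }

    /* Recover the enclosing structure from a pointer to a member. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Hypothetical, heavily trimmed stand-in for struct scsi_cmnd. */
    struct toy_scmd {
        int tag;
        struct list_head list;      /* free list / per-device list */
        struct list_head eh_entry;  /* completion and EH lists */
    };

    /* Walk a list whose members are linked through eh_entry. */
    static int count_eh_list(struct list_head *head)
    {
        int n = 0;

        for (struct list_head *p = head->next; p != head; p = p->next) {
            struct toy_scmd *scmd = container_of(p, struct toy_scmd, eh_entry);

            assert(scmd->tag >= 0);
            n++;
        }
        return n;
    }

Because eh_entry is separate from list, moving a scmd between EH lists
leaves its other linkage untouched.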


1.2 How do scmds get completed?
-------------------------------

Once the LLDD gets hold of a scmd, either the LLDD completes the
command by calling the scsi_done callback passed from the midlayer
when invoking hostt->queuecommand(), or the block layer times it out.


1.2.1 Completing a scmd w/ scsi_done
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For all non-EH commands, scsi_done() is the completion callback. It
just calls blk_complete_request() to delete the block layer timer and
raise SCSI_SOFTIRQ.

The SCSI_SOFTIRQ handler, scsi_softirq, calls scsi_decide_disposition()
to determine what to do with the command. scsi_decide_disposition()
looks at the scmd->result value and sense data to make that decision.

 - SUCCESS

        scsi_finish_command() is invoked for the command. The
        function does some maintenance chores and then calls
        scsi_io_completion() to finish the I/O.
        scsi_io_completion() then notifies the block layer of
        the completed request by calling blk_end_request and
        friends, or figures out what to do with the remainder
        of the data in case of an error.

 - NEEDS_RETRY

 - ADD_TO_MLQUEUE

        scmd is requeued to the blk queue.

 - otherwise

        scsi_eh_scmd_add(scmd) is invoked for the command. See
        [1-4] for details of this function.
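
The dispatch above amounts to a switch on the disposition. This is an
illustrative sketch only: the DISP_* and ACT_* names and
decide_action() are hypothetical and do not come from the midlayer.

::

    #include <assert.h>

    /* Hypothetical mirror of the dispositions listed above. */
    enum disposition {
        DISP_SUCCESS,
        DISP_NEEDS_RETRY,
        DISP_ADD_TO_MLQUEUE,
        DISP_FAILED,
    };

    /* Hypothetical labels for what happens to the scmd next. */
    enum completion_action {
        ACT_FINISH,     /* scsi_finish_command() -> scsi_io_completion() */
        ACT_REQUEUE,    /* back onto the blk queue */
        ACT_ENTER_EH,   /* scsi_eh_scmd_add() */
    };

    static enum completion_action decide_action(enum disposition d)
    {
        switch (d) {
        case DISP_SUCCESS:
            return ACT_FINISH;
        case DISP_NEEDS_RETRY:
        case DISP_ADD_TO_MLQUEUE:
            return ACT_REQUEUE;
        default:
            return ACT_ENTER_EH;
        }
    }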


1.2.2 Completing a scmd w/ timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The timeout handler is scsi_timeout(). When a timeout occurs, this function

 1. invokes the optional hostt->eh_timed_out() callback. The return
    value can be one of

    - BLK_EH_RESET_TIMER
        This indicates that more time is required to finish the
        command. The timer is restarted.

    - BLK_EH_DONE
        The eh_timed_out() callback did not handle the command.
        Step #2 is taken.

 2. invokes scsi_abort_command() to schedule an asynchronous abort,
    which may issue a retry scmd->allowed + 1 times. Asynchronous aborts
    are not scheduled for commands which have the SCSI_EH_ABORT_SCHEDULED
    flag set (this indicates that the command has already been aborted
    once, and this is a retry which failed), when retries are exceeded,
    or when the EH deadline has expired. In these cases Step #3 is taken.

 3. invokes scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) for the
    command. See [1-4] for more information.
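
A simplified model of the three timeout steps follows. It is a sketch,
not scsi_timeout(): struct toy_cmd, the TO_* and OUT_* enumerators,
toy_timeout() and want_more_time() are hypothetical, and treating
retries < allowed as "retries not exceeded" is an assumption.

::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical verdicts of an eh_timed_out()-style hook. */
    enum timed_out_verdict { TO_RESET_TIMER, TO_DONE };

    /* Hypothetical outcomes of the three steps above. */
    enum timeout_outcome { OUT_TIMER_RESTARTED, OUT_ABORT_SCHEDULED, OUT_TO_EH };

    struct toy_cmd {
        bool abort_scheduled;   /* stands in for SCSI_EH_ABORT_SCHEDULED */
        int retries;
        int allowed;
    };

    typedef enum timed_out_verdict (*timed_out_hook)(struct toy_cmd *);

    static enum timeout_outcome toy_timeout(struct toy_cmd *cmd,
                                            timed_out_hook eh_timed_out,
                                            bool eh_deadline_expired)
    {
        /* Step 1: the optional LLDD hook may ask for more time. */
        if (eh_timed_out && eh_timed_out(cmd) == TO_RESET_TIMER)
            return OUT_TIMER_RESTARTED;

        /* Step 2: schedule an asynchronous abort unless one already
         * failed, retries are used up or the EH deadline has passed. */
        if (!cmd->abort_scheduled && cmd->retries < cmd->allowed &&
            !eh_deadline_expired) {
            cmd->abort_scheduled = true;
            return OUT_ABORT_SCHEDULED;
        }

        /* Step 3: hand the command to full EH (scsi_eh_scmd_add()). */
        return OUT_TO_EH;
    }

    /* Example hook that always asks for more time. */
    static enum timed_out_verdict want_more_time(struct toy_cmd *cmd)
    {
        (void)cmd;
        return TO_RESET_TIMER;
    }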

1.3 Asynchronous command aborts
-------------------------------

After a timeout occurs, a command abort is scheduled from
scsi_abort_command(). If the abort is successful, the command
will either be retried (if the number of retries is not exhausted)
or terminated with DID_TIME_OUT.

Otherwise, scsi_eh_scmd_add() is invoked for the command.
See [1-4] for more information.
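
The outcome of a scheduled abort can be summarised as below.
toy_after_abort() and its enumerators are hypothetical names, and the
retries < allowed test for "retries not exhausted" is an assumption
about how the count is compared.

::

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical outcomes after the asynchronous abort has run. */
    enum abort_outcome { ABORT_RETRY, ABORT_DID_TIME_OUT, ABORT_TO_EH };

    static enum abort_outcome toy_after_abort(bool abort_succeeded,
                                              int retries, int allowed)
    {
        assert(retries >= 0 && allowed >= 0);
        if (!abort_succeeded)
            return ABORT_TO_EH;         /* scsi_eh_scmd_add() */
        if (retries < allowed)
            return ABORT_RETRY;         /* retries not exhausted */
        return ABORT_DID_TIME_OUT;      /* terminate with DID_TIME_OUT */
    }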

1.4 How EH takes over
---------------------

scmds enter EH via scsi_eh_scmd_add(), which does the following:

 1. Links scmd->eh_entry to shost->eh_cmd_q

 2. Sets SHOST_RECOVERY bit in shost->shost_state

 3. Increments shost->host_failed

 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

As can be seen above, once any scmd is added to shost->eh_cmd_q, the
SHOST_RECOVERY shost_state bit is turned on. This prevents any new
scmd from being issued from the blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
shost->eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds, i.e. shost->host_busy ==
shost->host_failed. This wakes up the SCSI EH thread. So, once woken
up, the SCSI EH thread can expect that all in-flight commands have
failed and are linked on shost->eh_cmd_q.
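
The bookkeeping in scsi_eh_scmd_add() and the wake-up condition can be
modelled with a few counters. This is a sketch under assumptions:
struct toy_host and the toy_* functions are hypothetical, locking is
omitted, and toy_complete_ok() assumes that a normal completion
re-checks the same host_busy == host_failed condition, so that EH can
start once the last outstanding scmd completes.

::

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical, trimmed-down host state; not struct Scsi_Host. */
    struct toy_host {
        bool recovery;      /* SHOST_RECOVERY */
        int host_busy;      /* scmds issued to the LLDD, failed ones included */
        int host_failed;    /* scmds added to eh_cmd_q */
        int eh_cmd_q_len;   /* stands in for the shost->eh_cmd_q list */
        bool eh_woken;      /* has the EH thread been woken? */
    };

    static void toy_maybe_wake_eh(struct toy_host *h)
    {
        if (h->recovery && h->host_busy == h->host_failed)
            h->eh_woken = true;
    }

    /* Steps 1-4 of scsi_eh_scmd_add(); the real code holds host_lock. */
    static void toy_eh_scmd_add(struct toy_host *h)
    {
        h->eh_cmd_q_len++;      /* 1. link scmd->eh_entry to eh_cmd_q */
        h->recovery = true;     /* 2. stop new scmds from being issued */
        h->host_failed++;       /* 3. count the failure */
        toy_maybe_wake_eh(h);   /* 4. wake EH once busy == failed */
    }

    /* Assumed: a normal completion re-runs the same wake-up check. */
    static void toy_complete_ok(struct toy_host *h)
    {
        assert(h->host_busy > h->host_failed);
        h->host_busy--;
        toy_maybe_wake_eh(h);
    }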

Note that this does not mean lower layers are quiescent. If an LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to forget about the scmd at that point. However, if a scmd
has timed out, unless hostt->eh_timed_out() made lower layers forget
about the scmd, which currently no LLDD does, the command is still
active as far as lower layers are concerned and completion could
occur at any time. Of course, all such completions are ignored as the
timer has already expired.

We'll talk later about how SCSI EH takes actions to abort (that is,
make the LLDD forget about) timed-out scmds.


2. How SCSI EH works
====================

LLDDs can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
        The LLDD can implement fine-grained EH callbacks and let the
        SCSI midlayer drive error handling and call the appropriate
        callbacks. This will be discussed further in [2-1].

 - eh_strategy_handler() callback
        This is one big callback which should perform the whole error
        handling. As such, it should do all the chores the SCSI
        midlayer performs during recovery. This will be discussed in
        [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks the door.

 2. Clears the SHOST_RECOVERY shost_state bit.

 3. Wakes up waiters on shost->host_wait. This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (*QUESTION* why is it needed? All operations will be blocked
    anyway after it reaches blk queue.)

 4. Kicks the queues of all devices on the host.


2.1 EH through fine-grained callbacks
-------------------------------------

2.1.1 Overview
^^^^^^^^^^^^^^

If eh_strategy_handler() is not present, the SCSI midlayer takes
charge of driving error handling. EH has two goals: make the LLDD,
host and device forget about timed-out scmds, and make them ready for
new commands. A scmd is said to be recovered if the scmd is forgotten
by lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity. Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt (struct scsi_host_template) EH callbacks. Callbacks may be
omitted, and omitted ones are always considered to fail.

::

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds. Also, note that failure of
the highest-severity action means EH failure and results in offlining
of all unrecovered devices.

During recovery, the following rules are followed:

 - Recovery actions are performed on failed scmds on the to-do list,
   eh_work_q. If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds. E.g. resetting a device recovers all failed scmds on the
   device.

 - Higher-severity actions are taken iff eh_work_q is not empty after
   lower-severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery. For
   timed-out scmds, SCSI EH ensures that the LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to EH's
local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify the upper layer of failure) the
recovered scmds.

scmds are retried iff their sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.
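
The retry rule above can be written as a single predicate. In this
sketch struct toy_scmd and toy_should_retry() are hypothetical; the
strict less-than comparison follows the rule as stated here, and the
exact comparison in a given kernel version may differ.

::

    #include <assert.h>
    #include <stdbool.h>

    struct toy_scmd {
        bool sdev_online;   /* not offlined during EH */
        bool failfast;      /* stands in for REQ_FAILFAST */
        int retries;
        int allowed;
    };

    /* true: requeue the scmd; false: error-finish it. retries is only
     * incremented once the first two tests pass (short-circuit &&). */
    static bool toy_should_retry(struct toy_scmd *scmd)
    {
        return scmd->sdev_online && !scmd->failfast &&
               ++scmd->retries < scmd->allowed;
    }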


2.1.2 Flow of scmds through EH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 1. Error completion / time out

    :ACTION: scsi_eh_scmd_add() is invoked for scmd

        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++

    :LOCKING: shost->host_lock

 2. EH starts

    :ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
             is cleared.

    :LOCKING: shost->host_lock (not strictly necessary, just for
              consistency)

 3. scmd recovered

    :ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd

        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q

    :LOCKING: none

    :CONCURRENCY: at most one thread per separate eh_work_q to
                  keep queue manipulation lockless

 4. EH completes

    :ACTION: scsi_eh_flush_done_q() retries scmds or notifies the upper
             layer of failure. It may be called concurrently, but there
             must be no more than one thread per separate eh_work_q so
             that the queue can be manipulated locklessly.

             - scmd is removed from eh_done_q and scmd->eh_entry is cleared
             - if retry is necessary, scmd is requeued using
               scsi_queue_insert()
             - otherwise, scsi_finish_command() is invoked for scmd
             - zero shost->host_failed

    :LOCKING: queue or finish function performs appropriate locking
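
The queue movements in steps 2 to 4 can be sketched with plain
pointers. To keep it short, this hypothetical model uses a singly
linked list in place of struct list_head, omits locking and
scsi_setup_cmd_retry(), and only counts requeues and finishes instead of
calling scsi_queue_insert() or scsi_finish_command(). All toy_* names
are illustrative.

::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical scmd; next stands in for scmd->eh_entry. */
    struct toy_scmd {
        struct toy_scmd *next;
        bool retry;             /* verdict the flush step acts on */
    };

    struct toy_host {
        struct toy_scmd *eh_cmd_q;
        int host_failed;
    };

    /* Step 2: take the whole eh_cmd_q at once, leaving it empty. */
    static struct toy_scmd *toy_splice_init(struct toy_scmd **src)
    {
        struct toy_scmd *list = *src;

        *src = NULL;
        return list;
    }

    /* Step 3: unlink scmd from work_q and push it onto done_q. */
    static void toy_eh_finish_cmd(struct toy_scmd **work_q,
                                  struct toy_scmd *scmd,
                                  struct toy_scmd **done_q)
    {
        for (struct toy_scmd **pp = work_q; *pp; pp = &(*pp)->next) {
            if (*pp == scmd) {
                *pp = scmd->next;
                scmd->next = *done_q;
                *done_q = scmd;
                return;
            }
        }
        assert(!"scmd not on work_q");
    }

    /* Step 4: drain done_q, requeueing or finishing each scmd. */
    static void toy_flush_done_q(struct toy_host *h, struct toy_scmd **done_q,
                                 int *requeued, int *finished)
    {
        while (*done_q) {
            struct toy_scmd *scmd = *done_q;

            *done_q = scmd->next;
            scmd->next = NULL;          /* eh_entry cleared */
            if (scmd->retry)
                (*requeued)++;          /* scsi_queue_insert() */
            else
                (*finished)++;          /* scsi_finish_command() */
        }
        h->host_failed = 0;
    }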


2.1.3 Flow of control
^^^^^^^^^^^^^^^^^^^^^^

EH through fine-grained callbacks starts from scsi_unjam_host().

``scsi_unjam_host``

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    ``scsi_eh_get_sense``

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense). Autosense is recommended for
        performance reasons and because sense information could get
        out of sync between the occurrence of CHECK CONDITION and this
        action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done(). scsi_decide_disposition() always returns
        FAILED in such cases, thus invoking SCSI EH. When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense(), which issues a REQUEST_SENSE
           command. If it fails, no action is taken. Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed, preventing
                scsi_eh_flush_done_q() from retrying the scmd, and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() is invoked.

           - otherwise
                No action.
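
        A sketch of how the three verdicts are applied, assuming
        REQUEST_SENSE has already succeeded. The DISP_* names, struct
        toy_scmd and toy_apply_sense_verdict() are hypothetical;
        eh_finished stands for the move to eh_done_q done by
        scsi_eh_finish_cmd().

        ::

            #include <assert.h>
            #include <stdbool.h>

            /* Hypothetical dispositions; only the three cases above. */
            enum disposition { DISP_SUCCESS, DISP_NEEDS_RETRY, DISP_FAILED };

            struct toy_scmd {
                int retries;
                int allowed;
                bool eh_finished;   /* moved to eh_done_q */
            };

            /* Act on the verdict obtained after REQUEST_SENSE. */
            static void toy_apply_sense_verdict(struct toy_scmd *scmd,
                                                enum disposition d)
            {
                switch (d) {
                case DISP_SUCCESS:
                    scmd->retries = scmd->allowed;  /* flush will not retry */
                    scmd->eh_finished = true;
                    break;
                case DISP_NEEDS_RETRY:
                    scmd->eh_finished = true;       /* flush may retry */
                    break;
                default:
                    break;                          /* stays on eh_work_q */
                }
            }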

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    ``scsi_eh_abort_cmds``

        This action is taken for each timed-out command when
        no_async_abort is enabled in the host template.
        hostt->eh_abort_handler() is invoked for each scmd. The
        handler returns SUCCESS if it has succeeded in making the LLDD
        and all related hardware forget about the scmd.

        If a timed-out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd. Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur(), which issues a
        TEST_UNIT_READY command. Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.
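
        The abort pass can be modelled as below. This is a hypothetical
        sketch: struct toy_scmd, the EH_* values, toy_eh_abort_cmds()
        and the two example callbacks are illustrative, and the tur
        callback stands in for scsi_eh_tur() by reporting whether the
        sdev is ready or offline.

        ::

            #include <assert.h>
            #include <stdbool.h>
            #include <stddef.h>

            enum eh_ret { EH_SUCCESS, EH_FAILED };

            struct toy_scmd {
                bool timed_out;
                bool eh_finished;
            };

            typedef enum eh_ret (*abort_fn)(struct toy_scmd *);
            /* Returns true if the sdev is ready or offline. */
            typedef bool (*tur_fn)(struct toy_scmd *);

            /* Returns the number of scmds EH-finished by this pass. */
            static int toy_eh_abort_cmds(struct toy_scmd *cmds, size_t n,
                                         abort_fn eh_abort, tur_fn tur)
            {
                int finished = 0;

                assert(tur != NULL);
                for (size_t i = 0; i < n; i++) {
                    struct toy_scmd *scmd = &cmds[i];

                    if (!scmd->timed_out || scmd->eh_finished)
                        continue;
                    /* Missing handler counts as failure: escalate later. */
                    if (!eh_abort || eh_abort(scmd) != EH_SUCCESS)
                        continue;
                    /* Only now is scmd safe to reuse for TEST_UNIT_READY. */
                    if (tur(scmd)) {
                        scmd->eh_finished = true;
                        finished++;
                    }
                }
                return finished;
            }

            static enum eh_ret abort_always_ok(struct toy_scmd *scmd)
            {
                (void)scmd;
                return EH_SUCCESS;
            }

            static bool sdev_ready(struct toy_scmd *scmd)
            {
                (void)scmd;
                return true;
            }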

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    ``scsi_eh_ready_devs``

        This function takes four increasingly severe measures to make
        failed sdevs ready for new commands and, if those measures do
        not recover every scmd, takes the remaining sdevs offline.

        1. Invoke scsi_eh_stu()

        ``scsi_eh_stu``

            For each sdev which has failed scmds with valid sense data
            for which scsi_check_sense()'s verdict is FAILED, a
            START_STOP_UNIT command is issued w/ start=1. Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            *NOTE* If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed-out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds. Yet, this function EH-finishes all scmds on the sdev
            if STU succeeds, leaving lower layers in an inconsistent
            state. It seems that the STU action should be taken only
            when a sdev has no timed-out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        ``scsi_eh_bus_device_reset``

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used. Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        ``scsi_eh_bus_reset``

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds. If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        ``scsi_eh_host_reset``

            This is the last resort. hostt->eh_host_reset_handler()
            is invoked. If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        ``scsi_eh_offline_sdevs``

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.
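
        The escalation performed by scsi_eh_ready_devs() can be sketched
        as a loop over steps of increasing severity that stops as soon as
        no failed scmds remain. The model is deliberately coarse and
        hypothetical: it counts scmds instead of tracking individual
        sdevs and channels, eh_step and toy_ready_devs() are illustrative
        names, and a NULL entry models an omitted callback, which is
        treated as failing.

        ::

            #include <assert.h>
            #include <stddef.h>

            /* A step returns how many pending scmds it recovered. */
            typedef int (*eh_step)(int pending);

            /* Run steps in order of increasing severity while work
             * remains. NULL entries model omitted callbacks and count
             * as failures. Returns the number of scmds left for
             * scsi_eh_offline_sdevs(). */
            static int toy_ready_devs(const eh_step steps[], size_t n,
                                      int pending)
            {
                for (size_t i = 0; i < n && pending > 0; i++) {
                    int recovered;

                    if (!steps[i])
                        continue;
                    recovered = steps[i](pending);
                    assert(recovered >= 0 && recovered <= pending);
                    pending -= recovered;
                }
                return pending;
            }

            static int recover_none(int pending) { (void)pending; return 0; }
            static int recover_one(int pending) { return pending > 0 ? 1 : 0; }
            static int recover_all(int pending) { return pending; }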

    5. Invoke scsi_eh_flush_done_q().

        ``scsi_eh_flush_done_q``

            At this point all scmds are recovered (or given up) and
            put on eh_done_q by scsi_eh_finish_cmd(). This function
            flushes eh_done_q by either retrying or notifying the upper
            layer of failure of the scmds.


2.2 EH through transportt->eh_strategy_handler()
------------------------------------------------

transportt->eh_strategy_handler() is invoked in place of
scsi_unjam_host() and it is responsible for the whole recovery
process. On completion, the handler should have made lower layers
forget about all failed scmds and be either ready for new commands or
offline. Also, it should perform SCSI EH maintenance chores to
maintain the integrity of the SCSI midlayer. In other words, of the
steps described in [2-1-2], all steps except for #1 must be
implemented by eh_strategy_handler().


2.2.1 Pre transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following conditions are true on entry to the handler.

 - Each failed scmd's eh_flags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


2.2.2 Post transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd. Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.

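A strategy handler author may find it useful to check the exit
conditions while developing. The sketch below is hypothetical: struct
toy_host, struct toy_scmd and toy_strategy_exit_ok() are illustrative,
the scsi_setup_cmd_retry() condition is not modelled, and "either ...
or" is read as exactly one of the two calls per scmd.

::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct toy_scmd {
        bool eh_entry_linked;   /* scmd->eh_entry still on a list? */
        bool requeued;          /* scsi_queue_insert() called */
        bool finished;          /* scsi_finish_command() called */
    };

    struct toy_host {
        int host_failed;
        size_t eh_cmd_q_len;
    };

    /* Check the exit conditions listed above (the
     * scsi_setup_cmd_retry() condition is not modelled). */
    static bool toy_strategy_exit_ok(const struct toy_host *h,
                                     const struct toy_scmd *cmds, size_t n)
    {
        assert(h != NULL && (cmds != NULL || n == 0));
        if (h->host_failed != 0 || h->eh_cmd_q_len != 0)
            return false;
        for (size_t i = 0; i < n; i++) {
            if (cmds[i].eh_entry_linked)
                return false;
            /* exactly one of requeue or finish */
            if (cmds[i].requeued == cmds[i].finished)
                return false;
        }
        return true;
    }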

2.2.3 Things to consider
^^^^^^^^^^^^^^^^^^^^^^^^

 - Know that timed-out scmds are still active on lower layers. Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structures,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


Tejun Heo
htejun@gmail.com

11th September 2005