What: /sys/devices/system/machinecheck/machinecheckX/tolerant
Contact: Borislav Petkov <bp@suse.de>
Date: Dec, 2021
Description:
Unused and obsolete since the advent of recoverable machine
checks (see the last sentence below), which have been present
since 2010 (Nehalem).

Original description:

The entry appears for each CPU, but its value is a single
setting shared by all CPUs.

Tolerance level. When a machine check exception occurs for an
uncorrected machine check, the kernel can take one of several
actions, selected by this level.

Since machine check exceptions can happen at any time, it is
sometimes risky for the kernel to kill a process, because doing
so defies normal kernel locking rules. The tolerance level
configures how hard the kernel tries to recover, even at some
risk of deadlock. Higher tolerant values trade potentially
better uptime for the risk of a crash or even corruption
(for tolerant >= 3).

== ===========================================================
0  always panic on uncorrected errors, log corrected errors
1  panic or SIGBUS on uncorrected errors, log corrected errors
2  SIGBUS or log uncorrected errors, log corrected errors
3  never panic or SIGBUS, log all errors (for testing only)
== ===========================================================

Default: 1

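As an illustration only, the table above can be read as a mapping from tolerance level to the set of actions the kernel may take. The C sketch below encodes that mapping as bit flags; the enum, its names and both functions are hypothetical and are not taken from the kernel's machine check code.

```c
/* Hypothetical bit flags for the actions named in the tolerant table. */
enum mce_action {
	ACT_PANIC  = 1 << 0,
	ACT_SIGBUS = 1 << 1,
	ACT_LOG    = 1 << 2,
};

/*
 * Actions the table permits for an uncorrected error at a given
 * tolerance level. Returns -1 for a level the table does not list.
 */
static int uncorrected_actions(int tolerant)
{
	switch (tolerant) {
	case 0:  return ACT_PANIC;               /* always panic */
	case 1:  return ACT_PANIC | ACT_SIGBUS;  /* panic or SIGBUS */
	case 2:  return ACT_SIGBUS | ACT_LOG;    /* SIGBUS or log */
	case 3:  return ACT_LOG;                 /* never panic or SIGBUS */
	default: return -1;
	}
}

/* Corrected errors are logged at every documented level. */
static int corrected_actions(int tolerant)
{
	return (tolerant >= 0 && tolerant <= 3) ? ACT_LOG : -1;
}
```

Using bit flags keeps the "or" in the table explicit: levels 1 and 2 allow more than one outcome, and which one applies depends on the situation at the time of the exception.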
Note that this only makes a difference if the CPU allows
recovery from a machine check exception. Current x86 CPUs
generally do not.