.. SPDX-License-Identifier: GPL-2.0

.. include:: <isonum.txt>

===============================
Bus lock detection and handling
===============================

:Copyright: |copy| 2021 Intel Corporation
:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Tony Luck <tony.luck@intel.com>

Problem
=======

A split lock is any atomic operation whose operand crosses two cache lines.
Since the operand spans two cache lines and the operation must be atomic,
the system locks the bus while the CPU accesses the two cache lines.

A bus lock is acquired through either a split locked access to writeback (WB)
memory or any locked access to non-WB memory. A bus locked access is
typically thousands of cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores and can severely degrade
overall system performance.
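
Whether a locked operand would be a split lock depends only on its address
and size. The following is a minimal sketch in C, assuming a 64-byte cache
line. The line size and the helper name ``crosses_cache_line`` are
illustrative assumptions, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; real hardware may differ. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: return 1 if an operand of `size` bytes starting at
 * address `addr` spans more than one cache line, 0 otherwise.
 */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
	uintptr_t first_line, last_line;

	assert(size > 0);

	first_line = addr / CACHE_LINE_SIZE;
	last_line = (addr + size - 1) / CACHE_LINE_SIZE;
	return first_line != last_line;
}
```

Under this assumption, an 8-byte operand at offset 60 spans two lines, while
the same operand at offset 56 stays within one.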

Detection
=========

Intel processors may support either or both of the following hardware
mechanisms to detect split locks and bus locks.

#AC exception for split lock detection
--------------------------------------

Beginning with the Tremont Atom CPU, an Alignment Check (#AC) exception may
be raised when a split lock operation is attempted.

#DB exception for bus lock detection
------------------------------------

Some CPUs have the ability to notify the kernel by a #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel to
terminate the application or to enforce throttling.
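
Which mechanism a given CPU supports can be checked from user space. The
sketch below assumes the kernel reports the features as the flag names
``split_lock_detect`` and ``bus_lock_detect`` on the ``flags`` line of
``/proc/cpuinfo``. That naming is an assumption here, and the helper
``has_cpu_flag`` is hypothetical. It only matches a whole word in a string
that the caller has already read.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `flag` appears as a whole whitespace-separated word in
 * `flags` (for example the "flags" line of /proc/cpuinfo), 0 otherwise.
 */
static int has_cpu_flag(const char *flags, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = flags;

	assert(len > 0);

	while ((p = strstr(p, flag)) != NULL) {
		int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		int ends_word = p[len] == '\0' || p[len] == ' ' ||
				p[len] == '\n';

		if (starts_word && ends_word)
			return 1;
		p++;	/* keep scanning after a partial match */
	}
	return 0;
}
```

A caller would read the ``flags`` line from ``/proc/cpuinfo`` and test for
each of the two assumed flag names in turn.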

Software handling
=================

The kernel #AC and #DB handlers handle bus locks based on the kernel
parameter "split_lock_detect". Here is a summary of the different options:

+------------------+----------------------------+-----------------------+
|split_lock_detect=|#AC for split lock          |#DB for bus lock       |
+------------------+----------------------------+-----------------------+
|off               |Do nothing                  |Do nothing             |
+------------------+----------------------------+-----------------------+
|warn              |Kernel OOPs                 |Warn once per task and |
|(default)         |Warn once per task and      |continue to run.       |
|                  |disable future checking     |                       |
|                  |When both features are      |                       |
|                  |supported, warn in #AC      |                       |
+------------------+----------------------------+-----------------------+
|fatal             |Kernel OOPs                 |Send SIGBUS to user.   |
|                  |Send SIGBUS to user         |                       |
|                  |When both features are      |                       |
|                  |supported, fatal in #AC     |                       |
+------------------+----------------------------+-----------------------+
|ratelimit:N       |Do nothing                  |Limit bus lock rate to |
|(0 < N <= 1000)   |                            |N bus locks per second |
|                  |                            |system wide and warn on|
|                  |                            |bus locks.             |
+------------------+----------------------------+-----------------------+
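
The option values in the table can be validated with a small parser. The
following is a user-space sketch, not the kernel's own parsing code. The
enum ``sld_mode`` and the function ``parse_split_lock_detect`` are
hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode names mirroring the documented option values. */
enum sld_mode { SLD_OFF, SLD_WARN, SLD_FATAL, SLD_RATELIMIT, SLD_INVALID };

/*
 * Parse the value of split_lock_detect= (for example "warn" or
 * "ratelimit:100"). For "ratelimit:N" the parsed N is stored in *rate
 * when 0 < N <= 1000; otherwise SLD_INVALID is returned.
 */
static enum sld_mode parse_split_lock_detect(const char *arg, long *rate)
{
	static const char prefix[] = "ratelimit:";
	char *end;
	long n;

	assert(arg != NULL && rate != NULL);

	if (strcmp(arg, "off") == 0)
		return SLD_OFF;
	if (strcmp(arg, "warn") == 0)
		return SLD_WARN;
	if (strcmp(arg, "fatal") == 0)
		return SLD_FATAL;
	if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
		return SLD_INVALID;

	arg += sizeof(prefix) - 1;
	n = strtol(arg, &end, 10);
	/* Reject empty numbers, trailing garbage and out-of-range values. */
	if (end == arg || *end != '\0' || n <= 0 || n > 1000)
		return SLD_INVALID;

	*rate = n;
	return SLD_RATELIMIT;
}
```

With this sketch, ``ratelimit:100`` is accepted with a rate of 100, while
``ratelimit:0`` and ``ratelimit:1001`` are rejected because they fall
outside the documented range.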

Usages
======

Detecting and handling bus locks is useful in several areas:

It is critical for real time system designers who build consolidated real
time systems. These systems run hard real time code on some cores and run
"untrusted" user processes on other cores. The hard real time code cannot
afford to have any bus lock from the untrusted processes hurt real time
performance. To date the designers have been unable to deploy these
solutions as they have no way to prevent the "untrusted" user code from
generating split locks and bus locks, which block the hard real time code
from accessing memory during bus locking.

It's also useful for general computing to prevent guests or user
applications from slowing down the overall system by executing instructions
with bus locks.

Guidance
========

off
---

Disable checking for split locks and bus locks. This option can be useful if
there are legacy applications that trigger these events at a low rate so
that mitigation is not needed.

warn
----

A warning is emitted when a bus lock is detected, which allows the offending
application to be identified. This is the default behavior.

fatal
-----

In this case, the bus lock is not tolerated and the process is killed.

ratelimit
---------

A system wide bus lock rate limit N is specified where 0 < N <= 1000. This
allows a bus lock rate up to N bus locks per second. When the bus lock rate
is exceeded, any task which is caught via the bus lock #DB exception is
throttled by enforced sleeps until the rate goes under the limit again.
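
The throttling idea can be illustrated with a fixed one-second window. This
is only a user-space sketch under stated assumptions and does not reproduce
the kernel's implementation. The structure ``bus_lock_limiter`` and the
function ``bus_lock_event`` are hypothetical, and timestamps are assumed to
be monotonic milliseconds supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-window limiter: at most `limit` events per second. */
struct bus_lock_limiter {
	uint64_t window_start_ms;	/* start of the current 1 s window */
	unsigned int count;		/* events seen in the current window */
	unsigned int limit;		/* N, with 0 < N <= 1000 */
};

/*
 * Record one bus lock event observed at time `now_ms` (monotonic).
 * Returns 0 if the event is within the limit, or the number of
 * milliseconds left in the current window, which is how long the
 * offending task would sleep in this sketch.
 */
static uint64_t bus_lock_event(struct bus_lock_limiter *rl, uint64_t now_ms)
{
	assert(rl->limit > 0 && rl->limit <= 1000);

	/* Start a new window once a full second has elapsed. */
	if (now_ms - rl->window_start_ms >= 1000) {
		rl->window_start_ms = now_ms;
		rl->count = 0;
	}

	if (rl->count < rl->limit) {
		rl->count++;
		return 0;
	}

	return rl->window_start_ms + 1000 - now_ms;
}
```

With a limit of 2, two events in the same window pass, a third event 400 ms
into the window yields a 600 ms sleep, and an event at the start of the next
window passes again.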

This is an effective mitigation in cases where a minimal impact can be
tolerated, but an eventual Denial of Service attack has to be prevented. It
allows the offending processes to be identified and analyzed to determine
whether they are malicious or just badly written.

Selecting a rate limit of 1000 allows the bus to be locked for up to about
seven million cycles each second (assuming 7000 cycles for each bus
lock). On a 2 GHz processor that would be about 0.35% system slowdown.
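
The slowdown estimate is the product of the rate limit and the cost per bus
lock, divided by the processor frequency. A minimal sketch of that
arithmetic follows. The function name ``bus_lock_slowdown_percent`` is
hypothetical, and the 7000-cycle cost is the assumption stated in the text.

```c
#include <assert.h>

/*
 * Estimate the worst-case share of cycles (in percent) spent holding the
 * bus lock, given N bus locks per second, an assumed cost per bus lock in
 * cycles, and the processor frequency in Hz.
 */
static double bus_lock_slowdown_percent(double locks_per_sec,
					double cycles_per_lock,
					double cpu_hz)
{
	assert(cpu_hz > 0.0);
	return 100.0 * locks_per_sec * cycles_per_lock / cpu_hz;
}
```

Calling it with 1000 locks per second, 7000 cycles per lock and a 2 GHz
clock reproduces the figure of about 0.35%.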