Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries, which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads, so cross-thread forwarding is
not possible. However, if a thread enters or exits a sleep state, the store
buffer is repartitioned, which can expose data from one thread to the other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache. When the fill buffer is
deallocated it can retain the stale data of the preceding operations, which
can then be forwarded to a faulting or assisting load operation and be
exploited under certain conditions. Fill buffers are shared between
Hyper-Threads, so cross-thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation, which can be forwarded to
faulting or assisting loads under certain conditions and, again, be
exploited. Load ports are shared between Hyper-Threads, so cross-thread
leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest, with one
exception described below. The rationale behind this assumption is that the
code construct needed for exploiting MDS requires:

 - control over the load, so that it triggers a fault or assist

 - a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel

 - control over the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy, at least for the single CPU
thread case (SMT off): force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either the modified VERW instruction or the L1D Flush command. The latter
is issued when the L1TF mitigation is enabled, so the extra VERW can be
avoided. If the CPU is not affected by L1TF, then VERW needs to be issued.
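The choice between the two primitives before entering a guest can be sketched as a small pure function. Everything in it is hypothetical: the enum, the function name and the two boolean inputs are illustrative stand-ins rather than kernel identifiers, and the sketch assumes that when ``l1d_flush_on_vmenter`` is true the L1D flush really is issued on the entry being considered.

```c
#include <stdbool.h>

/* Hypothetical names; these are not kernel identifiers. */
enum vmenter_clear {
    CLEAR_NONE,      /* CPU not affected by MDS, or mitigation disabled */
    CLEAR_L1D_FLUSH, /* L1TF mitigation flushes L1D on this entry: skip VERW */
    CLEAR_VERW       /* no L1D flush on this entry: issue VERW instead */
};

/* Pick the buffer-clearing primitive for one VMENTER (sketch only). */
static enum vmenter_clear choose_vmenter_clear(bool mds_mitigation_on,
                                               bool l1d_flush_on_vmenter)
{
    if (!mds_mitigation_on)
        return CLEAR_NONE;
    /* The L1D flush already clears the buffers, so VERW would be redundant. */
    if (l1d_flush_on_vmenter)
        return CLEAR_L1D_FLUSH;
    return CLEAR_VERW;
}
```

For example, a CPU that is not affected by L1TF never takes the L1D flush path, so ``choose_vmenter_clear(true, false)`` selects ``CLEAR_VERW``.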

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update, there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks, except for MSBDS,
which is only exploitable across Hyper-Threads when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

    mds_clear_cpu_buffers()
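A minimal user-space sketch of what such a helper boils down to: a VERW with a memory operand. The kernel's helper uses its own data segment selector (``__KERNEL_DS``); as an assumption for user space, this sketch substitutes the current SS selector, which is a writable data segment in an ordinary process. The function name ``verw_clear_cpu_buffers`` is hypothetical, and whether buffers are actually cleared depends on the MD_CLEAR microcode being present.

```c
#include <stdint.h>

/*
 * Hypothetical user-space sketch of a VERW-based buffer clear.
 * Returns 1 if VERW reported the segment as writable (ZF set), 0 if not,
 * and -1 on non-x86 builds, where the instruction does not exist.
 * Without the microcode update, VERW only performs its legacy access check.
 */
static int verw_clear_cpu_buffers(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint16_t sel;
    unsigned char zf;

    /* Assumption: SS holds a writable user data segment selector. */
    __asm__ volatile("mov %%ss, %0" : "=r"(sel));
    /* Use the memory-operand form ("m" constraint); SETZ captures ZF. */
    __asm__ volatile("verw %1\n\tsetz %0"
                     : "=q"(zf)
                     : "m"(sel)
                     : "cc", "memory");
    return zf;
#else
    return -1;
#endif
}
```

The memory-operand form matches what the kernel's helper uses; the return value is only there to show that the instruction executed without faulting.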

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. This best-effort state is
reflected in the selected mitigation mode (vmwerv, see below).
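MD_CLEAR is enumerated in CPUID leaf 7, subleaf 0, EDX bit 10. The sketch below separates the pure bit decoding from the hardware query, using ``__get_cpuid_count()`` from GCC/Clang's ``<cpuid.h>``. The function names are hypothetical. Inside a guest, the query returns whatever the hypervisor chooses to expose, which is precisely the gap the quirk above works around.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* MD_CLEAR: CPUID.(EAX=7, ECX=0):EDX[10] */
#define MD_CLEAR_EDX_BIT 10u

/* Decode the MD_CLEAR bit from a raw EDX value (no hardware access). */
static bool md_clear_from_edx(uint32_t edx)
{
    return (edx >> MD_CLEAR_EDX_BIT) & 1u;
}

/* Query the running CPU; false on non-x86 builds or if leaf 7 is absent. */
static bool cpu_advertises_md_clear(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 if the leaf exceeds the maximum leaf. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return md_clear_from_edx(edx);
#else
    return false;
#endif
}
```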

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off      Mitigation is disabled. Either the CPU is not affected or
          mds=off is supplied on the kernel command line

 full     Mitigation is enabled. CPU is affected and MD_CLEAR is
          advertised in CPUID.

 vmwerv   Mitigation is enabled. CPU is affected and MD_CLEAR is not
          advertised in CPUID. That is mainly for virtualization
          scenarios where the host has the updated microcode but the
          hypervisor does not expose MD_CLEAR in CPUID. It's a best
          effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line, then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.
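The selection rule from the table and the paragraph above can be written as a small decision function. This is a hypothetical model: the enum values mirror the mode names in the table, but neither the enum nor the function is the kernel's own definition.

```c
#include <stdbool.h>

/* Hypothetical model of the three modes listed in the table. */
enum mds_mode { MDS_OFF, MDS_FULL, MDS_VMWERV };

static enum mds_mode mds_select_mode(bool cpu_affected, bool cmdline_mds_off,
                                     bool md_clear_in_cpuid)
{
    /* Not affected, or mds=off on the command line: nothing to do. */
    if (!cpu_affected || cmdline_mds_off)
        return MDS_OFF;
    /* Affected: guaranteed clearing if MD_CLEAR is advertised, else best effort. */
    return md_clear_in_cpuid ? MDS_FULL : MDS_VMWERV;
}

static const char *mds_mode_name(enum mds_mode m)
{
    switch (m) {
    case MDS_OFF:    return "off";
    case MDS_FULL:   return "full";
    case MDS_VMWERV: return "vmwerv";
    }
    return "unknown";
}
```

An affected guest whose hypervisor hides MD_CLEAR ends up in ``vmwerv``: ``mds_select_mode(true, false, false)`` returns ``MDS_VMWERV``.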

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the static key
   mds_user_clear.

   The mitigation is invoked in prepare_exit_to_usermode(), which covers
   all but one of the kernel to user space transitions.  The exception
   is when we return from a Non-Maskable Interrupt (NMI), which is
   handled directly in do_nmi().

   (The reason that NMI is special is that prepare_exit_to_usermode() can
    enable IRQs.  In NMI context, NMIs are blocked, and we don't want to
    enable IRQs with NMIs blocked.)
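The two return-to-user paths can be modelled in user space to show that both end in the same guarded clear. This is a hypothetical sketch: a plain ``bool`` stands in for the ``mds_user_clear`` static key, a counter stands in for ``mds_clear_cpu_buffers()``, and the two exit functions only mirror the roles of ``prepare_exit_to_usermode()`` and the NMI path in ``do_nmi()``. Their bodies are illustrative, not the kernel's code.

```c
#include <stdbool.h>

/* Stand-ins (hypothetical): the static key and the buffer clear. */
static bool mds_user_clear_enabled;
static unsigned int buffer_clears;

static void clear_cpu_buffers_stub(void)
{
    buffer_clears++;
}

/* Role of prepare_exit_to_usermode(): may enable IRQs for pending work,
 * so in this sketch the clear is the last step before returning. */
static void exit_to_user_common(void)
{
    /* ... pending work, possibly with IRQs enabled ... */
    if (mds_user_clear_enabled)
        clear_cpu_buffers_stub();
}

/* Role of the NMI return path: IRQs must stay disabled, so the common
 * path is not used and the clear is issued directly. */
static void exit_to_user_from_nmi(void)
{
    if (mds_user_clear_enabled)
        clear_cpu_buffers_stub();
}
```

With the key disabled, neither path clears; once it is enabled, every return to user space performs exactly one clear, whichever path is taken.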
0152 
0153 
0154 2. C-State transition
0155 ^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State, the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffer and
   the Load Ports are shared. So on CPUs affected by other variants, the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear, which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.
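The conditions for switching on idle clearing, as described above, can be condensed into one predicate. This is a hypothetical model: the function name and its three boolean inputs stand in for kernel state (whether a mitigation mode other than ``off`` is active, whether the CPU is affected by MSBDS only, and whether SMT is active).

```c
#include <stdbool.h>

/* Hypothetical predicate: should the mds_idle_clear key be enabled? */
static bool mds_idle_clear_wanted(bool mitigation_on, bool msbds_only,
                                  bool smt_active)
{
    /* Fill buffers and load ports are shared between siblings, so idle
     * clearing only helps on CPUs affected by MSBDS alone. */
    if (!mitigation_on || !msbds_only)
        return false;
    /* Without an active sibling, nothing can spill across threads. */
    return smt_active;
}
```

Because the SMT state can change at runtime (siblings going offline or online), the key has to be re-evaluated whenever that input changes, not only at boot.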

   The buffer clear is only invoked before entering the C-State, to prevent
   stale data from the idling CPU from spilling to the Hyper-Thread
   sibling after the store buffer has been repartitioned and all entries are
   available to the non-idle sibling.

   When coming out of idle, the store buffer is partitioned again so each
   sibling has half of it available. The CPU returning from idle could then
   be speculatively exposed to contents of the sibling. The buffers are
   flushed either on exit to user space or on VMENTER, so malicious code
   in user space or the guest cannot speculatively access them.
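The ordering described in the last two paragraphs can be illustrated with a hypothetical event trace of one idle round trip on an SMT sibling. The event labels are illustrative strings, not real instructions or kernel functions. The point is where the clears sit: one before the C-state is entered, none on wakeup, and the next one only on the subsequent exit to user space or VMENTER.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical trace of labelled events, in order. */
struct trace {
    const char *ev[MAX_EVENTS];
    int n;
};

static void record(struct trace *t, const char *e)
{
    if (t->n < MAX_EVENTS)
        t->ev[t->n++] = e;
}

static void idle_round_trip(struct trace *t, bool mds_idle_clear)
{
    /* Clear first: the whole store buffer is about to go to the sibling. */
    if (mds_idle_clear)
        record(t, "clear");
    record(t, "mwait");  /* C-state entered, store buffer repartitioned */
    record(t, "wakeup"); /* store buffer split between the siblings again */
    /* No clear on wakeup; the next exit to user space or VMENTER clears. */
    record(t, "clear_on_exit_or_vmenter");
}
```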

   The mitigation is hooked into all variants of halt()/mwait(), but does
   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
   was superseded by the intel_idle driver around 2010, and intel_idle is
   preferred on all affected CPUs, which are expected to gain the MD_CLEAR
   functionality in microcode. Aside from that, the IO-Port mechanism is a
   legacy interface which is only used on older systems which are either
   not affected or do not receive microcode updates anymore.