.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===================================
Compute Express Link Memory Devices
===================================

A Compute Express Link Memory Device is a CXL component that implements the
CXL.mem protocol. It contains some amount of volatile memory, persistent memory,
or both. It is enumerated as a PCI device for configuration and passing
messages over an MMIO mailbox. Its contribution to the System Physical
Address space is handled via HDM (Host Managed Device Memory) decoders
that optionally define a device's contribution to an interleaved address
range across multiple devices underneath a host-bridge or interleaved
across host-bridges.

CXL Bus: Theory of Operation
============================
Similar to how a RAID driver takes disk objects and assembles them into a new
logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and
assemble them into a CXL.mem decode topology. The need for runtime configuration
of the CXL.mem topology is also similar to RAID in that different environments
with the same hardware configuration may decide to assemble the topology in
contrasting ways. One may choose performance (RAID0), striping memory across
multiple Host Bridges and endpoints, while another may opt for fault tolerance
and disable any striping in the CXL.mem topology.

Platform firmware enumerates a menu of interleave options at the "CXL root port"
(Linux term for the top of the CXL decode topology). From there, PCIe topology
dictates which endpoints can participate in which Host Bridge decode regimes.
Each PCIe Switch in the path between the root and an endpoint introduces a point
at which the interleave can be split. For example, platform firmware may say that
a given range only decodes to one Host Bridge, but that Host Bridge may in turn
interleave cycles across multiple Root Ports. An intervening Switch between a
port and an endpoint may interleave cycles across multiple Downstream Switch
Ports, etc.
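
The way an interleave can be split at each level can be illustrated with a toy
model. This is only a sketch under loud assumptions: it uses plain modulo
interleaving with one shared granularity at every level, and the names
``route``, ``ways_per_level``, and ``granularity`` are hypothetical helpers for
illustration, not kernel or CXL specification interfaces. Real HDM decoder
programming is subject to additional constraints.

```python
from math import prod


def route(offset, ways_per_level, granularity=256):
    """Return the target index picked at each level for a byte offset.

    Toy model (an assumption, not the CXL decode algorithm): every level
    interleaves at the same granularity, and each level consumes the next
    "digit" of the chunk number via modulo arithmetic.
    """
    chunk = offset // granularity
    path = []
    for ways in ways_per_level:
        path.append(chunk % ways)  # which target this level selects
        chunk //= ways             # remaining chunk bits go to the next level
    return path


# One Host Bridge -> 2 Root Ports -> 2 Downstream Switch Ports per switch.
levels = [1, 2, 2]
total_ways = prod(levels)  # endpoints participating in the interleave set
paths = [route(i * 256, levels) for i in range(total_ways)]

for i, path in enumerate(paths):
    print(f"chunk {i}: host bridge {path[0]}, root port {path[1]}, dsp {path[2]}")
```

In this model the number of endpoints in the set is the product of the ways at
each level, so a range that firmware decodes to a single Host Bridge can still
be striped 4 ways further down the topology.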

Here is a sample listing of a CXL topology defined by 'cxl_test'. The 'cxl_test'
module generates an emulated CXL topology of 2 Host Bridges each with 2 Root
Ports. Each of those Root Ports is connected to a 2-way switch, with endpoints
connected to the switch's downstream ports, for a total of 8 endpoints::

    # cxl list -BEMPu -b cxl_test
    {
      "bus":"root3",
      "provider":"cxl_test",
      "ports:root3":[
        {
          "port":"port5",
          "host":"cxl_host_bridge.1",
          "ports:port5":[
            {
              "port":"port8",
              "host":"cxl_switch_uport.1",
              "endpoints:port8":[
                {
                  "endpoint":"endpoint9",
                  "host":"mem2",
                  "memdev":{
                    "memdev":"mem2",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x1",
                    "numa_node":1,
                    "host":"cxl_mem.1"
                  }
                },
                {
                  "endpoint":"endpoint15",
                  "host":"mem6",
                  "memdev":{
                    "memdev":"mem6",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x5",
                    "numa_node":1,
                    "host":"cxl_mem.5"
                  }
                }
              ]
            },
            {
              "port":"port12",
              "host":"cxl_switch_uport.3",
              "endpoints:port12":[
                {
                  "endpoint":"endpoint17",
                  "host":"mem8",
                  "memdev":{
                    "memdev":"mem8",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x7",
                    "numa_node":1,
                    "host":"cxl_mem.7"
                  }
                },
                {
                  "endpoint":"endpoint13",
                  "host":"mem4",
                  "memdev":{
                    "memdev":"mem4",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x3",
                    "numa_node":1,
                    "host":"cxl_mem.3"
                  }
                }
              ]
            }
          ]
        },
        {
          "port":"port4",
          "host":"cxl_host_bridge.0",
          "ports:port4":[
            {
              "port":"port6",
              "host":"cxl_switch_uport.0",
              "endpoints:port6":[
                {
                  "endpoint":"endpoint7",
                  "host":"mem1",
                  "memdev":{
                    "memdev":"mem1",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0",
                    "numa_node":0,
                    "host":"cxl_mem.0"
                  }
                },
                {
                  "endpoint":"endpoint14",
                  "host":"mem5",
                  "memdev":{
                    "memdev":"mem5",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x4",
                    "numa_node":0,
                    "host":"cxl_mem.4"
                  }
                }
              ]
            },
            {
              "port":"port10",
              "host":"cxl_switch_uport.2",
              "endpoints:port10":[
                {
                  "endpoint":"endpoint16",
                  "host":"mem7",
                  "memdev":{
                    "memdev":"mem7",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x6",
                    "numa_node":0,
                    "host":"cxl_mem.6"
                  }
                },
                {
                  "endpoint":"endpoint11",
                  "host":"mem3",
                  "memdev":{
                    "memdev":"mem3",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x2",
                    "numa_node":0,
                    "host":"cxl_mem.2"
                  }
                }
              ]
            }
          ]
        }
      ]
    }

In that listing each "root", "port", and "endpoint" object corresponds to a
kernel 'struct cxl_port' object. A 'cxl_port' is a device that can decode
CXL.mem to its descendants. So "root" claims non-PCIe enumerable platform decode
ranges and decodes them to "ports", "ports" decode to "endpoints", and
"endpoints" represent the decode from SPA (System Physical Address) to DPA
(Device Physical Address).
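
As an illustration of that hierarchy, the nested JSON can be walked from the
root down to the endpoints. The sketch below assumes the key layout seen in the
listing above ("ports:<name>" and "endpoints:<name>" arrays); the ``walk``
helper is hypothetical, and the embedded sample is a trimmed copy of one branch
of the listing rather than complete tool output.

```python
import json


def walk(node, depth=0):
    """Yield (depth, kind, name) for each port and endpoint below node."""
    for key, value in node.items():
        # Child collections are keyed as "ports:<name>" or "endpoints:<name>".
        kind = key.split(":", 1)[0]
        if kind not in ("ports", "endpoints") or not isinstance(value, list):
            continue
        label = "port" if kind == "ports" else "endpoint"
        for child in value:
            yield depth + 1, label, child[label]
            yield from walk(child, depth + 1)


# Trimmed sample: one Host Bridge branch from the cxl_test listing.
sample = json.loads("""
{
  "bus":"root3",
  "ports:root3":[
    {
      "port":"port4",
      "ports:port4":[
        {
          "port":"port6",
          "endpoints:port6":[
            {"endpoint":"endpoint7", "host":"mem1"},
            {"endpoint":"endpoint14", "host":"mem5"}
          ]
        }
      ]
    }
  ]
}
""")

topology = list(walk(sample))
for depth, kind, name in topology:
    print("  " * depth + f"{kind}: {name}")
```

Each tuple corresponds to one 'cxl_port' below the root, with the depth showing
how many decode hops separate it from the top of the topology.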

Continuing the RAID analogy, disks have both topology metadata and on-device
metadata that determine RAID set assembly. CXL Port topology and CXL Port link
status are metadata for CXL.mem set assembly. The CXL Port topology is enumerated
by the arrival of a CXL.mem device. I.e. unless and until the PCIe core attaches
the cxl_pci driver to a CXL Memory Expander there is no role for CXL Port
objects. Conversely for hot-unplug / removal scenarios, there is no need for
the Linux PCI core to tear down switch-level CXL resources because the endpoint
->remove() event cleans up the port data that was established to support that
Memory Expander.

The port metadata and potential decode schemes that a given memory device may
participate in can be determined via a command like::

    # cxl list -BDMu -d root -m mem3
    {
      "bus":"root3",
      "provider":"cxl_test",
      "decoders:root3":[
        {
          "decoder":"decoder3.1",
          "resource":"0x8030000000",
          "size":"512.00 MiB (536.87 MB)",
          "volatile_capable":true,
          "nr_targets":2
        },
        {
          "decoder":"decoder3.3",
          "resource":"0x8060000000",
          "size":"512.00 MiB (536.87 MB)",
          "pmem_capable":true,
          "nr_targets":2
        },
        {
          "decoder":"decoder3.0",
          "resource":"0x8020000000",
          "size":"256.00 MiB (268.44 MB)",
          "volatile_capable":true,
          "nr_targets":1
        },
        {
          "decoder":"decoder3.2",
          "resource":"0x8050000000",
          "size":"256.00 MiB (268.44 MB)",
          "pmem_capable":true,
          "nr_targets":1
        }
      ],
      "memdevs:root3":[
        {
          "memdev":"mem3",
          "pmem_size":"256.00 MiB (268.44 MB)",
          "ram_size":"256.00 MiB (268.44 MB)",
          "serial":"0x2",
          "numa_node":0,
          "host":"cxl_mem.2"
        }
      ]
    }

...which queries the CXL topology to ask "given a CXL Memory Expander with a
kernel device name of 'mem3', which platform level decode ranges may this device
participate in?". A given expander can participate in multiple CXL.mem
interleave sets simultaneously depending on how many decoder resources it has.
In this example mem3 can participate in one or more of a PMEM interleave that
spans 2 Host Bridges, a PMEM interleave that targets a single Host Bridge, a
Volatile memory interleave that spans 2 Host Bridges, and a Volatile memory
interleave that only targets a single Host Bridge.
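
The same reading of the decoder list can be sketched in code. This example
assumes, as the paragraph above implies, that "nr_targets" on a root decoder
counts the Host Bridges it spans; the ``describe`` helper is hypothetical and
the input is a trimmed copy of the "decoders:root3" array from the listing.

```python
import json


def describe(decoder):
    """Summarize one root decoder entry taken from the listing above."""
    kinds = []
    if decoder.get("pmem_capable"):
        kinds.append("PMEM")
    if decoder.get("volatile_capable"):
        kinds.append("volatile")
    targets = decoder["nr_targets"]
    # Assumption: root decoder targets are Host Bridges.
    span = "a single Host Bridge" if targets == 1 else f"{targets} Host Bridges"
    return f"{decoder['decoder']}: {'/'.join(kinds)} across {span}"


decoders = json.loads("""
[
  {"decoder":"decoder3.1", "volatile_capable":true, "nr_targets":2},
  {"decoder":"decoder3.3", "pmem_capable":true, "nr_targets":2},
  {"decoder":"decoder3.0", "volatile_capable":true, "nr_targets":1},
  {"decoder":"decoder3.2", "pmem_capable":true, "nr_targets":1}
]
""")

summaries = [describe(d) for d in decoders]
for line in summaries:
    print(line)
```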

Conversely the memory devices that can participate in a given platform level
decode scheme can be determined via a command like the following::

    # cxl list -MDu -d 3.2
    [
      {
        "memdevs":[
          {
            "memdev":"mem1",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0",
            "numa_node":0,
            "host":"cxl_mem.0"
          },
          {
            "memdev":"mem5",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x4",
            "numa_node":0,
            "host":"cxl_mem.4"
          },
          {
            "memdev":"mem7",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x6",
            "numa_node":0,
            "host":"cxl_mem.6"
          },
          {
            "memdev":"mem3",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x2",
            "numa_node":0,
            "host":"cxl_mem.2"
          }
        ]
      },
      {
        "root decoders":[
          {
            "decoder":"decoder3.2",
            "resource":"0x8050000000",
            "size":"256.00 MiB (268.44 MB)",
            "pmem_capable":true,
            "nr_targets":1
          }
        ]
      }
    ]

...where the naming scheme for decoders is "decoder<port_id>.<instance_id>".
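
A script that consumes these listings can split a decoder name back into its
two parts. The following is a minimal sketch that assumes both ids are plain
decimal integers, which matches the names in the listings above;
``parse_decoder_name`` is a hypothetical helper, not part of any CXL tooling.

```python
import re

# Assumption: both ids are decimal, as in "decoder3.2".
DECODER_NAME = re.compile(r"^decoder(?P<port_id>\d+)\.(?P<instance_id>\d+)$")


def parse_decoder_name(name):
    """Split "decoder<port_id>.<instance_id>" into two integers."""
    match = DECODER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a decoder name: {name!r}")
    return int(match["port_id"]), int(match["instance_id"])


port_id, instance_id = parse_decoder_name("decoder3.2")
print(port_id, instance_id)
```

For "decoder3.2" the port id of 3 lines up with the "root3" bus in the
listings, which is why the decoder appears under "decoders:root3".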

Driver Infrastructure
=====================

This section covers the driver infrastructure for a CXL memory device.

CXL Memory Device
-----------------

.. kernel-doc:: drivers/cxl/pci.c
   :doc: cxl pci

.. kernel-doc:: drivers/cxl/pci.c
   :internal:

.. kernel-doc:: drivers/cxl/mem.c
   :doc: cxl mem

CXL Port
--------
.. kernel-doc:: drivers/cxl/port.c
   :doc: cxl port

CXL Core
--------
.. kernel-doc:: drivers/cxl/cxl.h
   :doc: cxl objects

.. kernel-doc:: drivers/cxl/cxl.h
   :internal:

.. kernel-doc:: drivers/cxl/core/port.c
   :doc: cxl core

.. kernel-doc:: drivers/cxl/core/port.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/pci.c
   :doc: cxl core pci

.. kernel-doc:: drivers/cxl/core/pci.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/pmem.c
   :doc: cxl pmem

.. kernel-doc:: drivers/cxl/core/regs.c
   :doc: cxl registers

.. kernel-doc:: drivers/cxl/core/mbox.c
   :doc: cxl mbox

CXL Regions
-----------
.. kernel-doc:: drivers/cxl/core/region.c
   :doc: cxl core region

.. kernel-doc:: drivers/cxl/core/region.c
   :identifiers:

External Interfaces
===================

CXL IOCTL Interface
-------------------

.. kernel-doc:: include/uapi/linux/cxl_mem.h
   :doc: UAPI

.. kernel-doc:: include/uapi/linux/cxl_mem.h
   :internal:
0374 ===================
0375 
0376 CXL IOCTL Interface
0377 -------------------
0378 
0379 .. kernel-doc:: include/uapi/linux/cxl_mem.h
0380    :doc: UAPI
0381 
0382 .. kernel-doc:: include/uapi/linux/cxl_mem.h
0383    :internal: