====================================
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

    ::

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). Linux manages it
    through calls into OPAL firmware and does not program the CAPP
    directly.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU implements the specific functionality behind the
    PSL. The PSL, among other things, provides memory address
    translation services that give each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs, and it is the part the kernel interacts with. For
    example, if the AFU needs to read a particular effective address,
    it sends that address to the PSL; the PSL translates it, fetches
    the data from memory and returns it to the AFU. If the PSL has a
    translation miss, it interrupts the kernel, which services the
    fault in the context of the process that owns that acceleration
    function.

    - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0.
    - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0.

    PSL Version 9 provides new features such as:

    * Interaction with the nest MMU on the P9 chip.
    * Native DMA support.
    * Sending ASB_Notify messages for host thread wakeup.
    * Atomic operations.
    * etc.

    Cards with a PSL9 do not work on a POWER8 system, and cards with a
    PSL8 do not work on a POWER9 system.

AFU Modes
=========

    The AFU supports two programming modes: dedicated and AFU
    directed. An AFU may support one or both modes.

    When using dedicated mode only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at a
    time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the kernel can also read the ID to determine the
    userspace context associated with that operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per-context portion. The hardware is self-describing, hence
    the kernel can determine the offset and size of the per-context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace through the read() system call documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is defined by the AFU, so the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.
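
    Because the AFU and the process share effective addresses, a WED
    that refers to a block in process memory can simply be that block's
    address cast to 64 bits. The sketch below assumes a hypothetical
    AFU whose WED points at a status block; struct
    example_status_block and the two example_* helpers are invented
    for illustration, since the real format is whatever the AFU
    defines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical control/status block. Its layout is invented for this
 * example; a real AFU defines its own format and endianness. */
struct example_status_block {
	uint64_t queue_ea;        /* effective address of a work queue */
	uint64_t queue_entries;   /* number of entries in that queue */
	volatile uint64_t status; /* written by the AFU, polled by the process */
};

/* The AFU reads fixed offsets, so pin the layout at compile time. */
static_assert(sizeof(struct example_status_block) == 24,
	      "unexpected padding in example_status_block");

/* Allocate a zeroed block for the AFU and the process to share. Any
 * alignment the AFU requires is not handled here. */
static struct example_status_block *example_alloc_status_block(void)
{
	return calloc(1, sizeof(struct example_status_block));
}

/* The AFU uses the same effective addresses as the process, so the WED
 * for this hypothetical AFU is just the block's address. */
static uint64_t example_make_wed(const struct example_status_block *blk)
{
	return (uint64_t)(uintptr_t)blk;
}
```

    The resulting value is what would be placed in the
    work_element_descriptor field when starting the context.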


User API
========

1. AFU character devices
^^^^^^^^^^^^^^^^^^^^^^^^

    For AFUs operating in AFU directed mode, two character device
    files are created: /dev/cxl/afu0.0m corresponds to a master
    context and /dev/cxl/afu0.0s corresponds to a slave context.
    Master contexts have access to the full MMIO space an AFU
    provides. Slave contexts have access to only the per-process MMIO
    space an AFU provides.

    For AFUs operating in dedicated process mode, the driver only
    creates a single character device per AFU, called
    /dev/cxl/afu0.0d. This has access to the entire MMIO space that
    the AFU provides (like master contexts in AFU directed mode).
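
    The device names follow the pattern afuX.Y plus a one-letter suffix
    for the context type (m, s or d). A minimal sketch of building such
    a path, assuming X is the card number and Y the AFU index on that
    card; cxl_afu_dev_path() and enum cxl_dev_kind are hypothetical
    names, not part of any kernel or libcxl header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Suffixes that select the context type of a cxl character device. */
enum cxl_dev_kind {
	CXL_DEV_MASTER = 'm',    /* AFU directed mode, master context */
	CXL_DEV_SLAVE = 's',     /* AFU directed mode, slave context */
	CXL_DEV_DEDICATED = 'd', /* dedicated process mode */
};

/* Hypothetical helper: write "/dev/cxl/afu<card>.<afu><suffix>" to buf.
 * Returns 0 on success, or -1 if an index is negative or buf is too small. */
static int cxl_afu_dev_path(char *buf, size_t len, int card, int afu,
			    enum cxl_dev_kind kind)
{
	assert(buf != NULL && len > 0);
	if (card < 0 || afu < 0)
		return -1;
	int n = snprintf(buf, len, "/dev/cxl/afu%d.%d%c", card, afu, (int)kind);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```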

    The types described below are defined in include/uapi/misc/cxl.h.

    A userspace library libcxl is available here:

        https://github.com/ibm-capi/libcxl

    This provides a C interface to this kernel API.

    The following file operations are supported on both slave and
    master devices.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each available context.

    When all available contexts are allocated, the open call fails
    and returns -ENOSPC (in userspace, open() returns -1 with errno
    set to ENOSPC).

    Note:
          IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037 / 4 = 509 contexts can be allocated (rounding
          down).


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work:

            ::

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU-specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified, the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified, the minimum number required by the
                AFU will be allocated. The minimum and maximum numbers
                can be obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.
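
    A minimal sketch of starting a context. The struct is mirrored
    locally so the example compiles on its own; real code should
    include <misc/cxl.h> (installed from include/uapi/misc/cxl.h)
    instead. The numeric values given for CXL_START_WORK_AMR,
    CXL_START_WORK_NUM_IRQS and CXL_IOCTL_START_WORK are our reading of
    that header and should be checked against it. The helpers
    cxl_prepare_start_work() and cxl_start_work() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct cxl_ioctl_start_work as documented above. */
struct cxl_ioctl_start_work {
	uint64_t flags;
	uint64_t work_element_descriptor;
	uint64_t amr;
	int16_t num_interrupts;
	int16_t reserved1;
	int32_t reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
	uint64_t reserved5;
	uint64_t reserved6;
};
static_assert(sizeof(struct cxl_ioctl_start_work) == 64, "ABI size");

/* Assumed values; verify against <misc/cxl.h>. */
#define CXL_START_WORK_AMR      0x0000000000000001ULL
#define CXL_START_WORK_NUM_IRQS 0x0000000000000002ULL
#define CXL_MAGIC               0xCA
#define CXL_IOCTL_START_WORK \
	_IOW(CXL_MAGIC, 0x00, struct cxl_ioctl_start_work)

/* Fill a request. Everything, including the reserved fields, starts at
 * zero; a negative num_irqs leaves CXL_START_WORK_NUM_IRQS unset so the
 * kernel allocates the AFU's minimum. The AMR keeps its default of 0. */
static void cxl_prepare_start_work(struct cxl_ioctl_start_work *work,
				   uint64_t wed, int16_t num_irqs)
{
	memset(work, 0, sizeof(*work));
	work->work_element_descriptor = wed;
	if (num_irqs >= 0) {
		work->flags |= CXL_START_WORK_NUM_IRQS;
		work->num_interrupts = num_irqs;
	}
}

/* Attach the context to the calling process. Returns 0, or -1 with errno. */
static int cxl_start_work(int fd, uint64_t wed, int16_t num_irqs)
{
	struct cxl_ioctl_start_work work;

	cxl_prepare_start_work(&work, wed, num_irqs);
	return ioctl(fd, CXL_IOCTL_START_WORK, &work);
}
```

    Keeping the flags empty unless a field is really meant to be used
    lets the kernel apply its documented defaults for the AMR and the
    interrupt count.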

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Gets the current context id, also known as the process element.
        The kernel writes the value to the __u32 pointed to by the
        ioctl argument.


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the
    per-process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the CXL_IOCTL_START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    take endianness into account (the endian(3) helpers such as
    le64toh() and be64toh() are recommended). These endian issues
    apply equally to shared memory queues the WED may describe.


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K (4096) bytes.
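
    A minimal sketch of a read wrapper that enforces the 4K minimum and
    retries when a signal interrupts the call, plus a helper for
    switching an open descriptor to non-blocking mode. The helper names
    and the EXAMPLE_CXL_READ_MIN constant are hypothetical; the EINVAL
    for a short buffer is produced by this wrapper itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define EXAMPLE_CXL_READ_MIN 4096 /* minimum buffer size stated above */

/* Read one batch of whole events into buf, retrying if a signal interrupts
 * the call. Returns the byte count, or -1 with errno set: EINVAL for a
 * too-small buffer, EAGAIN when non-blocking and nothing is pending, and
 * EIO on an unrecoverable error or card removal. */
static ssize_t cxl_read_events(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	if (len < EXAMPLE_CXL_READ_MIN) {
		errno = EINVAL;
		return -1;
	}
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);
	return n;
}

/* Switch an open AFU descriptor to non-blocking reads (same effect as
 * passing O_NONBLOCK to open()). */
static int cxl_set_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}
```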

    The result of the read will be a buffer of one or more events.
    Each event is of type struct cxl_event, of varying size::

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as:

        ::

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.
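
    Since each header's size gives the offset of the next event, the
    buffer returned by one read() can be walked as sketched below. The
    header struct is mirrored locally (real code should use
    <misc/cxl.h>); cxl_walk_events() and cxl_event_fn are hypothetical
    names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as documented above. */
struct cxl_event_header {
	uint16_t type;
	uint16_t size;
	uint16_t process_element;
	uint16_t reserved1;
};
static_assert(sizeof(struct cxl_event_header) == 8, "ABI: 8-byte header");

typedef void (*cxl_event_fn)(const struct cxl_event_header *hdr,
			     const uint8_t *event, void *arg);

/* Visit every event in buf: each one starts at the previous event's
 * offset plus its header.size. Returns the number of events visited, or
 * -1 if a size field is inconsistent with the buffer length. */
static int cxl_walk_events(const uint8_t *buf, size_t len,
			   cxl_event_fn fn, void *arg)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct cxl_event_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoids unaligned access */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (fn)
			fn(&hdr, buf + off, arg);
		off += hdr.size;
		count++;
	}
	return count;
}
```

    Rejecting a size smaller than the header guarantees the loop always
    advances, so a corrupt buffer cannot cause an infinite loop.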

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as:

        ::

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as:

        ::

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as:

        ::

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.
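
    A sketch of decoding a single event by its type, using local
    mirrors of the structures above. The numeric values of the event
    types are our reading of enum cxl_event_type in <misc/cxl.h> and
    should be verified there; cxl_describe_event() is a hypothetical
    helper. Unknown types are reported rather than treated as fatal, so
    that events this sketch does not know about can simply be skipped.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirrors (use <misc/cxl.h> in real code); type values assumed. */
enum {
	CXL_EVENT_AFU_INTERRUPT = 1,
	CXL_EVENT_DATA_STORAGE = 2,
	CXL_EVENT_AFU_ERROR = 3,
};
struct cxl_event_header { uint16_t type, size, process_element, reserved1; };
struct cxl_event_afu_interrupt { uint16_t flags, irq; uint32_t reserved1; };
struct cxl_event_data_storage {
	uint16_t flags, reserved1;
	uint32_t reserved2;
	uint64_t addr, dsisr, reserved3;
};
struct cxl_event_afu_error {
	uint16_t flags, reserved1;
	uint32_t reserved2;
	uint64_t error;
};
struct cxl_event {
	struct cxl_event_header header;
	union {
		struct cxl_event_afu_interrupt irq;
		struct cxl_event_data_storage fault;
		struct cxl_event_afu_error afu_error;
	};
};

/* Describe one event (already bounds-checked by the caller) in `out`,
 * after copying it into an aligned, zeroed local. Returns 0 for a known
 * type and -1 for a type this sketch does not decode. */
static int cxl_describe_event(const uint8_t *event, char *out, size_t outlen)
{
	struct cxl_event ev;
	struct cxl_event_header hdr;

	memcpy(&hdr, event, sizeof(hdr));
	assert(hdr.size >= sizeof(hdr));
	memset(&ev, 0, sizeof(ev));
	memcpy(&ev, event, hdr.size < sizeof(ev) ? (size_t)hdr.size : sizeof(ev));

	unsigned pe = ev.header.process_element;
	switch (ev.header.type) {
	case CXL_EVENT_AFU_INTERRUPT:
		snprintf(out, outlen, "pe %u: AFU interrupt %u", pe,
			 (unsigned)ev.irq.irq);
		return 0;
	case CXL_EVENT_DATA_STORAGE:
		snprintf(out, outlen, "pe %u: fault at 0x%" PRIx64 " dsisr 0x%" PRIx64,
			 pe, ev.fault.addr, ev.fault.dsisr);
		return 0;
	case CXL_EVENT_AFU_ERROR:
		snprintf(out, outlen, "pe %u: AFU error 0x%" PRIx64, pe,
			 ev.afu_error.error);
		return 0;
	default:
		snprintf(out, outlen, "pe %u: unknown event type %u", pe,
			 (unsigned)ev.header.type);
		return -1;
	}
}
```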


2. Card character device (PowerVM guest only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image into smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image::

        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.

    reserved fields:
        For future extensions and padding.
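
    The chunked iteration can be sketched as follows. The struct is
    mirrored locally, and the request numbers are our reading of
    <misc/cxl.h> and should be verified there. The chunk size is left
    to the caller. Running the same loop first with
    CXL_IOCTL_DOWNLOAD_IMAGE and then with CXL_IOCTL_VALIDATE_IMAGE is
    an assumption of this sketch, not something stated above. Both
    helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct cxl_adapter_image (use <misc/cxl.h> in real code). */
struct cxl_adapter_image {
	uint64_t flags;
	uint64_t data;
	uint64_t len_data;
	uint64_t len_image;
	uint64_t reserved1, reserved2, reserved3, reserved4;
};
static_assert(sizeof(struct cxl_adapter_image) == 64, "ABI size");

/* Assumed request values; verify against <misc/cxl.h>. */
#define CXL_MAGIC 0xCA
#define CXL_IOCTL_DOWNLOAD_IMAGE _IOW(CXL_MAGIC, 0x0A, struct cxl_adapter_image)
#define CXL_IOCTL_VALIDATE_IMAGE _IOW(CXL_MAGIC, 0x0B, struct cxl_adapter_image)

/* Describe the part of `image` starting at `off`, at most `chunk` bytes.
 * Returns the number of bytes described; 0 means the image is exhausted. */
static size_t cxl_fill_image_chunk(struct cxl_adapter_image *ai,
				   const uint8_t *image, size_t image_len,
				   size_t off, size_t chunk)
{
	size_t n = off < image_len ? image_len - off : 0;

	if (n > chunk)
		n = chunk;
	memset(ai, 0, sizeof(*ai));
	ai->data = n ? (uint64_t)(uintptr_t)(image + off) : 0;
	ai->len_data = n;
	ai->len_image = image_len;
	return n;
}

/* Feed the whole image to one request, chunk by chunk (assumed to be run
 * with DOWNLOAD_IMAGE, then VALIDATE_IMAGE). Returns 0, or -1 with errno. */
static int cxl_send_image(int fd, unsigned long request,
			  const uint8_t *image, size_t image_len, size_t chunk)
{
	struct cxl_adapter_image ai;
	size_t off = 0, n;

	assert(chunk > 0);
	while ((n = cxl_fill_image_chunk(&ai, image, image_len, off, chunk)) > 0) {
		if (ioctl(fd, request, &ai) < 0)
			return -1;
		off += n;
	}
	return 0;
}
```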


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl.


Udev rules
==========

    The following udev rules can be used to create a symlink to the
    most appropriate character device for each programming mode
    (afuX.Yd for dedicated, afuX.Ys for AFU directed), since the API
    is virtually identical for both::

        SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
        SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
                          KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"