Back to home page

OSCL-LXR

 
 

    


0001 ================================
0002 Coherent Accelerator (CXL) Flash
0003 ================================
0004 
0005 Introduction
0006 ============
0007 
0008     The IBM Power architecture provides support for CAPI (Coherent
0009     Accelerator Power Interface), which is available to certain PCIe slots
0010     on Power 8 systems. CAPI can be thought of as a special tunneling
0011     protocol through PCIe that allow PCIe adapters to look like special
0012     purpose co-processors which can read or write an application's
0013     memory and generate page faults. As a result, the host interface to
0014     an adapter running in CAPI mode does not require the data buffers to
0015     be mapped to the device's memory (IOMMU bypass) nor does it require
0016     memory to be pinned.
0017 
0018     On Linux, Coherent Accelerator (CXL) kernel services present CAPI
0019     devices as a PCI device by implementing a virtual PCI host bridge.
0020     This abstraction simplifies the infrastructure and programming
0021     model, allowing for drivers to look similar to other native PCI
0022     device drivers.
0023 
0024     CXL provides a mechanism by which user space applications can
0025     directly talk to a device (network or storage) bypassing the typical
0026     kernel/device driver stack. The CXL Flash Adapter Driver enables a
0027     user space application direct access to Flash storage.
0028 
0029     The CXL Flash Adapter Driver is a kernel module that sits in the
0030     SCSI stack as a low level device driver (below the SCSI disk and
0031     protocol drivers) for the IBM CXL Flash Adapter. This driver is
0032     responsible for the initialization of the adapter, setting up the
0033     special path for user space access, and performing error recovery. It
0034     communicates directly the Flash Accelerator Functional Unit (AFU)
0035     as described in Documentation/powerpc/cxl.rst.
0036 
0037     The cxlflash driver supports two, mutually exclusive, modes of
0038     operation at the device (LUN) level:
0039 
0040         - Any flash device (LUN) can be configured to be accessed as a
0041           regular disk device (i.e.: /dev/sdc). This is the default mode.
0042 
0043         - Any flash device (LUN) can be configured to be accessed from
0044           user space with a special block library. This mode further
0045           specifies the means of accessing the device and provides for
0046           either raw access to the entire LUN (referred to as direct
0047           or physical LUN access) or access to a kernel/AFU-mediated
0048           partition of the LUN (referred to as virtual LUN access). The
0049           segmentation of a disk device into virtual LUNs is assisted
0050           by special translation services provided by the Flash AFU.
0051 
0052 Overview
0053 ========
0054 
0055     The Coherent Accelerator Interface Architecture (CAIA) introduces a
0056     concept of a master context. A master typically has special privileges
0057     granted to it by the kernel or hypervisor allowing it to perform AFU
0058     wide management and control. The master may or may not be involved
0059     directly in each user I/O, but at the minimum is involved in the
0060     initial setup before the user application is allowed to send requests
0061     directly to the AFU.
0062 
0063     The CXL Flash Adapter Driver establishes a master context with the
0064     AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
0065     Adapter Problem Space Memory Map looks like this::
0066 
0067                      +-------------------------------+
0068                      |    512 * 64 KB User MMIO      |
0069                      |        (per context)          |
0070                      |       User Accessible         |
0071                      +-------------------------------+
0072                      |    512 * 128 B per context    |
0073                      |    Provisioning and Control   |
0074                      |   Trusted Process accessible  |
0075                      +-------------------------------+
0076                      |         64 KB Global          |
0077                      |   Trusted Process accessible  |
0078                      +-------------------------------+
0079 
0080     This driver configures itself into the SCSI software stack as an
0081     adapter driver. The driver is the only entity that is considered a
0082     Trusted Process to program the Provisioning and Control and Global
0083     areas in the MMIO Space shown above.  The master context driver
0084     discovers all LUNs attached to the CXL Flash adapter and instantiates
0085     scsi block devices (/dev/sdb, /dev/sdc etc.) for each unique LUN
0086     seen from each path.
0087 
0088     Once these scsi block devices are instantiated, an application
0089     written to a specification provided by the block library may get
0090     access to the Flash from user space (without requiring a system call).
0091 
0092     This master context driver also provides a series of ioctls for this
0093     block library to enable this user space access.  The driver supports
0094     two modes for accessing the block device.
0095 
0096     The first mode is called a virtual mode. In this mode a single scsi
0097     block device (/dev/sdb) may be carved up into any number of distinct
0098     virtual LUNs. The virtual LUNs may be resized as long as the sum of
0099     the sizes of all the virtual LUNs, along with the meta-data associated
0100     with it does not exceed the physical capacity.
0101 
0102     The second mode is called the physical mode. In this mode a single
0103     block device (/dev/sdb) may be opened directly by the block library
0104     and the entire space for the LUN is available to the application.
0105 
0106     Only the physical mode provides persistence of the data.  i.e. The
0107     data written to the block device will survive application exit and
0108     restart and also reboot. The virtual LUNs do not persist (i.e. do
0109     not survive after the application terminates or the system reboots).
0110 
0111 
0112 Block library API
0113 =================
0114 
0115     Applications intending to get access to the CXL Flash from user
0116     space should use the block library, as it abstracts the details of
0117     interfacing directly with the cxlflash driver that are necessary for
0118     performing administrative actions (i.e.: setup, tear down, resize).
0119     The block library can be thought of as a 'user' of services,
0120     implemented as IOCTLs, that are provided by the cxlflash driver
0121     specifically for devices (LUNs) operating in user space access
0122     mode. While it is not a requirement that applications understand
0123     the interface between the block library and the cxlflash driver,
0124     a high-level overview of each supported service (IOCTL) is provided
0125     below.
0126 
0127     The block library can be found on GitHub:
0128     http://github.com/open-power/capiflash
0129 
0130 
0131 CXL Flash Driver LUN IOCTLs
0132 ===========================
0133 
0134     Users, such as the block library, that wish to interface with a flash
0135     device (LUN) via user space access need to use the services provided
0136     by the cxlflash driver. As these services are implemented as ioctls,
0137     a file descriptor handle must first be obtained in order to establish
0138     the communication channel between a user and the kernel.  This file
0139     descriptor is obtained by opening the device special file associated
0140     with the scsi disk device (/dev/sdb) that was created during LUN
0141     discovery. As per the location of the cxlflash driver within the
0142     SCSI protocol stack, this open is actually not seen by the cxlflash
0143     driver. Upon successful open, the user receives a file descriptor
0144     (herein referred to as fd1) that should be used for issuing the
0145     subsequent ioctls listed below.
0146 
0147     The structure definitions for these IOCTLs are available in:
0148     uapi/scsi/cxlflash_ioctl.h
0149 
0150 DK_CXLFLASH_ATTACH
0151 ------------------
0152 
0153     This ioctl obtains, initializes, and starts a context using the CXL
0154     kernel services. These services specify a context id (u16) by which
0155     to uniquely identify the context and its allocated resources. The
0156     services additionally provide a second file descriptor (herein
0157     referred to as fd2) that is used by the block library to initiate
0158     memory mapped I/O (via mmap()) to the CXL flash device and poll for
0159     completion events. This file descriptor is intentionally installed by
0160     this driver and not the CXL kernel services to allow for intermediary
0161     notification and access in the event of a non-user-initiated close(),
0162     such as a killed process. This design point is described in further
0163     detail in the description for the DK_CXLFLASH_DETACH ioctl.
0164 
0165     There are a few important aspects regarding the "tokens" (context id
0166     and fd2) that are provided back to the user:
0167 
0168         - These tokens are only valid for the process under which they
0169           were created. The child of a forked process cannot continue
0170           to use the context id or file descriptor created by its parent
0171           (see DK_CXLFLASH_VLUN_CLONE for further details).
0172 
0173         - These tokens are only valid for the lifetime of the context and
0174           the process under which they were created. Once either is
0175           destroyed, the tokens are to be considered stale and subsequent
0176           usage will result in errors.
0177 
0178         - A valid adapter file descriptor (fd2 >= 0) is only returned on
0179           the initial attach for a context. Subsequent attaches to an
0180           existing context (DK_CXLFLASH_ATTACH_REUSE_CONTEXT flag present)
0181           do not provide the adapter file descriptor as it was previously
0182           made known to the application.
0183 
0184         - When a context is no longer needed, the user shall detach from
0185           the context via the DK_CXLFLASH_DETACH ioctl. When this ioctl
0186           returns with a valid adapter file descriptor and the return flag
0187           DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
0188           close the adapter file descriptor following a successful detach.
0189 
0190         - When this ioctl returns with a valid fd2 and the return flag
0191           DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
0192           close fd2 in the following circumstances:
0193 
0194                 + Following a successful detach of the last user of the context
0195                 + Following a successful recovery on the context's original fd2
0196                 + In the child process of a fork(), following a clone ioctl,
0197                   on the fd2 associated with the source context
0198 
0199         - At any time, a close on fd2 will invalidate the tokens. Applications
0200           should exercise caution to only close fd2 when appropriate (outlined
0201           in the previous bullet) to avoid premature loss of I/O.
0202 
0203 DK_CXLFLASH_USER_DIRECT
0204 -----------------------
0205     This ioctl is responsible for transitioning the LUN to direct
0206     (physical) mode access and configuring the AFU for direct access from
0207     user space on a per-context basis. Additionally, the block size and
0208     last logical block address (LBA) are returned to the user.
0209 
0210     As mentioned previously, when operating in user space access mode,
0211     LUNs may be accessed in whole or in part. Only one mode is allowed
0212     at a time and if one mode is active (outstanding references exist),
0213     requests to use the LUN in a different mode are denied.
0214 
0215     The AFU is configured for direct access from user space by adding an
0216     entry to the AFU's resource handle table. The index of the entry is
0217     treated as a resource handle that is returned to the user. The user
0218     is then able to use the handle to reference the LUN during I/O.
0219 
0220 DK_CXLFLASH_USER_VIRTUAL
0221 ------------------------
0222     This ioctl is responsible for transitioning the LUN to virtual mode
0223     of access and configuring the AFU for virtual access from user space
0224     on a per-context basis. Additionally, the block size and last logical
0225     block address (LBA) are returned to the user.
0226 
0227     As mentioned previously, when operating in user space access mode,
0228     LUNs may be accessed in whole or in part. Only one mode is allowed
0229     at a time and if one mode is active (outstanding references exist),
0230     requests to use the LUN in a different mode are denied.
0231 
0232     The AFU is configured for virtual access from user space by adding
0233     an entry to the AFU's resource handle table. The index of the entry
0234     is treated as a resource handle that is returned to the user. The
0235     user is then able to use the handle to reference the LUN during I/O.
0236 
0237     By default, the virtual LUN is created with a size of 0. The user
0238     would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to adjust the grow
0239     the virtual LUN to a desired size. To avoid having to perform this
0240     resize for the initial creation of the virtual LUN, the user has the
0241     option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
0242     ioctl, such that when success is returned to the user, the
0243     resource handle that is provided is already referencing provisioned
0244     storage. This is reflected by the last LBA being a non-zero value.
0245 
0246     When a LUN is accessible from more than one port, this ioctl will
0247     return with the DK_CXLFLASH_ALL_PORTS_ACTIVE return flag set. This
0248     provides the user with a hint that I/O can be retried in the event
0249     of an I/O error as the LUN can be reached over multiple paths.
0250 
0251 DK_CXLFLASH_VLUN_RESIZE
0252 -----------------------
0253     This ioctl is responsible for resizing a previously created virtual
0254     LUN and will fail if invoked upon a LUN that is not in virtual
0255     mode. Upon success, an updated last LBA is returned to the user
0256     indicating the new size of the virtual LUN associated with the
0257     resource handle.
0258 
0259     The partitioning of virtual LUNs is jointly mediated by the cxlflash
0260     driver and the AFU. An allocation table is kept for each LUN that is
0261     operating in the virtual mode and used to program a LUN translation
0262     table that the AFU references when provided with a resource handle.
0263 
0264     This ioctl can return -EAGAIN if an AFU sync operation takes too long.
0265     In addition to returning a failure to user, cxlflash will also schedule
0266     an asynchronous AFU reset. Should the user choose to retry the operation,
0267     it is expected to succeed. If this ioctl fails with -EAGAIN, the user
0268     can either retry the operation or treat it as a failure.
0269 
0270 DK_CXLFLASH_RELEASE
0271 -------------------
0272     This ioctl is responsible for releasing a previously obtained
0273     reference to either a physical or virtual LUN. This can be
0274     thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
0275     DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
0276     is no longer valid and the entry in the resource handle table is
0277     made available to be used again.
0278 
0279     As part of the release process for virtual LUNs, the virtual LUN
0280     is first resized to 0 to clear out and free the translation tables
0281     associated with the virtual LUN reference.
0282 
0283 DK_CXLFLASH_DETACH
0284 ------------------
0285     This ioctl is responsible for unregistering a context with the
0286     cxlflash driver and release outstanding resources that were
0287     not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
0288     success, all "tokens" which had been provided to the user from the
0289     DK_CXLFLASH_ATTACH onward are no longer valid.
0290 
0291     When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
0292     attach, the application _must_ close the fd2 associated with the context
0293     following the detach of the final user of the context.
0294 
0295 DK_CXLFLASH_VLUN_CLONE
0296 ----------------------
0297     This ioctl is responsible for cloning a previously created
0298     context to a more recently created context. It exists solely to
0299     support maintaining user space access to storage after a process
0300     forks. Upon success, the child process (which invoked the ioctl)
0301     will have access to the same LUNs via the same resource handle(s)
0302     as the parent, but under a different context.
0303 
0304     Context sharing across processes is not supported with CXL and
0305     therefore each fork must be met with establishing a new context
0306     for the child process. This ioctl simplifies the state management
0307     and playback required by a user in such a scenario. When a process
0308     forks, child process can clone the parents context by first creating
0309     a context (via DK_CXLFLASH_ATTACH) and then using this ioctl to
0310     perform the clone from the parent to the child.
0311 
0312     The clone itself is fairly simple. The resource handle and lun
0313     translation tables are copied from the parent context to the child's
0314     and then synced with the AFU.
0315 
0316     When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
0317     attach, the application _must_ close the fd2 associated with the source
0318     context (still resident/accessible in the parent process) following the
0319     clone. This is to avoid a stale entry in the file descriptor table of the
0320     child process.
0321 
0322     This ioctl can return -EAGAIN if an AFU sync operation takes too long.
0323     In addition to returning a failure to user, cxlflash will also schedule
0324     an asynchronous AFU reset. Should the user choose to retry the operation,
0325     it is expected to succeed. If this ioctl fails with -EAGAIN, the user
0326     can either retry the operation or treat it as a failure.
0327 
0328 DK_CXLFLASH_VERIFY
0329 ------------------
0330     This ioctl is used to detect various changes such as the capacity of
0331     the disk changing, the number of LUNs visible changing, etc. In cases
0332     where the changes affect the application (such as a LUN resize), the
0333     cxlflash driver will report the changed state to the application.
0334 
0335     The user calls in when they want to validate that a LUN hasn't been
0336     changed in response to a check condition. As the user is operating out
0337     of band from the kernel, they will see these types of events without
0338     the kernel's knowledge. When encountered, the user's architected
0339     behavior is to call in to this ioctl, indicating what they want to
0340     verify and passing along any appropriate information. For now, only
0341     verifying a LUN change (ie: size different) with sense data is
0342     supported.
0343 
0344 DK_CXLFLASH_RECOVER_AFU
0345 -----------------------
0346     This ioctl is used to drive recovery (if such an action is warranted)
0347     of a specified user context. Any state associated with the user context
0348     is re-established upon successful recovery.
0349 
0350     User contexts are put into an error condition when the device needs to
0351     be reset or is terminating. Users are notified of this error condition
0352     by seeing all 0xF's on an MMIO read. Upon encountering this, the
0353     architected behavior for a user is to call into this ioctl to recover
0354     their context. A user may also call into this ioctl at any time to
0355     check if the device is operating normally. If a failure is returned
0356     from this ioctl, the user is expected to gracefully clean up their
0357     context via release/detach ioctls. Until they do, the context they
0358     hold is not relinquished. The user may also optionally exit the process
0359     at which time the context/resources they held will be freed as part of
0360     the release fop.
0361 
0362     When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
0363     attach, the application _must_ unmap and close the fd2 associated with the
0364     original context following this ioctl returning success and indicating that
0365     the context was recovered (DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET).
0366 
0367 DK_CXLFLASH_MANAGE_LUN
0368 ----------------------
0369     This ioctl is used to switch a LUN from a mode where it is available
0370     for file-system access (legacy), to a mode where it is set aside for
0371     exclusive user space access (superpipe). In case a LUN is visible
0372     across multiple ports and adapters, this ioctl is used to uniquely
0373     identify each LUN by its World Wide Node Name (WWNN).
0374 
0375 
0376 CXL Flash Driver Host IOCTLs
0377 ============================
0378 
0379     Each host adapter instance that is supported by the cxlflash driver
0380     has a special character device associated with it to enable a set of
0381     host management function. These character devices are hosted in a
0382     class dedicated for cxlflash and can be accessed via `/dev/cxlflash/*`.
0383 
0384     Applications can be written to perform various functions using the
0385     host ioctl APIs below.
0386 
0387     The structure definitions for these IOCTLs are available in:
0388     uapi/scsi/cxlflash_ioctl.h
0389 
0390 HT_CXLFLASH_LUN_PROVISION
0391 -------------------------
0392     This ioctl is used to create and delete persistent LUNs on cxlflash
0393     devices that lack an external LUN management interface. It is only
0394     valid when used with AFUs that support the LUN provision capability.
0395 
0396     When sufficient space is available, LUNs can be created by specifying
0397     the target port to host the LUN and a desired size in 4K blocks. Upon
0398     success, the LUN ID and WWID of the created LUN will be returned and
0399     the SCSI bus can be scanned to detect the change in LUN topology. Note
0400     that partial allocations are not supported. Should a creation fail due
0401     to a space issue, the target port can be queried for its current LUN
0402     geometry.
0403 
0404     To remove a LUN, the device must first be disassociated from the Linux
0405     SCSI subsystem. The LUN deletion can then be initiated by specifying a
0406     target port and LUN ID. Upon success, the LUN geometry associated with
0407     the port will be updated to reflect new number of provisioned LUNs and
0408     available capacity.
0409 
0410     To query the LUN geometry of a port, the target port is specified and
0411     upon success, the following information is presented:
0412 
0413         - Maximum number of provisioned LUNs allowed for the port
0414         - Current number of provisioned LUNs for the port
0415         - Maximum total capacity of provisioned LUNs for the port (4K blocks)
0416         - Current total capacity of provisioned LUNs for the port (4K blocks)
0417 
0418     With this information, the number of available LUNs and capacity can be
0419     can be calculated.
0420 
0421 HT_CXLFLASH_AFU_DEBUG
0422 ---------------------
0423     This ioctl is used to debug AFUs by supporting a command pass-through
0424     interface. It is only valid when used with AFUs that support the AFU
0425     debug capability.
0426 
0427     With exception of buffer management, AFU debug commands are opaque to
0428     cxlflash and treated as pass-through. For debug commands that do require
0429     data transfer, the user supplies an adequately sized data buffer and must
0430     specify the data transfer direction with respect to the host. There is a
0431     maximum transfer size of 256K imposed. Note that partial read completions
0432     are not supported - when errors are experienced with a host read data
0433     transfer, the data buffer is not copied back to the user.