.. Copyright 2001 Matthew Wilcox
..
..     This documentation is free software; you can redistribute
..     it and/or modify it under the terms of the GNU General Public
..     License as published by the Free Software Foundation; either
..     version 2 of the License, or (at your option) any later
..     version.

===============================
Bus-Independent Device Accesses
===============================

:Author: Matthew Wilcox
:Author: Alan Cox

Introduction
============

Linux provides an API which abstracts performing IO across all busses
and devices, allowing device drivers to be written independently of bus
type.

Memory Mapped IO
================

Getting Access to the Device
----------------------------

The most widely supported form of IO is memory mapped IO. That is, a
part of the CPU's address space is interpreted not as accesses to
memory, but as accesses to a device. Some architectures define devices
to be at a fixed address, but most have some method of discovering
devices. The PCI bus walk is a good example of such a scheme. This
document does not cover how to obtain such an address, but assumes you
are starting with one. Physical addresses are of type phys_addr_t.

This address should not be used directly. Instead, to get an address
suitable for passing to the accessor functions described below, you
should call ioremap(). An address suitable for accessing
the device will be returned to you.

After you've finished using the device (say, in your module's exit
routine), call iounmap() in order to return the address
space to the kernel. Most architectures allocate new address space each
time you call ioremap(), and they can run out unless you
call iounmap().

Accessing the device
--------------------

The part of the interface most used by drivers is reading and writing
memory-mapped registers on the device. Linux provides interfaces to read
and write 8-bit, 16-bit, 32-bit and 64-bit quantities. Due to a
historical accident, these are named byte, word, long and quad accesses.
Both read and write accesses are supported; there is no prefetch support
at this time.

The functions are named readb(), readw(), readl(), readq(),
readb_relaxed(), readw_relaxed(), readl_relaxed(), readq_relaxed(),
writeb(), writew(), writel(), writeq(), writeb_relaxed(), writew_relaxed(),
writel_relaxed() and writeq_relaxed().

Some devices (such as framebuffers) would like to use larger transfers than
8 bytes at a time. For these devices, the memcpy_toio(), memcpy_fromio()
and memset_io() functions are provided. Do not use memset() or memcpy() on
IO addresses; they are not guaranteed to copy data in order.

The read and write functions are defined to be ordered. That is, the
compiler is not permitted to reorder the I/O sequence. Where weaker
ordering is acceptable, the relaxed and ``__raw`` variants described in
`Differences between I/O access functions`_ drop some of these guarantees.
Use them with care.

While the basic functions are defined to be synchronous with respect to
each other and ordered with respect to each other, the busses the devices
sit on may themselves have asynchronicity. In particular, many authors
are burned by the fact that PCI bus writes are posted asynchronously. A
driver author must issue a read from the same device to ensure that
writes have occurred in the specific cases the author cares about. This
kind of property cannot be hidden from driver writers in the API. In some
cases, the read used to flush the device may be expected to fail (if the
card is resetting, for example). In that case, the read should be done
from config space, which is guaranteed to soft-fail if the card doesn't
respond.

The following is an example of flushing a write to a device when the
driver would like to ensure the write's effects are visible prior to
continuing execution::

    static inline void
    qla1280_disable_intrs(struct scsi_qla_host *ha)
    {
        struct device_reg __iomem *reg;

        reg = ha->iobase;
        /* disable risc and host interrupts */
        /* WRT_REG_WORD()/RD_REG_WORD() are this driver's own 16-bit MMIO wrappers */
        WRT_REG_WORD(&reg->ictrl, 0);
        /*
         * The following read will ensure that the above write
         * has been received by the device before we return from this
         * function.
         */
        RD_REG_WORD(&reg->ictrl);
        ha->flags.ints_enabled = 0;
    }

PCI ordering rules also guarantee that PIO read responses arrive after any
outstanding DMA writes from that bus, since for some devices the result of
a readb() call may signal to the driver that a DMA transaction is
complete. In many cases, however, the driver may want to indicate that the
next readb() call has no relation to any previous DMA writes
performed by the device. The driver can use readb_relaxed() for
these cases, although only some platforms will honor the relaxed
semantics. Using the relaxed read functions can provide significant
performance benefits on platforms that support them. The qla2xxx driver
provides examples of how to use readX_relaxed(). In many cases, a majority
of the driver's readX() calls can safely be converted to readX_relaxed()
calls, since only a few will indicate or depend on DMA completion.

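The posted-write behaviour described above can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code: writes go into a hypothetical posted-write queue and only reach the simulated register when that queue is drained, and a read from the same device drains it first, mirroring the rule that a read flushes earlier posted writes to that device. ``struct model_dev``, model_writew() and model_readw() are invented for illustration; real drivers use writew()/readw() on an ``__iomem`` pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_POST_DEPTH 8

/* Hypothetical device: one 16-bit register plus writes still in flight. */
struct model_dev {
	uint16_t reg;                      /* value the device has actually received */
	uint16_t posted[MODEL_POST_DEPTH]; /* posted writes not yet delivered */
	size_t nposted;
};

/* Deliver all in-flight writes to the device, oldest first. */
static void model_drain(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->nposted; i++)
		dev->reg = dev->posted[i];
	dev->nposted = 0;
}

/* Posted write: returns before the device has seen the value. */
void model_writew(struct model_dev *dev, uint16_t val)
{
	if (dev->nposted == MODEL_POST_DEPTH)
		model_drain(dev);          /* queue full: flush it before queuing more */
	dev->posted[dev->nposted++] = val;
}

/* A read from the same device may not overtake earlier writes to it,
 * so every posted write is delivered before the register is sampled. */
uint16_t model_readw(struct model_dev *dev)
{
	model_drain(dev);
	return dev->reg;
}
```

In the qla1280 example above, RD_REG_WORD() plays the role of model_readw(): its result is discarded, and it is issued only for its side effect of forcing the earlier write out to the device.
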
Port Space Accesses
===================

Port Space Explained
--------------------

Another form of IO commonly supported is Port Space. This is a range of
addresses separate from the normal memory address space. Accessing these
addresses is generally slower than accessing memory mapped addresses, and
the port address space is potentially smaller.

Unlike memory mapped IO, no preparation is required to access port
space.

Accessing Port Space
--------------------

Accesses to this space are provided through a set of functions which
allow 8-bit, 16-bit and 32-bit accesses, also known as byte, word and
long. These functions are inb(), inw(), inl(), outb(), outw() and
outl().

Some variants are provided for these functions. Some devices require
that accesses to their ports are slowed down. This functionality is
provided by appending a ``_p`` to the end of the function.
There are also equivalents to memcpy(). The insb(), insw(), insl(),
outsb(), outsw() and outsl() functions transfer a sequence of bytes,
words or longs from or to the given port.

__iomem pointer tokens
======================

The data type for an MMIO address is an ``__iomem`` qualified pointer, such as
``void __iomem *reg``. On most architectures it is a regular pointer that
points to a virtual memory address and can be offset or dereferenced, but in
portable code, it must only be passed from and to functions that explicitly
operate on an ``__iomem`` token, in particular the ioremap() and
readl()/writel() functions. The 'sparse' semantic code checker can be used to
verify that this is done correctly.

On most architectures, ioremap() creates a page table entry for an uncached
virtual address pointing to the physical MMIO address. Some architectures,
however, require special instructions for MMIO, and the ``__iomem`` pointer
just encodes the physical address or an offsettable cookie that is interpreted
by readl()/writel().

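The token discipline can be sketched in ordinary C. In this userspace sketch, ``__iomem`` and ``__force`` reduce to empty macros unless a sparse-like checker defines ``__CHECKER__``, approximating (not copying) the kernel's own definitions. model_mmio_read32() and model_read_status() are hypothetical helpers, and the load is native-endian, unlike readl(). Only the accessor turns the token back into a dereferenceable pointer; the caller merely adds an offset.

```c
#include <stdint.h>

/* Simplified annotations: meaningful only to a checker such as sparse,
 * they expand to nothing for the real compiler. */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(__iomem)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* Hypothetical accessor: the only place that dereferences the token.
 * The __force cast tells the checker the address-space change is intended. */
static inline uint32_t model_mmio_read32(const volatile void __iomem *addr)
{
	return *(const volatile uint32_t __force *)addr;
}

/* Hypothetical driver helper: offsets the token but never dereferences it. */
uint32_t model_read_status(void __iomem *base)
{
	return model_mmio_read32((char __iomem *)base + 0x4);
}
```
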
Differences between I/O access functions
========================================

readq(), readl(), readw(), readb(), writeq(), writel(), writew(), writeb()

  These are the most generic accessors, providing serialization against other
  MMIO accesses and DMA accesses as well as fixed endianness for accessing
  little-endian PCI devices and on-chip peripherals. Portable device drivers
  should generally use these for any access to ``__iomem`` pointers.

  Note that posted writes are not strictly ordered against a spinlock; see
  Documentation/driver-api/io_ordering.rst.

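The "fixed endianness" guarantee means that readl() interprets the four bytes at the given address as a little-endian value regardless of the CPU's byte order. The following userspace sketch models only that byte-order rule, with no barriers and no ``__iomem`` token; model_readl() and model_writel() are hypothetical names operating on a plain byte array that stands in for device memory.

```c
#include <stdint.h>

/* Interpret four bytes of "device memory" as a little-endian 32-bit value,
 * independent of the host CPU's byte order. */
uint32_t model_readl(const volatile uint8_t *addr)
{
	return (uint32_t)addr[0] |
	       ((uint32_t)addr[1] << 8) |
	       ((uint32_t)addr[2] << 16) |
	       ((uint32_t)addr[3] << 24);
}

/* Store a 32-bit value in little-endian byte order (value first, like writel()). */
void model_writel(uint32_t val, volatile uint8_t *addr)
{
	addr[0] = val & 0xff;
	addr[1] = (val >> 8) & 0xff;
	addr[2] = (val >> 16) & 0xff;
	addr[3] = (val >> 24) & 0xff;
}
```

On a little-endian CPU the result equals a plain 32-bit load; on a big-endian CPU the real readl() has to swap bytes to produce the same value.
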
readq_relaxed(), readl_relaxed(), readw_relaxed(), readb_relaxed(),
writeq_relaxed(), writel_relaxed(), writew_relaxed(), writeb_relaxed()

  On architectures that require an expensive barrier for serializing against
  DMA, these "relaxed" versions of the MMIO accessors only serialize against
  each other, but contain a less expensive barrier operation. A device driver
  might use these in a particularly performance-sensitive fast path, with a
  comment that explains why the usage in a specific location is safe without
  the extra barriers.

  See Documentation/memory-barriers.txt for a more detailed discussion of the
  precise ordering guarantees of the non-relaxed and relaxed versions.

ioread64(), ioread32(), ioread16(), ioread8(),
iowrite64(), iowrite32(), iowrite16(), iowrite8()

  These are an alternative to the normal readl()/writel() functions, with almost
  identical behavior, but they can also operate on ``__iomem`` tokens returned
  for mapping PCI I/O space with pci_iomap() or ioport_map(). On architectures
  that require special instructions for I/O port access, this adds a small
  overhead for an indirect function call implemented in lib/iomap.c, while on
  other architectures, these are simply aliases.

ioread64be(), ioread32be(), ioread16be()
iowrite64be(), iowrite32be(), iowrite16be()

  These behave in the same way as the ioread32()/iowrite32() family, but with
  reversed byte order, for accessing devices with big-endian MMIO registers.
  Device drivers that can operate on either big-endian or little-endian
  registers may have to implement a custom wrapper function that picks one or
  the other depending on which device was found.

  Note: On some architectures, the normal readl()/writel() functions
  traditionally assume that devices are the same endianness as the CPU, while
  using a hardware byte-reverse on the PCI bus when running a big-endian kernel.
  Drivers that use readl()/writel() this way are generally not portable, but
  tend to be limited to a particular SoC.

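A driver that supports both register layouts might hide the choice behind a small wrapper, as suggested above. The sketch below is a userspace model: ``struct model_regs``, its ``big_endian`` field and the model_*() helpers are hypothetical, and the explicit byte assembly stands in for ioread32() and ioread32be(). In a real driver the flag would be set once, when the device is identified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state: where the registers live and their layout. */
struct model_regs {
	const uint8_t *base;
	bool big_endian;        /* decided once, when the device is identified */
};

/* Stand-in for ioread32(): little-endian register layout. */
static uint32_t model_ioread32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Stand-in for ioread32be(): big-endian register layout. */
static uint32_t model_ioread32be(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Driver-private wrapper: pick the accessor matching this device. */
uint32_t model_reg_read32(const struct model_regs *r, unsigned int offset)
{
	const uint8_t *p = r->base + offset;

	return r->big_endian ? model_ioread32be(p) : model_ioread32(p);
}
```

Keeping the decision in one wrapper means the rest of the driver calls model_reg_read32() everywhere and never needs to know which layout the hardware uses.
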
hi_lo_readq(), lo_hi_readq(), hi_lo_readq_relaxed(), lo_hi_readq_relaxed(),
ioread64_lo_hi(), ioread64_hi_lo(), ioread64be_lo_hi(), ioread64be_hi_lo(),
hi_lo_writeq(), lo_hi_writeq(), hi_lo_writeq_relaxed(), lo_hi_writeq_relaxed(),
iowrite64_lo_hi(), iowrite64_hi_lo(), iowrite64be_lo_hi(), iowrite64be_hi_lo()

  Some devices have 64-bit registers that cannot be accessed atomically
  on 32-bit architectures but allow two consecutive 32-bit accesses instead.
  Since it depends on the particular device which of the two halves has to be
  accessed first, a helper is provided for each combination of 64-bit accessors
  with either low/high or high/low word ordering. A device driver must include
  either <linux/io-64-nonatomic-lo-hi.h> or <linux/io-64-nonatomic-hi-lo.h> to
  get the function definitions along with helpers that redirect the normal
  readq()/writeq() to them on architectures that do not provide 64-bit access
  natively.

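The only difference between the two families is which half reaches the bus first. The userspace sketch below records the sequence of 32-bit writes that a lo-hi and a hi-lo 64-bit write would generate, assuming for the model that the low word lives at the register offset and the high word four bytes above it. ``struct model_log`` and the model_*() functions are hypothetical; the real helpers come from the headers named above. Neither sequence is atomic: between the two writes, the device can observe a value with only one half updated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one 32-bit bus write. */
struct model_write {
	unsigned int offset;
	uint32_t val;
};

/* Hypothetical log of the 32-bit writes a device would observe. */
struct model_log {
	struct model_write w[8];
	size_t n;
};

static void model_log_writel(struct model_log *log, uint32_t val,
			     unsigned int offset)
{
	if (log->n < sizeof(log->w) / sizeof(log->w[0])) {
		log->w[log->n].offset = offset;
		log->w[log->n].val = val;
		log->n++;
	}
}

/* Low half at offset, then high half at offset + 4. */
void model_lo_hi_writeq(struct model_log *log, uint64_t val,
			unsigned int offset)
{
	model_log_writel(log, (uint32_t)val, offset);
	model_log_writel(log, (uint32_t)(val >> 32), offset + 4);
}

/* The same two writes in the opposite order: high half first. */
void model_hi_lo_writeq(struct model_log *log, uint64_t val,
			unsigned int offset)
{
	model_log_writel(log, (uint32_t)(val >> 32), offset + 4);
	model_log_writel(log, (uint32_t)val, offset);
}
```
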
__raw_readq(), __raw_readl(), __raw_readw(), __raw_readb(),
__raw_writeq(), __raw_writel(), __raw_writew(), __raw_writeb()

  These are low-level MMIO accessors without barriers or byte-order changes,
  and with architecture-specific behavior. Accesses are usually atomic in the
  sense that a four-byte __raw_readl() does not get split into individual byte
  loads, but multiple consecutive accesses can be combined on the bus. In
  portable code, it is only safe to use these to access memory behind a device
  bus, not MMIO registers, as there are no ordering guarantees with regard to
  other MMIO accesses or even spinlocks. The byte order is generally the same
  as for normal memory, so unlike the other functions, these can be used to
  copy data between kernel memory and device memory.

inl(), inw(), inb(), outl(), outw(), outb()

  PCI I/O port resources traditionally require separate helpers as they are
  implemented using special instructions on the x86 architecture. On most other
  architectures, these are mapped to readl()/writel() style accessors
  internally, usually pointing to a fixed area in virtual memory. Instead of an
  ``__iomem`` pointer, the address is a 32-bit integer token to identify a port
  number. PCI requires I/O port access to be non-posted, meaning that an outb()
  must complete before the following code executes, while a normal writeb() may
  still be in progress. On architectures that correctly implement this, I/O port
  access is therefore ordered against spinlocks. Many non-x86 PCI host bridge
  implementations and CPU architectures, however, fail to implement non-posted
  I/O space on PCI, so I/O port writes can end up being posted on such hardware.

  On some architectures, the I/O port number space has a 1:1 mapping to
  ``__iomem`` pointers, but this is not recommended and device drivers should
  not rely on that for portability. Similarly, an I/O port number as described
  in a PCI base address register may not correspond to the port number as seen
  by a device driver. Portable drivers need to read the port number for the
  resource provided by the kernel.

  There are no direct 64-bit I/O port accessors, but pci_iomap() in combination
  with ioread64()/iowrite64() can be used instead.

inl_p(), inw_p(), inb_p(), outl_p(), outw_p(), outb_p()

  On ISA devices that require specific timing, the _p versions of the I/O
  accessors add a small delay. On architectures that do not have ISA buses,
  these are aliases to the normal inb()/outb() helpers.

readsq(), readsl(), readsw(), readsb()
writesq(), writesl(), writesw(), writesb()
ioread64_rep(), ioread32_rep(), ioread16_rep(), ioread8_rep()
iowrite64_rep(), iowrite32_rep(), iowrite16_rep(), iowrite8_rep()
insl(), insw(), insb(), outsl(), outsw(), outsb()

  These are helpers that access the same address multiple times, usually to copy
  data between a kernel memory byte stream and a FIFO buffer. Unlike the normal
  MMIO accessors, these do not perform a byteswap on big-endian kernels, so the
  first byte in the FIFO register corresponds to the first byte in the memory
  buffer regardless of the architecture.

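In other words, the device address stays fixed while the memory pointer advances, and no bytes are swapped. Below is a userspace sketch of a readsl()-style transfer; ``struct model_fifo`` is a hypothetical stand-in for a device FIFO data register whose every 32-bit read yields the next four bytes of the stream, and model_readsl() illustrates the idea rather than reproducing the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device FIFO: each read of the data register pops 4 bytes. */
struct model_fifo {
	const uint8_t *data;
	size_t pos;
};

/* One 32-bit read of the FIFO register, returned as raw bytes in FIFO order. */
static void model_fifo_pop32(struct model_fifo *f, uint8_t out[4])
{
	memcpy(out, f->data + f->pos, 4);
	f->pos += 4;
}

/* readsl()-style: read the SAME register `count` times into consecutive
 * 32-bit slots of `buf`, without byte swapping, so `buf` ends up holding
 * the FIFO's byte stream in order on any host endianness. */
void model_readsl(struct model_fifo *f, void *buf, size_t count)
{
	uint8_t *dst = buf;

	for (size_t i = 0; i < count; i++)
		model_fifo_pop32(f, dst + 4 * i);
}
```
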
Device memory mapping modes
===========================

Some architectures support multiple modes for mapping device memory.
ioremap_*() variants provide a common abstraction around these
architecture-specific modes, with a shared set of semantics.

ioremap() is the most common mapping type, and is applicable to typical device
memory (e.g. I/O registers). Other modes can offer weaker or stronger
guarantees, if supported by the architecture. From most to least common, they
are as follows:

ioremap()
---------

The default mode, suitable for most memory-mapped devices, e.g. control
registers. Memory mapped using ioremap() has the following characteristics:

* Uncached - CPU-side caches are bypassed, and all reads and writes are handled
  directly by the device.
* No speculative operations - The CPU may not issue a read or write to this
  memory, unless the instruction that does so has been reached in committed
  program flow.
* No reordering - The CPU may not reorder accesses to this memory mapping with
  respect to each other. On some architectures, this relies on barriers in
  readl_relaxed()/writel_relaxed().
* No repetition - The CPU may not issue multiple reads or writes for a single
  program instruction.
* No write-combining - Each I/O operation results in one discrete read or write
  being issued to the device, and multiple writes are not combined into larger
  writes. This may or may not be enforced when using __raw I/O accessors or
  pointer dereferences.
* Non-executable - The CPU is not allowed to speculate instruction execution
  from this memory (it probably goes without saying, but you're also not
  allowed to jump into device memory).

On many platforms and buses (e.g. PCI), writes issued through ioremap()
mappings are posted, which means that the CPU does not wait for the write to
actually reach the target device before retiring the write instruction.

On many platforms, I/O accesses must be aligned with respect to the access
size; failure to do so will result in an exception or unpredictable results.

ioremap_wc()
------------

Maps I/O memory as normal memory with write combining. Unlike ioremap():

* The CPU may speculatively issue reads from the device that the program
  didn't actually execute, and may choose to read essentially whatever it wants.
* The CPU may reorder operations as long as the result is consistent from the
  program's point of view.
* The CPU may write to the same location multiple times, even when the program
  issued a single write.
* The CPU may combine several writes into a single larger write.

This mode is typically used for video framebuffers, where it can improve write
performance. It can also be used for other blocks of memory in devices (e.g.
buffers or shared memory), but care must be taken as accesses are not
guaranteed to be ordered with respect to normal ioremap() MMIO register
accesses without explicit barriers.

On a PCI bus, it is usually safe to use ioremap_wc() on MMIO areas marked as
``IORESOURCE_PREFETCH``, but it must not be used on those without the flag.
For on-chip devices, there is no corresponding flag, but a driver can use
ioremap_wc() on a device that is known to be safe.

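The PCI rule above amounts to a small decision, sketched here in userspace C. ``MODEL_RES_PREFETCH`` is a hypothetical stand-in for the ``IORESOURCE_PREFETCH`` resource flag (its value here is arbitrary), and the enum and model_pick_pci_mapping() are invented for illustration only.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for IORESOURCE_PREFETCH. */
#define MODEL_RES_PREFETCH 0x1u

/* Which of the two mapping calls the driver would make. */
enum model_map_mode { MODEL_MAP_IOREMAP, MODEL_MAP_IOREMAP_WC };

/* Write combining is only chosen when the driver wants it AND the BAR is
 * marked prefetchable; otherwise fall back to a plain ioremap() mapping. */
enum model_map_mode model_pick_pci_mapping(unsigned int res_flags, bool want_wc)
{
	if (want_wc && (res_flags & MODEL_RES_PREFETCH))
		return MODEL_MAP_IOREMAP_WC;
	return MODEL_MAP_IOREMAP;
}
```

Falling back to the default mode is always safe for correctness; it only gives up the write-combining performance benefit.
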
ioremap_wt()
------------

Maps I/O memory as normal memory with write-through caching. Like ioremap_wc(),
but in addition:

* The CPU may cache writes issued to and reads from the device, and serve reads
  from that cache.

This mode is sometimes used for video framebuffers, where drivers still expect
writes to reach the device in a timely manner (and not be stuck in the CPU
cache), but reads may be served from the cache for efficiency. However, it is
rarely useful these days, as framebuffer drivers usually perform writes only,
for which ioremap_wc() is more efficient (as it doesn't needlessly trash the
cache). Most drivers should not use this.

ioremap_np()
------------

Like ioremap(), but explicitly requests non-posted write semantics. On some
architectures and buses, ioremap() mappings have posted write semantics, which
means that writes can appear to "complete" from the point of view of the
CPU before the written data actually arrives at the target device. Writes are
still ordered with respect to other writes and reads from the same device, but
due to the posted write semantics, this is not the case with respect to other
devices. ioremap_np() explicitly requests non-posted semantics, which means
that the write instruction will not appear to complete until the device has
received (and to some platform-specific extent acknowledged) the written data.

This mapping mode primarily exists to cater for platforms with bus fabrics that
require this particular mapping mode to work correctly. These platforms set the
``IORESOURCE_MEM_NONPOSTED`` flag for a resource that requires ioremap_np()
semantics, and portable drivers should use an abstraction that automatically
selects it where appropriate (see the `Higher-level ioremap abstractions`_
section below).

The bare ioremap_np() is only available on some architectures; on others, it
always returns NULL. Drivers should not normally use it, unless they are
platform-specific or they derive benefit from non-posted writes where
supported, and can fall back to ioremap() otherwise. The normal approach to
ensure posted write completion is to do a dummy read after a write as
explained in `Accessing the device`_, which works with ioremap() on all
platforms.

ioremap_np() should never be used for PCI drivers. PCI memory space writes are
always posted, even on architectures that otherwise implement ioremap_np().
Using ioremap_np() for PCI BARs will at best result in posted write semantics,
and at worst result in complete breakage.

Note that non-posted write semantics are orthogonal to CPU-side ordering
guarantees. A CPU may still choose to issue other reads or writes before a
non-posted write instruction retires. See `Differences between I/O access
functions`_ for details on the CPU side of things.

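For the rare non-PCI driver that does benefit from non-posted writes, the fallback described above might look like the following userspace sketch. model_ioremap_np() and model_ioremap() are hypothetical stand-ins for the real calls, and the global ``model_np_supported`` flag simulates whether the architecture implements ioremap_np(). The helper also reports which mode it obtained, because a driver that fell back to ioremap() still needs a dummy read after writes that must complete.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform capability: does ioremap_np() work here? */
static bool model_np_supported;

static uint8_t model_window[64];        /* stands in for device memory */

/* Stand-in for ioremap_np(): NULL where non-posted mappings are unavailable. */
static void *model_ioremap_np(void)
{
	return model_np_supported ? model_window : NULL;
}

/* Stand-in for ioremap(): always available in this model. */
static void *model_ioremap(void)
{
	return model_window;
}

/* Prefer non-posted semantics, but keep working everywhere. */
void *model_map_regs(bool *nonposted)
{
	void *base = model_ioremap_np();

	*nonposted = base != NULL;
	if (!base)
		base = model_ioremap();
	return base;
}
```
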
ioremap_uc()
------------

ioremap_uc() behaves like ioremap() except that on the x86 architecture without
'PAT' mode, it marks memory as uncached even when the MTRR has designated
it as cacheable; see Documentation/x86/pat.rst.

Portable drivers should avoid the use of ioremap_uc().

ioremap_cache()
---------------

ioremap_cache() effectively maps I/O memory as normal RAM. CPU write-back
caches can be used, and the CPU is free to treat the device as if it were a
block of RAM. This should never be used for device memory which has side
effects of any kind, or which does not return the data previously written on
read.

It should also not be used for actual RAM, as the returned pointer is an
``__iomem`` token. memremap() can be used for mapping normal RAM that is outside
of the linear kernel memory area to a regular pointer.

Portable drivers should avoid the use of ioremap_cache().

Architecture example
--------------------

Here is how the above modes map to memory attribute settings on the ARM64
architecture:

+------------------------+--------------------------------------------+
| API                    | Memory region type and cacheability        |
+------------------------+--------------------------------------------+
| ioremap_np()           | Device-nGnRnE                              |
+------------------------+--------------------------------------------+
| ioremap()              | Device-nGnRE                               |
+------------------------+--------------------------------------------+
| ioremap_uc()           | (not implemented)                          |
+------------------------+--------------------------------------------+
| ioremap_wc()           | Normal-Non Cacheable                       |
+------------------------+--------------------------------------------+
| ioremap_wt()           | (not implemented; fallback to ioremap)     |
+------------------------+--------------------------------------------+
| ioremap_cache()        | Normal-Write-Back Cacheable                |
+------------------------+--------------------------------------------+

Higher-level ioremap abstractions
=================================

Instead of using the above raw ioremap() modes, drivers are encouraged to use
higher-level APIs. These APIs may implement platform-specific logic to
automatically choose an appropriate ioremap mode on any given bus, allowing for
a platform-agnostic driver to work on those platforms without any special
cases. At the time of this writing, the following ioremap() wrappers have such
logic:

devm_ioremap_resource()

  Can automatically select ioremap_np() over ioremap() according to platform
  requirements, if the ``IORESOURCE_MEM_NONPOSTED`` flag is set on the struct
  resource. Uses devres to automatically unmap the resource when the driver
  probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

of_address_to_resource()

  Automatically sets the ``IORESOURCE_MEM_NONPOSTED`` flag for platforms that
  require non-posted writes for certain buses (see the nonposted-mmio and
  posted-mmio device tree properties).

of_iomap()

  Maps the resource described in a ``reg`` property in the device tree, doing
  all required translations. Automatically selects ioremap_np() according to
  platform requirements, as above.

pci_ioremap_bar(), pci_ioremap_wc_bar()

  Maps the resource described in a PCI base address register (BAR) without
  having to extract the physical address first.

pci_iomap(), pci_iomap_wc()

  Like pci_ioremap_bar()/pci_ioremap_wc_bar(), but also works on I/O space when
  used together with ioread32()/iowrite32() and similar accessors.

pcim_iomap()

  Like pci_iomap(), but uses devres to automatically unmap the resource when
  the driver probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

Not using these wrappers may make drivers unusable on certain platforms with
stricter rules for mapping I/O memory.

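The devres mechanism used by devm_ioremap_resource() and pcim_iomap() essentially keeps a per-device list of cleanup actions and runs them in reverse order of registration when probe() fails or the device is unbound. The userspace model below illustrates only that idea; ``struct model_device``, model_devres_add(), model_devres_release_all() and the model_unmap() callback are hypothetical and far simpler than the real devres API documented in devres.rst.

```c
#include <stddef.h>

#define MODEL_MAX_RES 8

/* Hypothetical managed-resource entry: a release callback and its argument. */
struct model_devres {
	void (*release)(void *data);
	void *data;
};

/* Hypothetical device holding its managed resources in registration order. */
struct model_device {
	struct model_devres res[MODEL_MAX_RES];
	size_t nres;
};

/* Register a cleanup action; returns 0 on success, -1 if the table is full. */
int model_devres_add(struct model_device *dev, void (*release)(void *),
		     void *data)
{
	if (dev->nres == MODEL_MAX_RES)
		return -1;
	dev->res[dev->nres].release = release;
	dev->res[dev->nres].data = data;
	dev->nres++;
	return 0;
}

/* Probe failure or unbind: undo every action, most recent first. */
void model_devres_release_all(struct model_device *dev)
{
	while (dev->nres > 0) {
		struct model_devres *r = &dev->res[--dev->nres];

		r->release(r->data);
	}
}

/* Example release callback standing in for an unmap: records the order. */
static int model_release_log[MODEL_MAX_RES];
static size_t model_release_count;

void model_unmap(void *data)
{
	if (model_release_count < MODEL_MAX_RES)
		model_release_log[model_release_count++] = *(const int *)data;
}
```

Releasing in reverse order means a mapping set up late in probe() is torn down before the resources it may depend on.
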
Generalizing Access to System and I/O Memory
============================================

.. kernel-doc:: include/linux/iosys-map.h
   :doc: overview

.. kernel-doc:: include/linux/iosys-map.h
   :internal:

Public Functions Provided
=========================

.. kernel-doc:: arch/x86/include/asm/io.h
   :internal:

.. kernel-doc:: lib/pci_iomap.c
   :export: