                     Dynamic DMA mapping Guide
                     =========================

                 David S. Miller <davem@redhat.com>
                 Richard Henderson <rth@cygnus.com>
                  Jakub Jelinek <jakub@redhat.com>

This is a guide to device driver writers on how to use the DMA API
with example pseudo-code.  For a concise description of the API, see
DMA-API.txt.

                       CPU and DMA addresses

There are several kinds of addresses involved in the DMA API, and it's
important to understand the differences.

The kernel normally uses virtual addresses.  Any address returned by
kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
be stored in a "void *".

The virtual memory system (TLB, page tables, etc.) translates virtual
addresses to CPU physical addresses, which are stored as "phys_addr_t" or
"resource_size_t".  The kernel manages device resources like registers as
physical addresses.  These are the addresses in /proc/iomem.  The physical
address is not directly useful to a driver; it must use ioremap() to map
the space and produce a virtual address.

I/O devices use a third kind of address: a "bus address".  If a device has
registers at an MMIO address, or if it performs DMA to read or write system
memory, the addresses used by the device are bus addresses.  In some
systems, bus addresses are identical to CPU physical addresses, but in
general they are not.  IOMMUs and host bridges can produce arbitrary
mappings between physical and bus addresses.

From a device's point of view, DMA uses the bus address space, but it may
be restricted to a subset of that space.  For example, even if a system
supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
so devices only need to use 32-bit DMA addresses.

Here's a picture and some examples:

               CPU                  CPU                  Bus
             Virtual              Physical             Address
             Address              Address               Space
              Space                Space

            +-------+             +------+             +------+
            |       |             |MMIO  |   Offset    |      |
            |       |  Virtual    |Space |   applied   |      |
          C +-------+ --------> B +------+ ----------> +------+ A
            |       |  mapping    |      |   by host   |      |
  +-----+   |       |             |      |   bridge    |      |   +--------+
  |     |   |       |             +------+             |      |   |        |
  | CPU |   |       |             | RAM  |             |      |   | Device |
  |     |   |       |             |      |             |      |   |        |
  +-----+   +-------+             +------+             +------+   +--------+
            |       |  Virtual    |Buffer|   Mapping   |      |
          X +-------+ --------> Y +------+ <---------- +------+ Z
            |       |  mapping    | RAM  |   by IOMMU
            |       |             |      |
            |       |             |      |
            +-------+             +------+

During the enumeration process, the kernel learns about I/O devices and
their MMIO space and the host bridges that connect them to the system.  For
example, if a PCI device has a BAR, the kernel reads the bus address (A)
from the BAR and converts it to a CPU physical address (B).  The address B
is stored in a struct resource and usually exposed via /proc/iomem.  When a
driver claims a device, it typically uses ioremap() to map physical address
B at a virtual address (C).  It can then use, e.g., ioread32(C), to access
the device registers at bus address A.
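
As a rough illustration of addresses A, B, and C above, here is a sketch of
how a PCI driver's probe routine might map BAR 0 and read a register.
MYDEV_STATUS is a made-up register offset, and a real driver would also
call pci_request_regions() and unwind properly on error:

        static int mydev_probe(struct pci_dev *pdev,
                               const struct pci_device_id *id)
        {
                void __iomem *regs;

                if (pci_enable_device(pdev))
                        return -ENODEV;

                /* pci_resource_start() yields CPU physical address B of
                 * BAR 0; ioremap() maps it and returns virtual address C. */
                regs = ioremap(pci_resource_start(pdev, 0),
                               pci_resource_len(pdev, 0));
                if (!regs)
                        return -ENOMEM;

                /* Reads through C reach the registers at bus address A;
                 * all-ones usually means the device did not respond. */
                if (ioread32(regs + MYDEV_STATUS) == 0xffffffff) {
                        iounmap(regs);
                        return -EIO;
                }

                return 0;
        }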

If the device supports DMA, the driver sets up a buffer using kmalloc() or
a similar interface, which returns a virtual address (X).  The virtual
memory system maps X to a physical address (Y) in system RAM.  The driver
can use virtual address X to access the buffer, but the device itself
cannot because DMA doesn't go through the CPU virtual memory system.

In some simple systems, the device can do DMA directly to physical address
Y.  But in many others, there is IOMMU hardware that translates DMA
addresses to physical addresses, e.g., it translates Z to Y.  This is part
of the reason for the DMA API: the driver can give a virtual address X to
an interface like dma_map_single(), which sets up any required IOMMU
mapping and returns the DMA address Z.  The driver then tells the device to
do DMA to Z, and the IOMMU maps it to the buffer at address Y in system
RAM.

For Linux to use dynamic DMA mapping, it needs some help from the drivers:
namely, drivers must take into account that DMA addresses should be mapped
only for the time they are actually used and unmapped after the DMA
transfer.

The following API will, of course, work even on platforms where no such
hardware exists.

Note that the DMA API works with any bus independent of the underlying
microprocessor architecture. You should use the DMA API rather than the
bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
pci_map_*() interfaces.

First of all, you should make sure

#include <linux/dma-mapping.h>

is in your driver, which provides the definition of dma_addr_t.  This type
can hold any valid DMA address for the platform and should be used
everywhere you hold a DMA address returned from the DMA mapping functions.

                         What memory is DMA'able?

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.
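
For instance, a buffer obtained from kmalloc() can be handed directly to
the streaming mapping interfaces described later in this document.  A
sketch of the allowed pattern (BUF_SIZE is an invented constant, dev is
your struct device):

        void *buf = kmalloc(BUF_SIZE, GFP_KERNEL);
        dma_addr_t dma_handle;

        if (!buf)
                return -ENOMEM;

        /* Legal: buf came from kmalloc(), so it may be mapped for DMA. */
        dma_handle = dma_map_single(dev, buf, BUF_SIZE, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, dma_handle)) {
                kfree(buf);
                return -ENOMEM;
        }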

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may use neither kernel image addresses
(items in data/text/bss segments), nor module image addresses, nor
stack addresses for DMA.  These could all be mapped somewhere entirely
different than the rest of physical memory.  Even if those classes of
memory could physically work with DMA, you'd need to ensure the I/O
buffers were cacheline-aligned.  Without that, you'd see cacheline
sharing problems (data corruption) on CPUs with DMA-incoherent caches.
(The CPU could write to one word, DMA would write to a different one
in the same cache line, and one of them could be overwritten.)

Also, this means that you cannot take the return of a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.

                        DMA addressing limitations

Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address?
If so, you need to inform the kernel of this fact.

By default, the kernel assumes that your device can address the full
32-bits.  For a 64-bit capable device, this needs to be increased.
And for a device with limitations, as discussed in the previous
paragraph, it needs to be decreased.

Special note about PCI: the PCI-X specification requires PCI-X devices to
support 64-bit addressing (DAC) for all transactions.  And at least
one platform (SGI SN2) requires 64-bit consistent allocations to
operate correctly when the IO bus is in PCI-X mode.

For correct operation, you must interrogate the kernel in your device
probe routine to see if the DMA controller on the machine can properly
support the DMA addressing limitation your device has.  It is good
style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.

The query is performed via a call to dma_set_mask_and_coherent():

        int dma_set_mask_and_coherent(struct device *dev, u64 mask);

which will query the mask for both streaming and coherent APIs together.
If you have some special requirements, then the following two separate
queries can be used instead:

        The query for streaming mappings is performed via a call to
        dma_set_mask():

                int dma_set_mask(struct device *dev, u64 mask);

        The query for consistent allocations is performed via a call
        to dma_set_coherent_mask():

                int dma_set_coherent_mask(struct device *dev, u64 mask);

Here, dev is a pointer to the device struct of your device, and mask
is a bit mask describing which bits of an address your device
supports.  It returns zero if your card can perform DMA properly on
the machine given the address mask you provided.  In general, the
device struct of your device is embedded in the bus-specific device
struct of your device.  For example, &pdev->dev is a pointer to the
device struct of a PCI device (pdev is a pointer to the PCI device
struct of your device).

If it returns non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior.  You must either use a different mask, or not use DMA.

This means that in the failure case, you have three options:

1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.

It is recommended that your driver print a kernel KERN_WARNING message
when you end up performing either #2 or #3.  In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.

The standard 32-bit addressing device would do something like this:

        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
                dev_warn(dev, "mydev: No suitable DMA available\n");
                goto ignore_this_device;
        }

Another common scenario is a 64-bit capable device.  The approach here
is to try for 64-bit addressing, but back down to a 32-bit mask that
should not fail.  The kernel may fail the 64-bit mask not because the
platform is not capable of 64-bit addressing.  Rather, it may fail in
this case simply because 32-bit addressing is done more efficiently
than 64-bit addressing.  For example, Sparc64 PCI SAC addressing is
more efficient than DAC addressing.

Here is how you would handle a 64-bit capable device which can drive
all 64-bits when accessing streaming DMA:

        int using_dac;

        if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
                using_dac = 1;
        } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
                using_dac = 0;
        } else {
                dev_warn(dev, "mydev: No suitable DMA available\n");
                goto ignore_this_device;
        }

If a card is capable of using 64-bit consistent allocations as well,
the case would look like this:

        int using_dac, consistent_using_dac;

        if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
                using_dac = 1;
                consistent_using_dac = 1;
        } else if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
                using_dac = 0;
                consistent_using_dac = 0;
        } else {
                dev_warn(dev, "mydev: No suitable DMA available\n");
                goto ignore_this_device;
        }

Setting the coherent mask to the same or a smaller mask than the streaming
mask will always succeed.  However, for the rare case that a device driver
only uses consistent allocations, one would have to check the return value
from dma_set_coherent_mask().
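
A sketch of such a consistent-allocations-only check, modeled on the
examples above, might look like:

        if (dma_set_coherent_mask(dev, DMA_BIT_MASK(64)) &&
            dma_set_coherent_mask(dev, DMA_BIT_MASK(32))) {
                dev_warn(dev, "mydev: No suitable consistent DMA available\n");
                goto ignore_this_device;
        }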

Finally, if your device can only drive the low 24-bits of
address you might do something like:

        if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
                dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
                goto ignore_this_device;
        }

When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
returns zero, the kernel saves away this mask you have provided.  The
kernel will use this information later when you make DMA mappings.

There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to dma_set_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done:

        #define PLAYBACK_ADDRESS_BITS   DMA_BIT_MASK(32)
        #define RECORD_ADDRESS_BITS     DMA_BIT_MASK(24)

        struct my_sound_card *card;
        struct device *dev;

        ...
        if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                dev_warn(dev, "%s: Playback disabled due to DMA limitations\n",
                       card->name);
        }
        if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                dev_warn(dev, "%s: Record disabled due to DMA limitations\n",
                       card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

                        Types of DMA mappings

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  The current default is to return consistent memory in the low 32
  bits of the DMA space.  However, for future compatibility you should
  set the consistent mask even if this default is fine for your
  driver.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any CPU store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The CPU may reorder stores to
             consistent memory just as it may reorder stores to normal
             memory.  Example: if it is important for the device to see
             the first word of a descriptor updated before the second,
             you must do something like:

                desc->word0 = address;
                wmb();
                desc->word1 = DESC_VALID;

             in order to get correct behavior on all platforms.

             Also, on some platforms your driver may need to flush CPU write
             buffers in much the same way as it needs to flush write buffers
             found in PCI bridges (such as by reading a register's value
             after writing it).

- Streaming DMA mappings which are usually mapped for one DMA
  transfer, unmapped right after it (unless you use dma_sync_* below)
  and for which hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come from
the underlying bus, although some devices may have such restrictions.
Also, systems with caches that aren't DMA-coherent will work better
when the underlying buffers don't share cache lines with other data.


                 Using Consistent DMA mappings.

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:

        dma_addr_t dma_handle;

        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);

where dev is a struct device *.  This may be called in interrupt
context with the GFP_ATOMIC flag.

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages() (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the dma_pool interface, described below.

The consistent DMA mapping interfaces, for non-NULL dev, will by
default return a DMA address which is 32-bit addressable.  Even if the
device indicates (via DMA mask) that it may address the upper 32-bits,
consistent allocation will only return > 32-bit addresses for DMA if
the consistent DMA mask has been explicitly changed via
dma_set_coherent_mask().  This is true of the dma_pool interface as
well.

dma_alloc_coherent() returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The CPU virtual address and the DMA address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

To unmap and free such a DMA region, you call:

        dma_free_coherent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values dma_alloc_coherent() returned to you.
This function may not be called in interrupt context.
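
Putting the allocation and free calls together, a driver might manage a
small descriptor ring roughly as in the following sketch.  struct mydev,
struct mydev_desc and NUM_DESC are invented for the example and are not
part of the DMA API:

        #define NUM_DESC        64

        struct mydev {
                struct device *dev;
                struct mydev_desc *ring;        /* CPU address (cpu_addr) */
                dma_addr_t ring_dma;            /* DMA address for the card */
        };

        static int mydev_alloc_ring(struct mydev *md)
        {
                md->ring = dma_alloc_coherent(md->dev,
                                              NUM_DESC * sizeof(*md->ring),
                                              &md->ring_dma, GFP_KERNEL);
                if (!md->ring)
                        return -ENOMEM;
                return 0;
        }

        static void mydev_free_ring(struct mydev *md)
        {
                dma_free_coherent(md->dev, NUM_DESC * sizeof(*md->ring),
                                  md->ring, md->ring_dma);
        }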

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by dma_alloc_coherent(),
or you can use the dma_pool API to do that.  A dma_pool is like
a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages().
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a dma_pool like this:

        struct dma_pool *pool;

        pool = dma_pool_create(name, dev, size, align, boundary);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  If your device has no boundary crossing restrictions,
pass 0 for boundary; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but at that time it may be better to
use dma_alloc_coherent() directly instead).

Allocate memory from a DMA pool like this:

        cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);

flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), GFP_ATOMIC otherwise.  Like dma_alloc_coherent(),
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a dma_pool like this:

        dma_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to dma_pool_alloc(), and cpu_addr and
dma_handle are the values dma_pool_alloc() returned. This function
may be called in interrupt context.

Destroy a dma_pool by calling:

        dma_pool_destroy(pool);

Make sure you've called dma_pool_free() for all memory allocated
from a pool before you destroy the pool. This function may not
be called in interrupt context.
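
Tying the four dma_pool calls together, the lifetime of a pool might look
like the following sketch; the pool name, the 32-byte size and alignment,
and the surrounding error handling are invented for illustration:

        struct dma_pool *pool;
        void *cmd;
        dma_addr_t cmd_dma;

        pool = dma_pool_create("mydev_cmds", dev, 32, 32, 0);
        if (!pool)
                return -ENOMEM;

        cmd = dma_pool_alloc(pool, GFP_KERNEL, &cmd_dma);
        if (!cmd) {
                dma_pool_destroy(pool);
                return -ENOMEM;
        }

        /* ... hand cmd_dma to the device, access cmd from the CPU ... */

        dma_pool_free(pool, cmd, cmd_dma);
        dma_pool_destroy(pool);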

                        DMA Direction

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:

 DMA_BIDIRECTIONAL
 DMA_TO_DEVICE
 DMA_FROM_DEVICE
 DMA_NONE

You should provide the exact DMA direction if you know it.

DMA_TO_DEVICE means "from main memory to the device"
DMA_FROM_DEVICE means "from the device to main memory"
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance for example.

The value DMA_NONE is to be used for debugging.  You can hold this
in a data structure before you come to know the precise direction,
and it will help catch cases where your direction tracking logic
has failed to set things up properly.

Another advantage of specifying this value precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel logs when the DMA controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction; consistent mappings
implicitly have a direction attribute setting of
DMA_BIDIRECTIONAL.

The SCSI subsystem tells you the direction to use in the
'sc_data_direction' member of the SCSI command your driver is
working on.

For Networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the DMA_TO_DEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the DMA_FROM_DEVICE direction specifier.

                  Using Streaming DMA mappings

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do:

        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

and to unmap it:

        dma_unmap_single(dev, dma_handle, size, direction);

You should call dma_mapping_error() as dma_map_single() could fail and return
an error.  Not all DMA implementations support the dma_mapping_error()
interface; however, it is good practice to call dma_mapping_error(), which
invokes the generic mapping error check interface.  Doing so ensures that the
mapping code works correctly on all DMA implementations without any dependency
on the specifics of the underlying implementation.  Using the returned address
without checking for errors could result in failures ranging from panics to
silent data corruption.  A couple of examples of incorrect ways to check for
errors that make assumptions about the underlying DMA implementation are shown
below; these are applicable to dma_map_page() as well.

Incorrect example 1:
        dma_addr_t dma_handle;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if ((dma_handle & 0xffff != 0) || (dma_handle >= 0x1000000)) {
                goto map_error;
        }

Incorrect example 2:
        dma_addr_t dma_handle;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if (dma_handle == DMA_ERROR_CODE) {
                goto map_error;
        }

You should call dma_unmap_single() when the DMA activity is finished, e.g.,
from the interrupt which told you that the DMA transfer is done.

Using CPU pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to dma_{map,unmap}_single().  These
interfaces deal with page/offset pairs instead of CPU pointers.
Specifically:

        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;

        dma_handle = dma_map_page(dev, page, offset, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

        ...

        dma_unmap_page(dev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

You should call dma_mapping_error() as dma_map_page() could fail and return
error as outlined under the dma_map_single() discussion.

You should call dma_unmap_page() when the DMA activity is finished, e.g.,
from the interrupt which told you that the DMA transfer is done.

With scatterlists, you map a region gathered from several regions by:

        int i, count = dma_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;

        for_each_sg(sglist, sg, count, i) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have a very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to.  On failure 0 is returned.

Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.
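
To build the sglist in the first place, a driver with an array of
kmalloc()'ed buffers might use the helpers from <linux/scatterlist.h>,
roughly as in this sketch (NENTS, buffers[] and buffer_len[] are invented
names; hw_address[] and hw_len[] are as in the example above):

        struct scatterlist sglist[NENTS];
        struct scatterlist *sg;
        int i, count;

        sg_init_table(sglist, NENTS);
        for (i = 0; i < NENTS; i++)
                sg_set_buf(&sglist[i], buffers[i], buffer_len[i]);

        count = dma_map_sg(dev, sglist, NENTS, DMA_FROM_DEVICE);
        if (count == 0)
                goto map_error_handling;

        for_each_sg(sglist, sg, count, i) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }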

To unmap a scatterlist, just call:

        dma_unmap_sg(dev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

PLEASE NOTE:  The 'nents' argument to the dma_unmap_sg call must be
              the _same_ one you passed into the dma_map_sg call,
              it should _NOT_ be the 'count' value _returned_ from the
              dma_map_sg call.

Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
counterpart, because the DMA address space is a shared resource and
you could render the machine unusable by consuming all DMA addresses.

If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, the buffer needs to be synced
properly in order for the CPU and device to see the most up-to-date and
correct copy of the DMA buffer.

So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
transfer call either:

        dma_sync_single_for_cpu(dev, dma_handle, size, direction);

or:

        dma_sync_sg_for_cpu(dev, sglist, nents, direction);

as appropriate.

Then, if you wish to let the device get at the DMA area again,
finish accessing the data with the CPU, and then before actually
giving the buffer to the hardware call either:

        dma_sync_single_for_device(dev, dma_handle, size, direction);

or:

        dma_sync_sg_for_device(dev, sglist, nents, direction);

as appropriate.

PLEASE NOTE:  The 'nents' argument to dma_sync_sg_for_cpu() and
              dma_sync_sg_for_device() must be the same passed to
              dma_map_sg(). It is _NOT_ the count returned by
              dma_map_sg().

After the last DMA transfer call one of the DMA unmap routines
dma_unmap_{single,sg}(). If you don't touch the data from the first
dma_map_*() call till dma_unmap_*(), then you don't have to call the
dma_sync_*() routines at all.

Here is pseudo code which shows a situation in which you would need
to use the dma_sync_*() interfaces.

        my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;

                mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE);
                if (dma_mapping_error(cp->dev, mapping)) {
                        /*
                         * reduce current DMA mapping usage,
                         * delay and try again later or
                         * reset driver.
                         */
                        goto map_error_handling;
                }

                cp->rx_buf = buffer;
                cp->rx_len = len;
                cp->rx_dma = mapping;

                give_rx_buf_to_card(cp);
        }

        ...

        my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
        {
                struct my_card *cp = devid;

                ...
                if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
                        struct my_card_header *hp;

                        /* Examine the header to see if we wish
                         * to accept the data.  But synchronize
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
                        dma_sync_single_for_cpu(cp->dev, cp->rx_dma,
                                                cp->rx_len,
                                                DMA_FROM_DEVICE);

                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
                                dma_unmap_single(cp->dev, cp->rx_dma, cp->rx_len,
                                                 DMA_FROM_DEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* CPU should not write to
                                 * DMA_FROM_DEVICE-mapped area,
                                 * so dma_sync_single_for_device() is
                                 * not needed here. It would be required
                                 * for DMA_BIDIRECTIONAL mapping if
                                 * the memory was modified.
                                 */
                                give_rx_buf_to_card(cp);
                        }
                }
        }

Drivers converted fully to this interface should not use virt_to_bus() any
longer, nor should they use bus_to_virt(). Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt() in the
dynamic DMA mapping scheme - you have to always store the DMA addresses
returned by the dma_alloc_coherent(), dma_pool_alloc(), and dma_map_single()
calls (dma_map_sg() stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.

All drivers should be using these interfaces with no exceptions.  It
is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

                        Handling Errors

DMA address space is limited on some architectures and an allocation
failure can be determined by:

- checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0

- checking the dma_addr_t returned from dma_map_single() and dma_map_page()
  by using dma_mapping_error():

        dma_addr_t dma_handle;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

- unmap pages that are already mapped, when a mapping error occurs in the
  middle of a multiple page mapping attempt.  These examples are applicable
  to dma_map_page() as well.

Example 1:
        dma_addr_t dma_handle1;
        dma_addr_t dma_handle2;

        dma_handle1 = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle1)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling1;
        }
        dma_handle2 = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle2)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling2;
        }

        ...

        map_error_handling2:
                dma_unmap_single(dev, dma_handle1, size, direction);
        map_error_handling1:

Example 2: (if buffers are allocated in a loop, unmap all mapped buffers when
            a mapping error is detected in the middle)

        dma_addr_t dma_addr;
        dma_addr_t array[DMA_BUFFERS];
        int save_index = 0;

        for (i = 0; i < DMA_BUFFERS; i++) {

                ...

                dma_addr = dma_map_single(dev, addr, size, direction);
                if (dma_mapping_error(dev, dma_addr)) {
                        /*
                         * reduce current DMA mapping usage,
                         * delay and try again later or
                         * reset driver.
                         */
                        goto map_error_handling;
                }
                array[i] = dma_addr;
                save_index++;
        }

        ...

        map_error_handling:

        for (i = 0; i < save_index; i++) {

                ...

                dma_unmap_single(dev, array[i], size, direction);
        }

Networking drivers must call dev_kfree_skb() to free the socket buffer
and return NETDEV_TX_OK if the DMA mapping fails on the transmit hook
(ndo_start_xmit). This means that the socket buffer is just dropped in
the failure case.
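
A sketch of that transmit path might look as follows; mydev_priv and the
descriptor handoff are invented, and only the drop-and-return-NETDEV_TX_OK
error handling is prescribed above:

        static netdev_tx_t mydev_start_xmit(struct sk_buff *skb,
                                            struct net_device *netdev)
        {
                struct mydev_priv *priv = netdev_priv(netdev);
                dma_addr_t mapping;

                mapping = dma_map_single(priv->dev, skb->data, skb->len,
                                         DMA_TO_DEVICE);
                if (dma_mapping_error(priv->dev, mapping)) {
                        /* Drop the packet: free the skb and report
                         * NETDEV_TX_OK so the stack does not retry. */
                        dev_kfree_skb(skb);
                        return NETDEV_TX_OK;
                }

                /* ... fill a TX descriptor with 'mapping' and kick the
                 * hardware ... */

                return NETDEV_TX_OK;
        }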

SCSI drivers must return SCSI_MLQUEUE_HOST_BUSY if the DMA mapping
fails in the queuecommand hook. This means that the SCSI subsystem
passes the command to the driver again later.

                Optimizing Unmap State Space Consumption

On many platforms, dma_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:

        struct ring_state {
                struct sk_buff *skb;
                dma_addr_t mapping;
                __u32 len;
        };

   after:

        struct ring_state {
                struct sk_buff *skb;
                DEFINE_DMA_UNMAP_ADDR(mapping);
                DEFINE_DMA_UNMAP_LEN(len);
        };

2) Use dma_unmap_{addr,len}_set() to set these values.
   Example, before:

        ringp->mapping = FOO;
        ringp->len = BAR;

   after:

        dma_unmap_addr_set(ringp, mapping, FOO);
        dma_unmap_len_set(ringp, len, BAR);

3) Use dma_unmap_{addr,len}() to access these values.
   Example, before:

        dma_unmap_single(dev, ringp->mapping, ringp->len,
                         DMA_FROM_DEVICE);

   after:

        dma_unmap_single(dev,
                         dma_unmap_addr(ringp, mapping),
                         dma_unmap_len(ringp, len),
                         DMA_FROM_DEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.

                        Platform Issues

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   You need to enable CONFIG_NEED_SG_DMA_LENGTH if the architecture
   supports IOMMUs (including software IOMMU).

2) ARCH_DMA_MINALIGN

   Architectures must ensure that kmalloc'ed buffers are
   DMA-safe.  Drivers and subsystems depend on it.  If an architecture
   isn't fully DMA-coherent (i.e. hardware doesn't ensure that data in
   the CPU cache is identical to data in main memory),
   ARCH_DMA_MINALIGN must be set so that the memory allocator
   makes sure that a kmalloc'ed buffer doesn't share a cache line with
   others.  See arch/arm/include/asm/cache.h as an example.

   Note that ARCH_DMA_MINALIGN is about DMA memory alignment
   constraints. You don't need to worry about the architecture data
   alignment constraints (e.g. the alignment constraints about 64-bit
   objects).

                           Closing

This document, and the API itself, would not be in its current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people:

        Russell King <rmk@arm.linux.org.uk>
        Leo Dagum <dagum@barrel.engr.sgi.com>
        Ralf Baechle <ralf@oss.sgi.com>
        Grant Grundler <grundler@cup.hp.com>
        Jay Estabrook <Jay.Estabrook@compaq.com>
        Thomas Sailer <sailer@ife.ee.ethz.ch>
        Andrea Arcangeli <andrea@suse.de>
        Jens Axboe <jens.axboe@oracle.com>
        David Mosberger-Tang <davidm@hpl.hp.com>