========================
MMC Asynchronous Request
========================

Rationale
=========

How significant is the cache maintenance overhead?

It depends. Fast eMMC and multiple cache levels with speculative cache
pre-fetch make the cache overhead relatively significant. If the DMA
preparations for the next request are done in parallel with the current
transfer, the DMA preparation overhead does not affect MMC performance.

The intention of non-blocking (asynchronous) MMC requests is to minimize the
time between the end of one MMC request and the start of the next.

With mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and
dma_unmap_sg() are running. Non-blocking MMC requests make it possible to
prepare the caches for the next job in parallel with an active MMC request.
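
The effect of overlapping preparation with the transfer can be illustrated
with a small userspace timing model. This is only a sketch under stated
assumptions: struct req_cost, blocking_total_us() and async_total_us() are
hypothetical names that are not part of the kernel, the per-request costs are
made-up constants, and the model assumes the CPU prepares the next request,
waits for the ongoing one, starts the new one and then cleans up the old one.

```c
#include <assert.h>

/* Hypothetical per-request costs in microseconds (illustration only). */
struct req_cost {
	unsigned int prep_us;	/* e.g. dma_map_sg() and descriptor setup */
	unsigned int xfer_us;	/* time the MMC controller is busy */
	unsigned int post_us;	/* e.g. dma_unmap_sg() */
};

/* Blocking model: prepare, transfer and clean up strictly one after another. */
static unsigned long blocking_total_us(struct req_cost c, unsigned int n)
{
	return (unsigned long)n * (c.prep_us + c.xfer_us + c.post_us);
}

/*
 * Non-blocking model: the CPU prepares request i while request i-1 is
 * still on the bus, and cleans up request i-1 after request i has started.
 */
static unsigned long async_total_us(struct req_cost c, unsigned int n)
{
	unsigned long cpu = 0;	/* time at which the CPU is free again */
	unsigned long done = 0;	/* time at which the ongoing transfer ends */
	unsigned int i;

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++) {
		cpu += c.prep_us;		/* prepare request i */
		if (cpu < done)
			cpu = done;		/* wait for request i-1 */
		done = cpu + c.xfer_us;		/* start request i */
		if (i > 0)
			cpu += c.post_us;	/* clean up request i-1 */
	}

	if (cpu < done)
		cpu = done;			/* wait for the last request */
	return cpu + c.post_us;			/* clean up the last request */
}
```

With the hypothetical costs prep = 100, xfer = 1000 and post = 50, four
requests take 4600 us in the blocking model and 4150 us in the non-blocking
model. Only the first preparation and the last cleanup remain visible as long
as the preparation work is shorter than the transfer.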

MMC block driver
================

The mmc_blk_issue_rw_rq() function in the MMC block driver is made
non-blocking.

The increase in throughput is proportional to the time it takes to prepare
a request (the major part of the preparation is dma_map_sg() and
dma_unmap_sg()) and depends on how fast the memory is. The faster the MMC/SD
is, the more significant the request preparation time becomes. Roughly, the
expected performance gain is 5% for large writes and 10% for large reads on
an L2 cache platform. In power save mode, when clocks run at a lower
frequency, the DMA preparation may cost even more. As long as these slower
preparations run in parallel with the transfer, performance is not affected.

Details on measurements from IOZone and mmc_test
================================================

https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

MMC core API extension
======================

There is one new public function, mmc_start_req().

It starts a new MMC command request for a host. The function isn't
truly non-blocking. If there is an ongoing async request, it waits
for that request to complete, starts the new one, and returns without
waiting for the new request to complete. If there is no ongoing
request, it starts the new request and returns immediately.
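
The behaviour described above can be modelled in a few lines of userspace C.
This is a simplified sketch, not the kernel implementation: struct toy_req,
struct toy_host, toy_wait_for_done() and toy_start_req() are hypothetical
names, completion is simulated by setting a flag, and treating a NULL request
as "only wait for the ongoing one" is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_NEW, TOY_STARTED, TOY_DONE };

struct toy_req {
	enum toy_state state;
};

struct toy_host {
	struct toy_req *ongoing;	/* request currently on the bus, or NULL */
};

/* Stand-in for blocking until the controller signals completion. */
static void toy_wait_for_done(struct toy_req *req)
{
	req->state = TOY_DONE;
}

/*
 * Start @next on @host and return the previously ongoing request, which
 * is complete by then, or NULL if the host was idle.
 */
static struct toy_req *toy_start_req(struct toy_host *host,
				     struct toy_req *next)
{
	struct toy_req *prev = host->ongoing;

	if (prev)
		toy_wait_for_done(prev);	/* the only step that can block */
	if (next)
		next->state = TOY_STARTED;	/* issued, but not waited for */
	host->ongoing = next;

	return prev;
}
```

In this model a caller that submits request A and then request B gets NULL
back from the first call and A back from the second call. A is complete at
that point, while B is still in flight.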

MMC host extensions
===================

There are two optional members in mmc_host_ops, pre_req() and
post_req(), that the host driver may implement in order to move work
to before and after the actual mmc_host_ops.request() function is called.

In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA
descriptor, and post_req() runs dma_unmap_sg().
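
The following userspace sketch shows how optional hooks of this kind can be
dispatched around a mandatory request callback. It only illustrates the call
order for a single request and does not show the overlap with a previous
transfer. All demo_* names are hypothetical, the callback signatures are
simplified and do not match the real mmc_host_ops members, and the letters
written to the log stand in for mapping (M), transfer (T) and unmapping (U).

```c
#include <assert.h>
#include <string.h>

struct demo_request {
	char log[16];	/* records the order in which the callbacks ran */
};

struct demo_host_ops {
	void (*pre_req)(struct demo_request *req);	/* optional */
	void (*request)(struct demo_request *req);	/* mandatory */
	void (*post_req)(struct demo_request *req);	/* optional */
};

/* Append one step to the log without overflowing the buffer. */
static void demo_log(struct demo_request *req, const char *step)
{
	strncat(req->log, step, sizeof(req->log) - strlen(req->log) - 1);
}

static void demo_pre_req(struct demo_request *req)
{
	demo_log(req, "M");	/* stands in for dma_map_sg() */
}

static void demo_do_request(struct demo_request *req)
{
	demo_log(req, "T");	/* stands in for the actual transfer */
}

static void demo_post_req(struct demo_request *req)
{
	demo_log(req, "U");	/* stands in for dma_unmap_sg() */
}

/* Core side: call the optional hooks only if the driver provides them. */
static void demo_run(const struct demo_host_ops *ops, struct demo_request *req)
{
	if (ops->pre_req)
		ops->pre_req(req);
	ops->request(req);
	if (ops->post_req)
		ops->post_req(req);
}
```

A driver that fills in all three callbacks produces the log "MTU" in this
model. A driver that leaves the optional hooks NULL produces "T" and has to
do its mapping and unmapping inside the request callback instead.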

Optimize for the first request
==============================

The first request in a series of requests can't be prepared in parallel
with the previous transfer, since there is no previous request.

The argument is_first_req in pre_req() indicates that there is no previous
request. The host driver may optimize for this scenario to minimize
the performance loss. One way to do this is to split the current request
into two chunks: start the transfer for the full request size, prepare the
first chunk and issue it to the DMA controller, then prepare and issue the
second chunk while the first one is being transferred.

Pseudocode to handle the is_first_req scenario with minimal prepare overhead::

  if (is_first_req && req->size > threshold) {
     /* start MMC transfer for the complete transfer size */
     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);

     /*
      * Begin to prepare DMA while cmd is being processed by MMC.
      * The first chunk of the request should take the same time
      * to prepare as the "MMC process command time".
      * If prepare time exceeds MMC cmd time
      * the transfer is delayed, guesstimate max 4k as first chunk size.
      */
     prepare_1st_chunk_for_dma(req);
     /* flush pending desc to the DMAC (dmaengine.h) */
     dma_issue_pending(req->dma_desc);

     prepare_2nd_chunk_for_dma(req);
     /*
      * The second issue_pending should be called before MMC runs out
      * of the first chunk. If the MMC runs out of the first data chunk
      * before this call, the transfer is delayed.
      */
     dma_issue_pending(req->dma_desc);
  }