.. _balance:

================
Memory Balancing
================

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller cannot
sleep because it holds a spinlock or is in interrupt context. The second may
be that the caller is willing to fail the allocation without incurring the
overhead of page reclaim. This may happen for opportunistic high-order
allocation requests that have order-0 fallback options. In such cases,
the caller may also wish to avoid waking kswapd.

Allocation requests without __GFP_IO are made to prevent file system deadlocks.

In the absence of non-sleepable allocation requests, it seems detrimental
to be doing balancing. Page reclamation can be kicked off lazily, that
is, only when needed (i.e. when zone free memory is 0), instead of making it
a proactive process.

That being said, the kernel should try to fulfill requests for direct
mapped pages from the direct mapped pool, instead of falling back on
the dma pool, so as to keep the dma pool filled for dma requests (atomic
or not). A similar argument applies to highmem and direct mapped pages.
On the other hand, if there are a lot of free dma pages, it is preferable
to satisfy regular memory requests by allocating one from the dma pool,
instead of incurring the overhead of regular zone balancing.

In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running production machines of varying memory sizes, and seems to be
doing fine even in the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.
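
The 2.2 problem can be illustrated with a small sketch. This is not kernel
code: struct demo_zone and demo_global_needs_balance() are hypothetical names,
and the zone sizes in the usage note are made-up numbers chosen only to show
the effect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone: page counts only. */
struct demo_zone {
	unsigned long total_pages;
	unsigned long free_pages;
};

/*
 * 2.2-style trigger as described in the text: balance only when the free
 * pages summed over ALL zones fall below 1/64th of total memory.
 */
bool demo_global_needs_balance(const struct demo_zone *zones, size_t nr_zones)
{
	unsigned long total = 0, free_pages = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		total += zones[i].total_pages;
		free_pages += zones[i].free_pages;
	}
	return free_pages < total / 64;
}
```

With a 4096-page dma zone that has no free pages and a 61440-page regular
zone with 2048 free pages, the threshold is 65536 / 64 = 1024 pages, so the
check reports that no balancing is needed even though the dma zone is empty.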

In 2.3, zone balancing can be done in one of two ways: depending on the
zone size (and possibly on the sizes of lower class zones), we can decide
at init time how many free pages we should aim for while balancing any
zone. The good part is that, while balancing, we do not need to look at the
sizes of lower class zones; the bad part is that we might balance too
frequently because we ignore possibly lower usage in the lower class zones.
Also, with a slight change in the allocation routine, it is possible to
reduce the memclass() macro to be a simple equality.
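
A minimal sketch of this first approach follows. The names (struct demo_zone,
demo_zone_init(), demo_zone_needs_balance()) are hypothetical, and the 1/64
ratio is an assumption borrowed from the 2.2 threshold; the text only says
that the target depends on the zone size.

```c
#include <stdbool.h>

/* Assumed ratio, borrowed from the 2.2 threshold; not stated for 2.3. */
#define DEMO_BALANCE_RATIO 64

/* Hypothetical, simplified zone with a free-page target fixed at init. */
struct demo_zone {
	unsigned long total_pages;
	unsigned long free_pages;
	unsigned long target_free;
};

/* Decide once, at init time, how many free pages to aim for. */
void demo_zone_init(struct demo_zone *z, unsigned long total_pages)
{
	z->total_pages = total_pages;
	z->free_pages = total_pages;
	z->target_free = total_pages / DEMO_BALANCE_RATIO;
}

/* The check looks only at this zone, never at lower class zones. */
bool demo_zone_needs_balance(const struct demo_zone *z)
{
	return z->free_pages < z->target_free;
}
```

The check is cheap because it is purely local to the zone, but for the same
reason it cannot take into account free pages sitting in lower class zones.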

Another possible solution is that we balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future.
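
The cumulative check can be sketched as follows. This is an illustration
under stated assumptions, not the patch itself: struct demo_zone and
demo_class_needs_balance() are hypothetical names, and the zones are assumed
to be stored lowest class first (dma, then regular, then highmem).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone: page counts only. */
struct demo_zone {
	unsigned long total_pages;
	unsigned long free_pages;
};

/*
 * Balance zone k when the free pages of zones 0..k together fall below
 * 1/64th of the total pages of zones 0..k. The array is assumed to be
 * ordered from the lowest class zone upwards.
 */
bool demo_class_needs_balance(const struct demo_zone *zones, size_t k)
{
	unsigned long total = 0, free_pages = 0;

	for (size_t i = 0; i <= k; i++) {
		total += zones[i].total_pages;
		free_pages += zones[i].free_pages;
	}
	return free_pages < total / 64;
}
```

For an empty 4096-page dma zone next to a 61440-page regular zone with 2048
free pages, the check for index 0 asks for balancing (0 < 64), while the check
for index 1 does not (2048 >= 1024). The empty dma zone is no longer hidden
by free regular memory.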

Note that if the size of the regular zone is huge compared to the dma zone,
it becomes less significant to consider the free dma pages while
deciding whether to balance the regular zone. The first solution
becomes more attractive then.

The appended patch implements the second solution. It also "fixes" two
problems: first, kswapd is woken up as in 2.2 on low memory conditions
for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
so as to give replace_with_highmem() a fighting chance to get a
HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
fall back into the regular zone. This also makes sure that HIGHMEM pages
are not leaked (for example, in situations where a HIGHMEM page is in
the swapcache but is not being used by anyone).

kswapd also needs to know about the zones it should balance. kswapd is
primarily needed in a situation where balancing cannot be done,
probably because all allocation requests are coming from interrupt context
and all process contexts are sleeping. For 2.3, kswapd does not really
need to balance the highmem zone, since interrupt context does not request
highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
structure to decide whether a zone needs balancing.

Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.

watermark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
are per-zone fields, used to determine when a zone needs to be balanced. When
the number of free pages falls below watermark[WMARK_MIN], the hysteretic field
low_on_memory gets set. This stays set till the number of free pages becomes
watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
try to free some pages in the zone (providing GFP_WAIT is set in the request).
Orthogonal to this is the decision to poke kswapd to free some zone pages.
That decision is not hysteresis based, and is made when the number of free
pages is below watermark[WMARK_LOW], in which case zone_wake_kswapd is also set.
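
The two decisions can be sketched as a small state update. This is a
simplified model, not the kernel's implementation: struct demo_zone,
demo_zone_update() and the DEMO_WMARK_* indices are hypothetical names. The
sketch also assumes that zone_wake_kswapd is recomputed on every update,
which the text does not spell out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the WMARK_MIN/WMARK_LOW/WMARK_HIGH indices. */
enum { DEMO_WMARK_MIN, DEMO_WMARK_LOW, DEMO_WMARK_HIGH, DEMO_NR_WMARK };

/* Hypothetical, simplified zone carrying only the fields named above. */
struct demo_zone {
	unsigned long free_pages;
	unsigned long watermark[DEMO_NR_WMARK];
	bool low_on_memory;
	bool zone_wake_kswapd;
};

void demo_zone_update(struct demo_zone *z, unsigned long free_pages)
{
	z->free_pages = free_pages;

	/*
	 * Hysteresis: set below MIN, cleared only once HIGH is reached,
	 * left unchanged anywhere in between.
	 */
	if (free_pages < z->watermark[DEMO_WMARK_MIN])
		z->low_on_memory = true;
	else if (free_pages >= z->watermark[DEMO_WMARK_HIGH])
		z->low_on_memory = false;

	/* No hysteresis (assumed): follows the LOW watermark directly. */
	z->zone_wake_kswapd = free_pages < z->watermark[DEMO_WMARK_LOW];
}
```

With watermarks of 32, 64 and 96 pages, dropping to 20 free pages sets both
fields. Recovering to 80 free pages clears zone_wake_kswapd but leaves
low_on_memory set, and only reaching 96 free pages clears low_on_memory.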


(Good) Ideas that I have heard:

1. Dynamic experience should influence balancing: number of failed requests
   for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
   dma pages. (lkd@tantalophile.demon.co.uk)