0001 ==================================
0002 Memory Attribute Aliasing on IA-64
0003 ==================================
0004 
0005 Bjorn Helgaas <bjorn.helgaas@hp.com>
0006 
0007 May 4, 2006
0008 
0009 
0010 Memory Attributes
0011 =================
0012 
0013     Itanium supports several attributes for virtual memory references.
0014     The attribute is part of the virtual translation, i.e., it is
0015     contained in the TLB entry.  The ones of most interest to the Linux
0016     kernel are:
0017 
0018         ==              ======================
0019         WB              Write-back (cacheable)
0020         UC              Uncacheable
0021         WC              Write-coalescing
0022         ==              ======================
0023 
0024     System memory typically uses the WB attribute.  The UC attribute is
0025     used for memory-mapped I/O devices.  The WC attribute is uncacheable
0026     like UC is, but writes may be delayed and combined to increase
0027     performance for things like frame buffers.
0028 
0029     The Itanium architecture requires that we avoid accessing the same
0030     page with both a cacheable mapping and an uncacheable mapping[1].
0031 
0032     The design of the chipset determines which attributes are supported
0033     on which regions of the address space.  For example, some chipsets
0034     support either WB or UC access to main memory, while others support
0035     only WB access.
0036 
0037 Memory Map
0038 ==========
0039 
0040     Platform firmware describes the physical memory map and the
0041     supported attributes for each region.  At boot time, the kernel
0042     obtains this map through the EFI GetMemoryMap() interface.  ACPI
0043     can also describe memory devices and the attributes they support,
0044     but Linux/ia64 currently doesn't use this information.
0045 
0046     The kernel uses the efi_memmap table returned from GetMemoryMap() to
0047     learn the attributes supported by each region of physical address
0048     space.  Unfortunately, this table does not completely describe the
0049     address space because some machines omit some or all of the MMIO
0050     regions from the map.
0051 
0052     The kernel maintains another table, kern_memmap, which describes the
0053     memory Linux is actually using and the attribute for each region.
0054     This contains only system memory; it does not contain MMIO space.
0055 
0056     The kern_memmap table typically contains only a subset of the system
0057     memory described by the efi_memmap.  Linux/ia64 can't use all memory
0058     in the system because of constraints imposed by the identity mapping
0059     scheme.
0060 
0061     The efi_memmap table is preserved unmodified because the original
0062     boot-time information is required for kexec.
0063 
0064 Kernel Identity Mappings
0065 ========================
0066 
0067     Linux/ia64 identity mappings are done with large pages, currently
0068     either 16MB or 64MB, referred to as "granules."  Cacheable mappings
0069     are speculative[2], so the processor can read any location in the
0070     page at any time, independent of the programmer's intentions.  This
0071     means that to avoid attribute aliasing, Linux can create a cacheable
0072     identity mapping only when the entire granule supports cacheable
0073     access.
0074 
0075     Therefore, kern_memmap contains only full granule-sized regions that
0076     can be referenced safely by an identity mapping.
0077 
0078     Uncacheable mappings are not speculative, so the processor will
0079     generate UC accesses only to locations explicitly referenced by
0080     software.  This allows UC identity mappings to cover granules that
0081     are only partially populated, or populated with a combination of UC
0082     and WB regions.
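
The rule for cacheable identity mappings can be sketched as a coverage
check over a firmware memory map.  This is only an illustration: the
``struct region`` layout, the ``ATTR_*`` flags, and ``granule_fully_wb()``
are hypothetical names, not the kernel's actual data structures, and the
map is assumed to be sorted by address with no overlapping entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_WB 0x1u   /* hypothetical flag: region supports write-back */
#define ATTR_UC 0x2u   /* hypothetical flag: region supports uncacheable */

/* Hypothetical, simplified memory map entry covering [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
    unsigned attrs;
};

/*
 * Return true only if every byte of the granule containing addr lies in
 * a region that supports WB.  granule_size must be a power of two, and
 * the map must be sorted by start address without overlaps.
 */
bool granule_fully_wb(const struct region *map, size_t n,
                      uint64_t addr, uint64_t granule_size)
{
    uint64_t start = addr & ~(granule_size - 1);
    uint64_t end = start + granule_size;
    uint64_t covered = start;   /* [start, covered) is known to be WB */

    for (size_t i = 0; i < n && covered < end; i++) {
        if (map[i].end <= covered)
            continue;           /* entirely below the point of interest */
        if (map[i].start > covered)
            return false;       /* hole in the map inside the granule */
        if (!(map[i].attrs & ATTR_WB))
            return false;       /* part of the granule is not cacheable */
        covered = map[i].end;
    }
    return covered >= end;
}
```

With a map shaped like the HP sx1000 example later in this document, a
16MB granule starting at address 0 fails the check because it contains
the UC-only VGA frame buffer, so only a UC identity mapping could cover
it.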
0083 
0084 User Mappings
0085 =============
0086 
0087     User mappings are typically done with 16K or 64K pages.  The smaller
0088     page size allows more flexibility because only 16K or 64K has to be
0089     homogeneous with respect to memory attributes.
0090 
0091 Potential Attribute Aliasing Cases
0092 ==================================
0093 
0094     There are several ways the kernel creates new mappings:
0095 
0096 mmap of /dev/mem
0097 ----------------
0098 
0099         This uses remap_pfn_range(), which creates user mappings.  These
0100         mappings may be either WB or UC.  If the region being mapped
0101         happens to be in kern_memmap, meaning that it may also be mapped
0102         by a kernel identity mapping, the user mapping must use the same
0103         attribute as the kernel mapping.
0104 
0105         If the region is not in kern_memmap, the user mapping should use
0106         an attribute reported as being supported in the EFI memory map.
0107 
0108         Since the EFI memory map does not describe MMIO on some
0109         machines, this should use an uncacheable mapping as a fallback.
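
The three rules above amount to a short decision function.  The sketch
below is not the kernel's implementation: ``enum mem_attr`` and
``devmem_mmap_attr()`` are hypothetical names, and preferring WB over UC
when EFI reports both is an assumption borrowed from the legacy_mem
discussion later in this document.

```c
/* Hypothetical attribute flags; ATTR_NONE means "no entry for this region". */
enum mem_attr { ATTR_NONE = 0, ATTR_WB = 1, ATTR_UC = 2 };

/*
 * kern_attr: attribute recorded in kern_memmap, or ATTR_NONE if the
 *            region is not in kern_memmap.
 * efi_attrs: bitmask of attributes the EFI memory map reports as
 *            supported, or 0 if EFI does not describe the region.
 */
enum mem_attr devmem_mmap_attr(enum mem_attr kern_attr, unsigned efi_attrs)
{
    if (kern_attr != ATTR_NONE)
        return kern_attr;   /* must match the kernel identity mapping */
    if (efi_attrs & ATTR_WB)
        return ATTR_WB;     /* assumption: prefer WB when it is supported */
    /* UC is reported, or the region is not described at all (likely MMIO). */
    return ATTR_UC;
}
```

For example, a region absent from both maps yields ``ATTR_UC``, which is
the uncacheable fallback described above.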
0110 
0111 mmap of /sys/class/pci_bus/.../legacy_mem
0112 -----------------------------------------
0113 
0114         This is very similar to mmap of /dev/mem, except that legacy_mem
0115         only allows mmap of the one megabyte "legacy MMIO" area for a
0116         specific PCI bus.  Typically this is the first megabyte of
0117         physical address space, but it may be different on machines with
0118         several VGA devices.
0119 
0120         "X" uses this to access VGA frame buffers.  Using legacy_mem
0121         rather than /dev/mem allows multiple instances of X to talk to
0122         different VGA cards.
0123 
0124         The /dev/mem mmap constraints apply.
0125 
0126 mmap of /proc/bus/pci/.../??.?
0127 ------------------------------
0128 
0129         This is an MMIO mmap of a PCI function.  The caller may
0130         optionally request the WC attribute.
0131 
0132         If WC is requested, and the region in kern_memmap is either WC
0133         or UC, and the EFI memory map designates the region as WC, then
0134         the WC mapping is allowed.
0135 
0136         Otherwise, the user mapping must use the same attribute as the
0137         kernel mapping.
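
A minimal sketch of the WC check, using hypothetical names
(``enum mem_attr``, ``pci_mmap_attr()``) rather than the kernel's own
interfaces.  The text does not say what happens when the region has no
kernel mapping at all; in that case the sketch simply hands
``ATTR_NONE`` back for the caller to deal with.

```c
#include <stdbool.h>

/* Hypothetical attribute flags; ATTR_NONE means "no kernel mapping". */
enum mem_attr { ATTR_NONE = 0, ATTR_WB = 1, ATTR_UC = 2, ATTR_WC = 4 };

/*
 * want_wc:   the caller asked for a write-coalescing mapping.
 * kern_attr: attribute of the kernel mapping for this region.
 * efi_attrs: bitmask of attributes the EFI memory map reports.
 */
enum mem_attr pci_mmap_attr(bool want_wc, enum mem_attr kern_attr,
                            unsigned efi_attrs)
{
    if (want_wc &&
        (kern_attr == ATTR_WC || kern_attr == ATTR_UC) &&
        (efi_attrs & ATTR_WC))
        return ATTR_WC;     /* all three conditions hold: WC is allowed */

    return kern_attr;       /* otherwise match the kernel mapping */
}
```

A WC request against a region that the kernel maps WB therefore stays
WB, since mixing WB and WC would alias a cacheable and an uncacheable
attribute.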
0138 
0139 read/write of /dev/mem
0140 ----------------------
0141 
0142         This uses copy_from_user(), which implicitly uses a kernel
0143         identity mapping.  This is obviously safe for things in
0144         kern_memmap.
0145 
0146         There may be corner cases of things that are not in kern_memmap,
0147         but could be accessed this way.  For example, registers in MMIO
0148         space are not in kern_memmap, but could be accessed with a UC
0149         mapping.  This would not cause attribute aliasing.  But
0150         registers typically can be accessed only with four-byte or
0151         eight-byte accesses, and the copy_from_user() path doesn't allow
0152         any control over the access size, so this would be dangerous.
0153 
0154 ioremap()
0155 ---------
0156 
0157         This returns a mapping for use inside the kernel.
0158 
0159         If the region is in kern_memmap, we should use the attribute
0160         specified there.
0161 
0162         If the EFI memory map reports that the entire granule supports
0163         WB, we should use that (granules that are partially reserved
0164         or occupied by firmware do not appear in kern_memmap).
0165 
0166         If the granule contains non-WB memory, but we can cover the
0167         region safely with kernel page table mappings, we can use
0168         ioremap_page_range() as most other architectures do.
0169 
0170         Failing all of the above, we have to fall back to a UC mapping.
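
The fallback order above can be written as a cascade.  This is a sketch
only: ``struct ioremap_facts``, ``enum ioremap_kind``, and
``ioremap_choose()`` are hypothetical, and the three booleans stand in
for lookups the real code would have to perform against kern_memmap and
the EFI memory map.

```c
#include <stdbool.h>

/* Hypothetical labels for the four outcomes described in the text. */
enum ioremap_kind {
    IOREMAP_KERN_ATTR,   /* reuse the attribute recorded in kern_memmap */
    IOREMAP_WB_GRANULE,  /* whole granule supports WB: map it cacheable */
    IOREMAP_PAGE_TABLE,  /* kernel page table mapping of just the region */
    IOREMAP_UC,          /* last resort: uncacheable mapping */
};

/* Hypothetical summary of what is known about the requested region. */
struct ioremap_facts {
    bool in_kern_memmap;     /* region appears in kern_memmap */
    bool granule_all_wb;     /* EFI says the entire granule supports WB */
    bool page_tables_cover;  /* page table mappings can cover it safely */
};

enum ioremap_kind ioremap_choose(const struct ioremap_facts *f)
{
    if (f->in_kern_memmap)
        return IOREMAP_KERN_ATTR;
    if (f->granule_all_wb)
        return IOREMAP_WB_GRANULE;
    if (f->page_tables_cover)
        return IOREMAP_PAGE_TABLE;
    return IOREMAP_UC;
}
```

The order matters: each later option is tried only when every earlier
one is unavailable, so UC is used only when nothing else is safe.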
0171 
0172 Past Problem Cases
0173 ==================
0174 
0175 mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
0176 --------------------------------------------------------------------
0177 
0178       The EFI memory map may not report these MMIO regions.
0179 
0180       These must be allowed so that X will work.  This means that
0181       when the EFI memory map is incomplete, every /dev/mem mmap must
0182       succeed.  It may create either WB or UC user mappings, depending
0183       on whether the region is in kern_memmap or the EFI memory map.
0184 
0185 mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
0186 ----------------------------------------------------------------------
0187 
0188       The EFI memory map reports the following attributes:
0189 
0190         =============== ======= ==================
0191         0x00000-0x9FFFF WB only
0192         0xA0000-0xBFFFF UC only (VGA frame buffer)
0193         0xC0000-0xFFFFF WB only
0194         =============== ======= ==================
0195 
0196       This mmap is done with user pages, not kernel identity mappings,
0197       so it is safe to use WB mappings.
0198 
0199       The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
0200       which uses a granule-sized UC mapping.  This granule will cover some
0201       WB-only memory, but since UC is non-speculative, the processor will
0202       never generate an uncacheable reference to the WB-only areas unless
0203       the driver explicitly touches them.
0204 
0205 mmap of 0x0-0xFFFFF legacy_mem by "X"
0206 -------------------------------------
0207 
0208       If the EFI memory map reports that the entire range supports the
0209       same attributes, we can allow the mmap (and we will prefer WB if
0210       supported, as is the case with HP sx[12]000 machines with VGA
0211       disabled).
0212 
0213       If EFI reports the range as partly WB and partly UC (as on sx[12]000
0214       machines with VGA enabled), we must fail the mmap because there's no
0215       safe attribute to use.
0216 
0217       If EFI reports some of the range but not all (as on Intel firmware
0218       that doesn't report the VGA frame buffer at all), we should fail the
0219       mmap and force the user to map just the specific region of interest.
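
The three outcomes above can be sketched as a single pass over the EFI
memory map.  All names here (``struct region``, ``ATTR_*``,
``legacy_mem_attr()``) are hypothetical, the map is assumed to be sorted
and non-overlapping, and ranges are half-open, so 0x0-0xFFFFF is passed
as start 0x0 and end 0x100000.  Returning UC for a range that EFI does
not describe at all is an assumption carried over from the /dev/mem
fallback; this section itself does not cover that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_NONE 0x0u  /* hypothetical: no safe attribute, fail the mmap */
#define ATTR_WB   0x1u  /* hypothetical flag: write-back supported */
#define ATTR_UC   0x2u  /* hypothetical flag: uncacheable supported */

/* Hypothetical, simplified EFI memory map entry covering [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
    unsigned attrs;
};

/* Pick one attribute for the user range [start, end), or ATTR_NONE. */
unsigned legacy_mem_attr(const struct region *map, size_t n,
                         uint64_t start, uint64_t end)
{
    unsigned common = ATTR_WB | ATTR_UC; /* supported by every piece so far */
    uint64_t covered = start;            /* [start, covered) is reported */
    bool any = false;

    for (size_t i = 0; i < n && covered < end; i++) {
        if (map[i].end <= covered || map[i].start >= end)
            continue;                    /* does not touch the rest */
        if (map[i].start > covered)
            return ATTR_NONE;            /* EFI reports only part of it */
        common &= map[i].attrs;
        covered = map[i].end;
        any = true;
    }
    if (!any)
        return ATTR_UC;     /* assumption: wholly unreported, treat as MMIO */
    if (covered < end)
        return ATTR_NONE;   /* tail of the range is unreported */
    if (common & ATTR_WB)
        return ATTR_WB;     /* prefer WB when the whole range supports it */
    if (common & ATTR_UC)
        return ATTR_UC;
    return ATTR_NONE;       /* partly WB-only, partly UC-only */
}
```

Run against a map with a UC-only hole at 0xA0000-0xBFFFF, the full
one-megabyte request returns ``ATTR_NONE``, which corresponds to failing
the mmap.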
0220 
0221 mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
0222 ------------------------------------------------------------------------
0223 
0224       The EFI memory map reports the following attributes::
0225 
0226         0x00000-0xFFFFF WB only (no VGA MMIO hole)
0227 
0228       This is a special case of the previous case, and the mmap should
0229       fail for the same reason as above.
0230 
0231 read of /sys/devices/.../rom
0232 ----------------------------
0233 
0234       For VGA devices, this may cause an ioremap() of 0xC0000.  This
0235       used to be done with a UC mapping, because the VGA frame buffer
0236       at 0xA0000 prevents use of a WB granule.  The UC mapping causes
0237       an MCA on HP sx[12]000 chipsets.
0238 
0239       We should use WB page table mappings to avoid covering the VGA
0240       frame buffer.
0241 
0242 Notes
0243 =====
0244 
0245     [1] SDM rev 2.2, vol 2, sec 4.4.1.
0246     [2] SDM rev 2.2, vol 2, sec 4.4.6.