.. SPDX-License-Identifier: GPL-2.0

========================
Linux and the Devicetree
========================

The Linux usage model for device tree data

:Author: Grant Likely <grant.likely@secretlab.ca>

This article describes how Linux uses the device tree.  An overview of
the device tree data format can be found on the device tree usage page
at devicetree.org\ [1]_.

.. [1] https://www.devicetree.org/specifications/

The "Open Firmware Device Tree", or simply Devicetree (DT), is a data
structure and language for describing hardware.  More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data.  A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties.  Be wary,
however, of creating a new binding without first doing some homework
about what already exists.  There are currently two different,
incompatible bindings for i2c busses that came about because the new
binding was created without first investigating how i2c devices were
already being enumerated in existing systems.

1. History
----------
The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system).  An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup and a merge of the
32-bit and 64-bit support, the decision was made to require DT support
on all powerpc platforms, regardless of whether or not they used Open
Firmware.  To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation.  U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time.  DT was
also added to the PowerPC boot wrapper (``arch/powerpc/boot/*``) so that
a dtb could be wrapped up with the kernel image to support booting on
existing non-DT-aware firmware.

Some time later, FDT infrastructure was generalized to be usable by
all architectures.  At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

2. Data Model
-------------
If you haven't already read the Device Tree Usage\ [1]_ page,
then go read it now.  It's okay, I'll wait...

2.1 High Level View
-------------------
The most important thing to understand is that the DT is simply a data
structure that describes the hardware.  There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away.  What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter).  Using
it allows board and device support to become data driven, making
setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:

1) platform identification,
2) runtime configuration, and
3) device population.

2.2 Platform Identification
---------------------------
First and foremost, the kernel will use data in the DT to identify the
specific machine.  In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC.  On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devtree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data.  It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc (which is defined in
arch/arm/include/asm/mach/arch.h if you're curious).

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
more generic platforms it is compatible with, sorted from most
compatible to least.  For example, the root compatible properties for
the TI BeagleBoard and its successor, the BeagleBoard xM board, might
look like, respectively::

        compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
        compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Here "ti,omap3-beagleboard-xm" specifies the exact model; it also
claims that it is compatible with the OMAP 3450 SoC, and the omap3 family
of SoCs in general.  You'll notice that the list is sorted from most
specific (exact board) to least specific (SoC family).
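
In the flattened blob, a property like the ones above is stored as a "stringlist": the strings are laid out back to back, each followed by its NUL terminator. The sketch below is a minimal user-space illustration of walking such a value to find the position of a given string, where a lower position means a more specific entry. The helper name stringlist_index() is hypothetical. Inside the kernel, of_property_match_string() provides a similar lookup on unflattened nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the zero-based position of 'want' in a DT stringlist value
 * ('len' bytes of NUL-terminated strings placed back to back), or -1
 * if it is absent or the value is malformed.  Position 0 is the most
 * specific entry.  stringlist_index() is a hypothetical helper.
 */
int stringlist_index(const char *val, size_t len, const char *want)
{
	size_t off = 0;
	int idx = 0;

	assert(want != NULL);
	while (off < len) {
		/* Find the end of the current string within the value. */
		const char *nul = memchr(val + off, '\0', len - off);

		if (nul == NULL)
			return -1;      /* last string is not NUL-terminated */
		if (strcmp(val + off, want) == 0)
			return idx;
		off = (size_t)(nul - val) + 1;  /* step past the NUL */
		idx++;
	}
	return -1;
}

/* The BeagleBoard xM root compatible value, laid out as in a dtb. */
const char beagle_xm_compat[] =
	"ti,omap3-beagleboard-xm\0ti,omap3450\0ti,omap3";
```

Here `sizeof beagle_xm_compat` includes the final terminator, which matches how a stringlist's length is recorded in the blob. Looking up "ti,omap3" gives position 2. "ti,omap3-beagleboard" is not found, because the comparison is exact.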

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board.  However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another.  For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another.  The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values.  Any string used in a compatible
property must be documented as to what it indicates.  Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine.  After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc, that
is, the one that matches the earliest (most specific) entry in the
compatible property.  If no matching machine_desc is found, then it
returns NULL.

The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs.  However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list.  In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450".  If a bug were discovered on the original BeagleBoard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".
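
The selection rule can be modelled in a few lines of user-space C. This is a simplified sketch and not the kernel's code. struct mdesc is a hypothetical stand-in for struct machine_desc that keeps only a name and a NULL-terminated dt_compat list, and the root compatible property is passed as a plain array of strings. Each descriptor is scored by the position of the earliest root compatible entry it matches, and the lowest score wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct machine_desc. */
struct mdesc {
	const char *name;
	const char *const *dt_compat;   /* NULL-terminated */
};

/*
 * Score one descriptor: the index of the first (most specific) root
 * compatible entry that appears in its dt_compat list, or -1.
 */
static int mdesc_score(const struct mdesc *m,
		       const char *const *root, size_t nroot)
{
	for (size_t i = 0; i < nroot; i++)
		for (const char *const *c = m->dt_compat; *c != NULL; c++)
			if (strcmp(root[i], *c) == 0)
				return (int)i;
	return -1;
}

/* Return the 'most compatible' descriptor, or NULL if none matches. */
const struct mdesc *best_mdesc(const struct mdesc *table, size_t ntable,
			       const char *const *root, size_t nroot)
{
	const struct mdesc *best = NULL;
	int best_score = -1;

	assert(table != NULL || ntable == 0);
	for (size_t i = 0; i < ntable; i++) {
		int s = mdesc_score(&table[i], root, nroot);

		/* Strictly lower wins, so earlier table entries win ties. */
		if (s >= 0 && (best == NULL || s < best_score)) {
			best = &table[i];
			best_score = s;
		}
	}
	return best;
}

/* Generic OMAP3 support plus a BeagleBoard-only workaround entry. */
static const char *const omap3_generic_compat[] = {
	"ti,omap3", "ti,omap3450", NULL
};
static const char *const beagle_fixup_compat[] = {
	"ti,omap3-beagleboard", NULL
};

const struct mdesc omap3_table[] = {
	{ "OMAP3 generic", omap3_generic_compat },
	{ "BeagleBoard workaround", beagle_fixup_compat },
};
```

With these two entries, the original BeagleBoard selects the workaround descriptor because it matches at position 0. The xM only matches the generic descriptor at position 1, so it falls back to that one. A board with no matching string gets NULL.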

PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning true is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.

2.3 Runtime configuration
-------------------------
In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it is also used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this::

        chosen {
                bootargs = "console=ttyS0,115200 loglevel=8";
                linux,initrd-start = <0xc8000000>;
                linux,initrd-end = <0xc8200000>;
        };

The bootargs property contains the kernel arguments, and the
linux,initrd-* properties define the start and end addresses of an
initrd blob.  Note that linux,initrd-end is the first address after the
initrd image, so this doesn't match the usual semantics of struct
resource, whose end address is inclusive.  The chosen node may also
optionally contain an arbitrary number of additional properties for
platform-specific configuration data.
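
The off-by-one between the two conventions is easy to get wrong. Here is a minimal sketch. It assumes a hypothetical struct res that mirrors only the start and end fields of struct resource, with an inclusive end. The size is a plain subtraction, but the inclusive end needs one byte taken off the exclusive initrd end address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: 'end' is inclusive. */
struct res {
	uint64_t start;
	uint64_t end;
};

/* The chosen node's initrd end is exclusive: first byte after the image. */
uint64_t initrd_size(uint64_t start, uint64_t end_exclusive)
{
	assert(end_exclusive > start);  /* assume a non-empty initrd */
	return end_exclusive - start;
}

/* Convert to an inclusive-end range, as struct resource expects. */
struct res initrd_to_res(uint64_t start, uint64_t end_exclusive)
{
	struct res r;

	assert(end_exclusive > start);
	r.start = start;
	r.end = end_exclusive - 1;
	return r;
}
```

With the example values, the image is 0xc8200000 - 0xc8000000 = 0x200000 bytes (2 MiB), and the equivalent inclusive range ends at 0xc81fffff.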

During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up.  The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot.  Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.
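
The callback pattern is easiest to see in a toy model. The sketch below is not the kernel's code. struct flat_node, scan_flat(), find_chosen() and count_memory() are all hypothetical, and the real of_scan_flat_dt() callback receives a blob offset, a node name, a depth and a data pointer rather than a struct. What the sketch shows is the control flow:

- The scanner visits nodes in order and stops as soon as a helper returns non-zero.
- A helper that needs only one node, like the chosen-node parser, can end the walk early.
- A helper that must see every node, like a memory scanner, keeps returning 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened-tree record: node name and depth (root = 0). */
struct flat_node {
	const char *name;
	int depth;
};

typedef int (*scan_fn)(const struct flat_node *node, void *data);

/*
 * Visit nodes in order, stopping at the first non-zero callback
 * return, which is passed back.  Returns 0 if the walk completes.
 */
int scan_flat(const struct flat_node *nodes, size_t n, scan_fn fn, void *data)
{
	int rc = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n && rc == 0; i++)
		rc = fn(&nodes[i], data);
	return rc;
}

/* Chosen-style helper: record the top-level chosen node and stop. */
int find_chosen(const struct flat_node *node, void *data)
{
	if (node->depth != 1 || strcmp(node->name, "chosen") != 0)
		return 0;
	*(const struct flat_node **)data = node;
	return 1;
}

/*
 * Memory-style helper: count top-level memory nodes, never stopping
 * early.  Simplification: this matches on the node name, whereas the
 * kernel checks the device_type property.
 */
int count_memory(const struct flat_node *node, void *data)
{
	if (node->depth == 1 && strncmp(node->name, "memory", 6) == 0)
		(*(int *)data)++;
	return 0;
}
```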

On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

2.4 Device population
---------------------
Once the board has been identified and the early configuration data
has been parsed, kernel initialization can proceed in the normal
way.  At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM.  The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any
machine-specific setup that needs to be executed early in the boot
process, and .init_irq() is used to set up interrupt handling.  Using a DT
doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform.  Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en masse in .init_machine().  When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices.  A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later).  While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example.  Here is part of the
device tree for the NVIDIA Tegra Harmony board::

  / {
        compatible = "nvidia,harmony", "nvidia,tegra20";
        #address-cells = <1>;
        #size-cells = <1>;
        interrupt-parent = <&intc>;

        chosen { };
        aliases { };

        memory {
                device_type = "memory";
                reg = <0x00000000 0x40000000>;
        };

        soc {
                compatible = "nvidia,tegra20-soc", "simple-bus";
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                intc: interrupt-controller@50041000 {
                        compatible = "nvidia,tegra20-gic";
                        interrupt-controller;
                        #interrupt-cells = <1>;
                        reg = <0x50041000 0x1000>, <0x50040100 0x0100>;
                };

                serial@70006300 {
                        compatible = "nvidia,tegra20-uart";
                        reg = <0x70006300 0x100>;
                        interrupts = <122>;
                };

                i2s1: i2s@70002800 {
                        compatible = "nvidia,tegra20-i2s";
                        reg = <0x70002800 0x100>;
                        interrupts = <77>;
                        codec = <&wm8903>;
                };

                i2c@7000c000 {
                        compatible = "nvidia,tegra20-i2c";
                        #address-cells = <1>;
                        #size-cells = <0>;
                        reg = <0x7000c000 0x100>;
                        interrupts = <70>;

                        wm8903: codec@1a {
                                compatible = "wlf,wm8903";
                                reg = <0x1a>;
                                interrupts = <347>;
                        };
                };
        };

        sound {
                compatible = "nvidia,harmony-sound";
                i2s-controller = <&i2s1>;
                i2s-codec = <&wm8903>;
        };
  };
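
This sketch makes the role of #address-cells and #size-cells concrete. A reg value is just a run of big-endian 32-bit cells, and the parent's two properties say how many cells form each address and each size. The code is a minimal user-space sketch: reg_entry() and its helpers are hypothetical names, and only 0 to 2 cells per field are handled. The test decodes the interrupt controller's two reg entries from the tree above. It also decodes the codec's address-only entry, whose parent i2c node sets #size-cells = <0>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit cell from a raw property value. */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine 'cells' consecutive cells (0 to 2) into one number. */
static uint64_t read_cells(const uint8_t *p, int cells)
{
	uint64_t v = 0;

	assert(cells >= 0 && cells <= 2);
	for (int i = 0; i < cells; i++)
		v = (v << 32) | be32_cell(p + 4 * i);
	return v;
}

/*
 * Decode entry 'idx' of a reg value using the parent's #address-cells
 * and #size-cells.  Returns 0 on success or -1 if idx is out of range.
 */
int reg_entry(const uint8_t *val, size_t len, int addr_cells, int size_cells,
	      size_t idx, uint64_t *addr, uint64_t *size)
{
	size_t stride = 4u * (size_t)(addr_cells + size_cells);
	const uint8_t *p;

	assert(stride > 0);
	if ((idx + 1) * stride > len)
		return -1;
	p = val + idx * stride;
	*addr = read_cells(p, addr_cells);
	*size = read_cells(p + 4 * addr_cells, size_cells);
	return 0;
}
```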

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all.  The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device).  The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem.  I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property.  First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller.  For
example, each i2c_client is a child of an i2c_adapter.  Each spi_device
is a child of an SPI bus.  Similarly for USB, PCI, MDIO, etc.  The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node.  Ditto for SPI, MDIO, USB,
etc.  The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree.  Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree.  The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL) or a
parent struct device (the last NULL), no auxiliary data lookup table is
needed (the third NULL), and we're not using a match table (yet).  For
a board that only needs to register devices, .init_machine() can be
completely empty except for the of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node?  Shouldn't they be registered
as platform devices too?  For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time.  So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children.  The board support code would allocate and register an SoC
device, a (theoretical) SoC device driver could bind to the SoC device,
and register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook.  Easy, right?
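
The per-bus model can be sketched without any kernel machinery. In this hypothetical user-space version, an I2C bus driver's probe step walks its child nodes and creates one client per child, taking the device address from the child's one-cell reg. The codec@1a node above, with reg = <0x1a>, is an example of such a child. The types and register_children() are illustrative names only. Skipping children that have no reg or an address outside the 7-bit range is a simplifying assumption; the real I2C core also handles 10-bit addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child node of an I2C bus: compatible plus one-cell reg. */
struct i2c_child_node {
	const char *compatible;
	int has_reg;
	uint32_t reg;
};

/* Hypothetical stand-in for the client a bus driver would register. */
struct client {
	const char *compatible;
	uint16_t addr;
};

/*
 * Model of a bus driver's probe step: one client per usable child.
 * Children without reg, or with an address outside the 7-bit range,
 * are skipped.  Returns the number of clients written to 'out'.
 */
size_t register_children(const struct i2c_child_node *kids, size_t n,
			 struct client *out, size_t max)
{
	size_t count = 0;

	assert(out != NULL || max == 0);
	for (size_t i = 0; i < n && count < max; i++) {
		if (!kids[i].has_reg || kids[i].reg > 0x7f)
			continue;
		out[count].compatible = kids[i].compatible;
		out[count].addr = (uint16_t)kids[i].reg;
		count++;
	}
	return count;
}
```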

Actually, it turns out that registering children of some
platform_devices as more platform_devices is a common pattern, and the
device tree support code reflects that and makes the above example
simpler.  The second argument to of_platform_populate() is an
of_device_id table, and any node that matches an entry in that table
will also get its child nodes registered.  In the Tegra case, the code
can look something like this::

  #include <linux/init.h>
  #include <linux/of_platform.h>

  static void __init harmony_init_machine(void)
  {
        /* ... other machine-specific setup ... */

        /* Register root nodes; descend into default bus nodes (e.g. simple-bus) */
        of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
  }

"simple-bus" is defined in the Devicetree Specification as a compatible
value meaning a simple memory mapped bus, so the of_platform_populate() code
could be written to just assume simple-bus compatible nodes will
always be traversed.  However, we pass it in as an argument so that
board support code can always override the default behaviour.
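
The traversal rule can be modelled in plain C to check which Harmony nodes end up as platform devices. This is a simplified sketch of the behaviour described in this section, not the code in drivers/of/platform.c. struct node, populate() and the other names are hypothetical, and AMBA handling is left out. The bus table contains only "simple-bus"; the kernel's of_default_bus_match_table lists a few more entries. The model follows three rules:

- Nodes without a compatible property are skipped.
- Every other visited node is recorded as a would-be platform_device.
- Only nodes matching the bus table have their children visited.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical unflattened node; NULL compatible means "no property". */
struct node {
	const char *name;
	const char *const *compatible;       /* NULL-terminated, or NULL */
	const struct node *const *children;  /* NULL-terminated, or NULL */
};

/* Names of the nodes that would become platform_devices, in order. */
struct created {
	const char *names[16];
	size_t count;
};

/* Does any compatible entry of 'n' appear in the (possibly NULL) table? */
static int bus_matches(const struct node *n, const char *const *table)
{
	if (n->compatible == NULL || table == NULL)
		return 0;
	for (const char *const *c = n->compatible; *c != NULL; c++)
		for (const char *const *t = table; *t != NULL; t++)
			if (strcmp(*c, *t) == 0)
				return 1;
	return 0;
}

/* Record 'n' if it has compatible; descend only into bus nodes. */
static void bus_create(const struct node *n, const char *const *table,
		       struct created *out)
{
	if (n->compatible == NULL)
		return;                         /* e.g. /chosen, /memory */
	if (out->count < sizeof out->names / sizeof out->names[0])
		out->names[out->count++] = n->name;
	if (!bus_matches(n, table) || n->children == NULL)
		return;
	for (const struct node *const *ch = n->children; *ch != NULL; ch++)
		bus_create(*ch, table, out);
}

/* Start from the root's children, like of_platform_populate(NULL, ...). */
void populate(const struct node *root, const char *const *table,
	      struct created *out)
{
	assert(root != NULL && out != NULL);
	out->count = 0;
	if (root->children == NULL)
		return;
	for (const struct node *const *ch = root->children; *ch != NULL; ch++)
		bus_create(*ch, table, out);
}

/* The Harmony tree from above, reduced to names and compatibles. */
static const char *const c_root[] = { "nvidia,harmony", "nvidia,tegra20", NULL };
static const char *const c_soc[] = { "nvidia,tegra20-soc", "simple-bus", NULL };
static const char *const c_gic[] = { "nvidia,tegra20-gic", NULL };
static const char *const c_uart[] = { "nvidia,tegra20-uart", NULL };
static const char *const c_i2s[] = { "nvidia,tegra20-i2s", NULL };
static const char *const c_i2c[] = { "nvidia,tegra20-i2c", NULL };
static const char *const c_codec[] = { "wlf,wm8903", NULL };
static const char *const c_sound[] = { "nvidia,harmony-sound", NULL };

static const struct node n_chosen = { "chosen", NULL, NULL };
static const struct node n_aliases = { "aliases", NULL, NULL };
static const struct node n_memory = { "memory", NULL, NULL };
static const struct node n_intc = { "interrupt-controller@50041000", c_gic, NULL };
static const struct node n_serial = { "serial@70006300", c_uart, NULL };
static const struct node n_i2s = { "i2s@70002800", c_i2s, NULL };
static const struct node n_codec = { "codec@1a", c_codec, NULL };
static const struct node *const i2c_kids[] = { &n_codec, NULL };
static const struct node n_i2c = { "i2c@7000c000", c_i2c, i2c_kids };
static const struct node *const soc_kids[] = {
	&n_intc, &n_serial, &n_i2s, &n_i2c, NULL
};
static const struct node n_soc = { "soc", c_soc, soc_kids };
static const struct node n_sound = { "sound", c_sound, NULL };
static const struct node *const root_kids[] = {
	&n_chosen, &n_aliases, &n_memory, &n_soc, &n_sound, NULL
};
const struct node harmony_root = { "", c_root, root_kids };

/* Only simple-bus is modelled; the kernel's default table has more. */
const char *const default_bus_match[] = { "simple-bus", NULL };
```

With the "simple-bus" table, the model records soc, the four /soc children, and sound, in that order. codec@1a is left for the i2c bus driver. Passing a NULL table, as in of_platform_populate(NULL, NULL, NULL, NULL), records only soc and sound.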

[Need to add discussion of adding i2c/spi/etc child devices]

Appendix A: AMBA devices
------------------------

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which include some support for hardware detection and power
management.  In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices.  However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device.  This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive.  If a node is compatible with "arm,primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
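
The per-node decision can be sketched as a small classifier. This is a hypothetical user-space model, and enum dev_kind, classify() and kind_name() are illustrative names. It checks for "arm,primecell", which is the compatible value used by the kernel's PrimeCell binding and tested in drivers/of/platform.c. Nodes without a compatible property are not registered at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_kind { DEV_NONE, DEV_PLATFORM, DEV_AMBA };

/* 'compatible' is NULL-terminated, or NULL if the node has none. */
enum dev_kind classify(const char *const *compatible)
{
	if (compatible == NULL)
		return DEV_NONE;            /* no compatible: not a device */
	for (const char *const *c = compatible; *c != NULL; c++)
		if (strcmp(*c, "arm,primecell") == 0)
			return DEV_AMBA;    /* register as amba_device */
	return DEV_PLATFORM;                /* everything else */
}

/* Human-readable name for a classification result. */
const char *kind_name(enum dev_kind k)
{
	switch (k) {
	case DEV_NONE:
		return "none";
	case DEV_PLATFORM:
		return "platform_device";
	case DEV_AMBA:
		return "amba_device";
	}
	assert(!"unknown dev_kind");
	return "?";
}
```

For example, a PL011 UART node with compatible = "arm,pl011", "arm,primecell" would be classified as an amba_device. Its non-PrimeCell siblings on the same bus segment would still become platform_devices.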