.. SPDX-License-Identifier: GPL-2.0

============
ORC unwinder
============

Overview
========

The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the
format of the ORC data is much simpler than DWARF, which in turn allows
the ORC unwinder to be much simpler and faster.

The ORC data consists of unwind tables which are generated by objtool.
They contain out-of-band data which is used by the in-kernel ORC
unwinder. Objtool generates the ORC data by first doing compile-time
stack metadata validation (CONFIG_STACK_VALIDATION). After analyzing
all the code paths of a .o file, it determines information about the
stack state at each instruction address in the file and outputs that
information to the .orc_unwind and .orc_unwind_ip sections.

The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time. The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.
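
As a rough illustration of that run-time correlation, the sketch below
looks up an instruction address in a sorted table of entry start
addresses and returns the index of the covering entry. It is a minimal
sketch, not the kernel's code: the name ``demo_orc_lookup`` and the use
of absolute 64-bit addresses are assumptions made for the example, and
the way the kernel actually encodes addresses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the last entry whose start
 * address is <= ip, or -1 if ip lies below the first entry.
 * ip_table must be sorted in ascending order.
 */
static ptrdiff_t demo_orc_lookup(const uint64_t *ip_table, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;	/* search window is [lo, hi) */

	assert(ip_table != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip_table[mid] <= ip)
			lo = mid + 1;	/* covering entry is at mid or later */
		else
			hi = mid;	/* covering entry is before mid */
	}

	/* lo is now the first index with a start address > ip. */
	return (ptrdiff_t)lo - 1;
}
```

The returned index would then select the matching stack-state record
from the parallel table of unwind entries.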


ORC vs frame pointers
=====================

With frame pointers enabled, GCC adds instrumentation code to every
function in the kernel. The kernel's .text size increases by about
3.2%, resulting in a broad kernel-wide slowdown. Measurements by Mel
Gorman [1]_ have shown a slowdown of 5-10% for some workloads.

In contrast, the ORC unwinder has no effect on text size or runtime
performance, because the debuginfo is out of band. So if you disable
frame pointers and enable the ORC unwinder, you get a nice performance
improvement across the board, and still have reliable stack traces.

Ingo Molnar says:

  "Note that it's not just a performance improvement, but also an
  instruction cache locality improvement: 3.2% .text savings almost
  directly transform into a similarly sized reduction in cache
  footprint. That can transform to even higher speedups for workloads
  whose cache locality is borderline."

Another benefit of ORC compared to frame pointers is that it can
reliably unwind across interrupts and exceptions. Frame pointer based
unwinds can sometimes skip the caller of the interrupted function, if it
was a leaf function or if the interrupt hit before the frame pointer was
saved.
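
The skipped-caller case can be reproduced with a small simulation. The
sketch below walks a chain of saved frame pointers over a simulated
stack. All names (``demo_stack``, ``demo_read``, ``demo_fp_walk``) and
the frame layout (caller's frame pointer at ``[bp]``, return address at
``[bp + 8]``, a saved value of 0 ending the chain) are assumptions made
for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated stack: words[i] is the 64-bit value at address base + 8 * i. */
struct demo_stack {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

static uint64_t demo_read(const struct demo_stack *s, uint64_t addr)
{
	assert(addr >= s->base && (addr - s->base) % 8 == 0);
	assert((addr - s->base) / 8 < s->nwords);
	return s->words[(addr - s->base) / 8];
}

/*
 * Frame pointer walk: record the interrupted ip, then follow the chain
 * of saved frame pointers. Returns the number of addresses written to
 * trace[].
 */
static size_t demo_fp_walk(const struct demo_stack *s, uint64_t ip, uint64_t bp,
			   uint64_t *trace, size_t max)
{
	size_t n = 0;

	if (n < max)
		trace[n++] = ip;

	while (bp != 0 && n < max) {
		trace[n++] = demo_read(s, bp + 8);	/* return address */
		bp = demo_read(s, bp);			/* caller's frame pointer */
	}
	return n;
}
```

Suppose function A calls leaf function B, and the interrupt hits before
B has saved the frame pointer. The frame pointer register then still
describes A's frame, so the walk reports B's interrupted address followed
by the return address into A's caller. The return address into A, which
sits at the top of the stack, is never visited.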

The main disadvantage of the ORC unwinder compared to frame pointers is
that it needs more memory to store the ORC unwind tables: roughly 2-4MB
depending on the kernel config.


ORC vs DWARF
============

ORC debuginfo's advantage over DWARF itself is that it's much simpler.
It gets rid of the complex DWARF CFI state machine and also gets rid of
the tracking of unnecessary registers. This allows the unwinder to be
much simpler, meaning fewer bugs, which is especially important for
mission critical oops code.

The simpler debuginfo format also enables the unwinder to be much faster
than DWARF, which is important for perf and lockdep. In a basic
performance test by Jiri Slaby [2]_, the ORC unwinder was about 20x
faster than an out-of-tree DWARF unwinder. (Note: That measurement was
taken before some performance tweaks were added, which doubled
performance, so the speedup over DWARF may be closer to 40x.)

The ORC data format does have a few downsides compared to DWARF. ORC
unwind tables take up ~50% more RAM (+1.3MB on an x86 defconfig kernel)
than DWARF-based eh_frame tables.

Another potential downside is that, as GCC evolves, it's conceivable
that the ORC data may end up being *too* simple to describe the state of
the stack for certain optimizations. But IMO this is unlikely because
GCC saves the frame pointer for any unusual stack adjustments it does,
so I suspect we'll really only ever need to keep track of the stack
pointer and the frame pointer between call frames. But even if we do
end up having to track all the registers DWARF tracks, at least we will
still be able to control the format, e.g. no complex state machines.
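
To make "only the stack pointer and the frame pointer" concrete, the
sketch below unwinds one call frame from a record that describes just
those two registers. The record layout and every name here
(``demo_orc_entry``, ``demo_orc_step`` and so on) are hypothetical
simplifications chosen for the example. They are not the kernel's
actual struct orc_entry definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_base { DEMO_BASE_SP, DEMO_BASE_BP };

/* Hypothetical per-address stack state: no other registers are tracked. */
struct demo_orc_entry {
	enum demo_base sp_reg;	/* register the previous SP is derived from */
	int16_t sp_offset;	/* previous SP = base register + sp_offset */
	int bp_saved;		/* nonzero if the caller's BP is on the stack */
	int16_t bp_offset;	/* saved BP lives at previous SP + bp_offset */
};

struct demo_regs {
	uint64_t ip, sp, bp;
};

/* Simulated memory: words[i] is the 64-bit value at address base + 8 * i. */
struct demo_mem {
	uint64_t base;
	const uint64_t *words;
	size_t nwords;
};

static uint64_t demo_load(const struct demo_mem *m, uint64_t addr)
{
	assert(addr >= m->base && (addr - m->base) % 8 == 0);
	assert((addr - m->base) / 8 < m->nwords);
	return m->words[(addr - m->base) / 8];
}

/* Unwind one frame: replace r with the caller's ip, sp and bp. */
static void demo_orc_step(const struct demo_mem *m,
			  const struct demo_orc_entry *e, struct demo_regs *r)
{
	uint64_t base = (e->sp_reg == DEMO_BASE_SP) ? r->sp : r->bp;
	uint64_t prev_sp = base + (uint64_t)(int64_t)e->sp_offset;

	if (e->bp_saved)
		r->bp = demo_load(m, prev_sp + (uint64_t)(int64_t)e->bp_offset);

	/* The return address sits just below the caller's stack pointer. */
	r->ip = demo_load(m, prev_sp - 8);
	r->sp = prev_sp;
}
```

Because the record for a function's first instruction can say "previous
SP is SP + 8, frame pointer not yet saved", a step from an interrupted
leaf function lands in its direct caller without relying on the frame
pointer register at all.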


ORC unwind table generation
===========================

The ORC data is generated by objtool. With the existing compile-time
stack metadata validation feature, objtool already follows all code
paths, and so it already has all the information it needs to be able to
generate ORC data from scratch. So it's an easy step to go from stack
validation to ORC data generation.

It should be possible to instead generate the ORC data with a simple
tool which converts DWARF to ORC data. However, such a solution would
be incomplete due to the kernel's extensive use of asm, inline asm, and
special sections like exception tables.

That could be rectified by manually annotating those special code paths
using GNU assembler .cfi annotations in .S files, and homegrown
annotations for inline asm in .c files. But asm annotations were tried
in the past and were found to be unmaintainable. They were often
incorrect/incomplete and made the code harder to read and keep updated.
And based on looking at glibc code, annotating inline asm in .c files
might be even worse.

Objtool still needs a few annotations, but only in code which does
unusual things to the stack like entry code. And even then, far fewer
annotations are needed than what DWARF would need, so they're much more
maintainable than DWARF CFI annotations.

So the advantages of using objtool to generate ORC data are that it
gives more accurate debuginfo, with very few annotations. It also
insulates the kernel from toolchain bugs, which can be very painful to
deal with in the kernel since we often have to work around issues in
older versions of the toolchain for years.

The downside is that the unwinder now becomes dependent on objtool's
ability to reverse engineer GCC code flow. If GCC optimizations become
too complicated for objtool to follow, the ORC data generation might
stop working or become incomplete. (It's worth noting that livepatch
already has such a dependency on objtool's ability to follow GCC code
flow.)

If newer versions of GCC come up with some optimizations which break
objtool, we may need to revisit the current implementation. Some
possible solutions would be asking GCC to make the optimizations more
palatable, or having objtool use DWARF as an additional input, or
creating a GCC plugin to assist objtool with its analysis. But for now,
objtool follows GCC code quite well.


Unwinder implementation details
===============================

Objtool generates the ORC data by integrating with the compile-time
stack metadata validation feature, which is described in detail in
tools/objtool/Documentation/objtool.txt. After analyzing all
the code paths of a .o file, it creates an array of orc_entry structs,
and a parallel array of instruction addresses associated with those
structs, and writes them to the .orc_unwind and .orc_unwind_ip sections
respectively.

The ORC data is split into the two arrays for performance reasons, to
make the searchable part of the data (.orc_unwind_ip) more compact. The
arrays are sorted in parallel at boot time.
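
Sorting "in parallel" means that whenever an address moves, its unwind
record moves with it, so index ``i`` in one array always describes index
``i`` in the other. The sketch below shows the idea with a plain
insertion sort. The names ``demo_entry`` and ``demo_sort_parallel`` are
hypothetical, and insertion sort is used only for brevity. It is not a
claim about which sorting algorithm the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one unwind record. */
struct demo_entry {
	int16_t sp_offset;
};

/*
 * Sort ips[] in ascending order and apply the same permutation to
 * entries[], so that entries[i] keeps describing ips[i].
 */
static void demo_sort_parallel(uint64_t *ips, struct demo_entry *entries,
			       size_t n)
{
	assert((ips != NULL && entries != NULL) || n == 0);

	for (size_t i = 1; i < n; i++) {
		uint64_t ip = ips[i];
		struct demo_entry e = entries[i];
		size_t j = i;

		/* Shift larger addresses, and their records, one slot up. */
		while (j > 0 && ips[j - 1] > ip) {
			ips[j] = ips[j - 1];
			entries[j] = entries[j - 1];
			j--;
		}
		ips[j] = ip;
		entries[j] = e;
	}
}
```

Keeping the addresses in their own array means a search touches only the
compact address data, and the wider records are read just once, after
the index is known.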

Performance is further improved by the use of a fast lookup table which
is created at runtime. The fast lookup table associates a given address
with a range of indices for the .orc_unwind table, so that only a small
subset of the table needs to be searched.
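
One way to picture such a table is to divide the text range into
fixed-size blocks and record, for each block boundary, the index of the
entry covering that boundary. A lookup then only searches between the
indices recorded for the two boundaries around the address. The sketch
below assumes a 256-byte block size, absolute 64-bit addresses, and
hypothetical names (``demo_build_lookup``, ``demo_fast_lookup``). It
illustrates the technique and is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 256u	/* assumed block size, for illustration */

/*
 * Fill table[0..nblocks] (nblocks + 1 slots). table[b] is the index of
 * the last entry whose address is <= the start of block b.
 * ips[] must be sorted and its first entry must cover text_start.
 */
static void demo_build_lookup(const uint64_t *ips, size_t n,
			      uint64_t text_start, uint32_t *table,
			      size_t nblocks)
{
	size_t idx = 0;

	assert(n > 0 && ips[0] <= text_start);

	for (size_t b = 0; b <= nblocks; b++) {
		uint64_t block_start = text_start + (uint64_t)b * DEMO_BLOCK_SIZE;

		while (idx + 1 < n && ips[idx + 1] <= block_start)
			idx++;
		table[b] = (uint32_t)idx;
	}
}

/* Return the index of the last entry whose address is <= ip. */
static size_t demo_fast_lookup(const uint64_t *ips, size_t n,
			       uint64_t text_start, const uint32_t *table,
			       size_t nblocks, uint64_t ip)
{
	size_t b, lo, hi;

	assert(ip >= text_start);
	b = (size_t)((ip - text_start) / DEMO_BLOCK_SIZE);
	assert(b < nblocks);

	lo = table[b];			/* ips[lo] <= ip always holds */
	hi = (size_t)table[b + 1] + 1;	/* exclusive upper bound */
	assert(hi <= n);

	/* Binary search restricted to the small window [lo, hi). */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (ips[mid] <= ip)
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}
```

The window is bounded by the number of entries that start inside one
block, so the search cost no longer grows with the size of the whole
table.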


Etymology
=========

Orcs, fearsome creatures of medieval folklore, are the Dwarves' natural
enemies. Similarly, the ORC unwinder was created in opposition to the
complexity and slowness of DWARF.

"Although Orcs rarely consider multiple solutions to a problem, they do
excel at getting things done because they are creatures of action, not
thought." [3]_ Similarly, unlike the esoteric DWARF unwinder, the
veracious ORC unwinder wastes no time or siloconic effort decoding
variable-length zero-extended unsigned-integer byte-coded
state-machine-based debug information entries.

Similar to how Orcs frequently unravel the well-intentioned plans of
their adversaries, the ORC unwinder frequently unravels stacks with
brutal, unyielding efficiency.

ORC stands for Oops Rewind Capability.


.. [1] https://lore.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
.. [2] https://lore.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
.. [3] http://dustin.wikidot.com/half-orcs-and-orcs