0001 ================
0002 Kernel mode NEON
0003 ================
0004
0005 TL;DR summary
0006 -------------
0007 * Use only NEON instructions, or VFP instructions that don't rely on support
0008 code
0009 * Isolate your NEON code in a separate compilation unit, and compile it with
0010 '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
0011 * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
0012 NEON code
0013 * Don't sleep in your NEON code, and be aware that it will be executed with
0014 preemption disabled
0015
0016
0017 Introduction
0018 ------------
0019 It is possible to use NEON instructions (and in some cases, VFP instructions) in
0020 code that runs in kernel mode. However, for performance reasons, the NEON/VFP
0021 register file is not preserved and restored at every context switch or taken
0022 exception like the normal register file is, so some manual intervention is
0023 required. Furthermore, special care is required for code that may sleep [i.e.,
0024 may call schedule()], as NEON or VFP instructions will be executed in a
0025 non-preemptible section for reasons outlined below.
0026
0027
0028 Lazy preserve and restore
0029 -------------------------
0030 The NEON/VFP register file is managed using lazy preserve (on UP systems) and
0031 lazy restore (on both SMP and UP systems). This means that the register file is
0032 kept 'live', and is only preserved and restored when multiple tasks are
0033 contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
0034 another core). Lazy restore is implemented by disabling the NEON/VFP unit after
0035 every context switch, resulting in a trap when subsequently a NEON/VFP
0036 instruction is issued, allowing the kernel to step in and perform the restore if
0037 necessary.
0038
0039 Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
0040 it is required to do an 'eager' preserve of the NEON/VFP register file, and
0041 enable the NEON/VFP unit explicitly so no exceptions are generated on first
0042 subsequent use. This is handled by the function kernel_neon_begin(), which
0043 should be called before any kernel mode NEON or VFP instructions are issued.
0044 Likewise, the NEON/VFP unit should be disabled again after use to make sure user
0045 mode will hit the lazy restore trap upon next use. This is handled by the
0046 function kernel_neon_end().
0047
0048
0049 Interruptions in kernel mode
0050 ----------------------------
0051 For reasons of performance and simplicity, it was decided that there shall be no
0052 preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
0053 implies that interruptions of a kernel mode NEON section can only be allowed if
0054 they are guaranteed not to touch the NEON/VFP registers. For this reason, the
0055 following rules and restrictions apply in the kernel:
0056 * NEON/VFP code is not allowed in interrupt context;
0057 * NEON/VFP code is not allowed to sleep;
0058 * NEON/VFP code is executed with preemption disabled.
0059
0060 If latency is a concern, it is possible to put back to back calls to
0061 kernel_neon_end() and kernel_neon_begin() in places in your code where none of
0062 the NEON registers are live. (Additional calls to kernel_neon_begin() should be
0063 reasonably cheap if no context switch occurred in the meantime)
0064
0065
0066 VFP and support code
0067 --------------------
0068 Earlier versions of VFP (prior to version 3) rely on software support for things
0069 like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
0070 software assistance, it signals the kernel by raising an undefined instruction
0071 exception. The kernel responds by inspecting the VFP control registers and the
0072 current instruction and arguments, and emulates the instruction in software.
0073
0074 Such software assistance is currently not implemented for VFP instructions
0075 executed in kernel mode. If such a condition is encountered, the kernel will
0076 fail and generate an OOPS.
0077
0078
0079 Separating NEON code from ordinary code
0080 ---------------------------------------
0081 The compiler is not aware of the special significance of kernel_neon_begin() and
0082 kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
0083 between calls to these respective functions. Furthermore, GCC may generate NEON
0084 instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
0085 kernel is currently compiled at -O2, future changes may result in NEON/VFP
0086 instructions appearing in unexpected places if no special care is taken.
0087
0088 Therefore, the recommended and only supported way of using NEON/VFP in the
0089 kernel is by adhering to the following rules:
0090
0091 * isolate the NEON code in a separate compilation unit and compile it with
0092 '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
0093 * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
0094 into the unit containing the NEON code from a compilation unit which is *not*
0095 built with the GCC flag '-mfpu=neon' set.
0096
0097 As the kernel is compiled with '-msoft-float', the above will guarantee that
0098 both NEON and VFP instructions will only ever appear in designated compilation
0099 units at any optimization level.
0100
0101
0102 NEON assembler
0103 --------------
0104 NEON assembler is supported with no additional caveats as long as the rules
0105 above are followed.
0106
0107
0108 NEON code generated by GCC
0109 --------------------------
0110 The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
0111 parallelism, and generates NEON code from ordinary C source code. This is fully
0112 supported as long as the rules above are followed.
0113
0114
0115 NEON intrinsics
0116 ---------------
0117 NEON intrinsics are also supported. However, as code using NEON intrinsics
0118 relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
0119 observe the following in addition to the rules above:
0120
0121 * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
0122 uses its builtin version of <stdint.h> (this is a C99 header which the kernel
0123 does not supply);
0124 * Include <arm_neon.h> last, or at least after <linux/types.h>