.. SPDX-License-Identifier: GPL-2.0

===============
Boot Interrupts
===============

:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>

Overview
========

On PCI Express, interrupts are represented with either MSI or inbound
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
given Core IO converts the legacy interrupt messages from PCI Express to
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
IO-APIC table entries), the messages are routed to the legacy PCH. This
in-band interrupt mechanism was traditionally necessary for systems that
did not support the IO-APIC and for boot. Intel has in the past used the
term "boot interrupts" to describe this mechanism. Further, the PCI Express
protocol describes this in-band legacy wire-interrupt INTx mechanism for
I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
describe problems with the Core IO handling of INTx message routing to the
PCH and the mitigations available in the BIOS and the OS.


Issue
=====

When in-band legacy INTx messages are forwarded to the PCH, they in turn
trigger a new interrupt for which the OS likely lacks a handler. When an
interrupt repeatedly goes unhandled, the Linux kernel tracks it as a
spurious interrupt. Once the unhandled count reaches a specific threshold,
the kernel disables the IRQ and reports the error "nobody cared". The
disabled IRQ then blocks any valid interrupt that happens to share the
IRQ line::

  irq 19: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
  Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
  Call Trace:

  <IRQ>
   ? dump_stack+0x46/0x5e
   ? __report_bad_irq+0x2e/0xb0
   ? note_interrupt+0x242/0x290
   ? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
   ? handle_irq_event_percpu+0x55/0x70
   ? handle_irq_event+0x4f/0x80
   ? handle_fasteoi_irq+0x81/0x180
   ? handle_irq+0x1c/0x30
   ? do_IRQ+0x41/0xd0
   ? common_interrupt+0x84/0x84
  </IRQ>

  handlers:
  irq_default_primary_handler threaded usb_hcd_irq
  Disabling IRQ #19

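The counting behaviour described above can be sketched in plain C. This is a
simplified user-space model, not the kernel's code: the names ``irq_line``
and ``note_interrupt_sketch`` are hypothetical, and the window of 100000
interrupts with a threshold of 99900 unhandled ones is an assumption
modelled on kernel/irq/spurious.c that should be checked against the kernel
version in use.

```c
#include <stdbool.h>

/* Assumed values for illustration; verify against kernel/irq/spurious.c. */
#define SPURIOUS_WINDOW     100000u  /* interrupts per evaluation window */
#define SPURIOUS_THRESHOLD   99900u  /* unhandled count that disables the line */

/* Hypothetical per-line bookkeeping, loosely mirroring the kernel's counters. */
struct irq_line {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* set once the line has been shut off */
};

/* Record one interrupt. Returns true if this call disabled the line. */
bool note_interrupt_sketch(struct irq_line *line, bool handled)
{
    if (line->disabled)
        return false;

    if (!handled)
        line->irqs_unhandled++;
    line->irq_count++;

    if (line->irq_count < SPURIOUS_WINDOW)
        return false;

    /* Window complete: evaluate it, then start a fresh one. */
    line->irq_count = 0;
    if (line->irqs_unhandled > SPURIOUS_THRESHOLD) {
        /* This is the point where the kernel would report "nobody cared". */
        line->disabled = true;
        line->irqs_unhandled = 0;
        return true;
    }
    line->irqs_unhandled = 0;
    return false;
}
```

In this model, feeding a line 100000 consecutive unhandled interrupts
disables it, while a line on which half of the interrupts are handled stays
enabled. A device sharing the disabled line loses its interrupts as well,
which is the failure shown in the log above.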

Conditions
==========

The use of threaded interrupts is the most likely condition to trigger
this problem today. Threaded interrupts may not be reenabled after the IRQ
handler wakes. These "one shot" conditions mean that the threaded interrupt
needs to keep the interrupt line masked until the threaded handler has run.
Especially when dealing with high data rate interrupts, the thread needs to
run to completion; otherwise some handlers will end up in stack overflows
since the interrupt of the issuing device is still active.

Affected Chipsets
=================

The legacy interrupt forwarding mechanism exists today in a number of
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
Intel. Changes made through the mitigations below have been applied to
drivers/pci/quirks.c.

Starting with ICX there are no longer any IO-APICs in the Core IO's
devices. The IO-APIC is only in the PCH. Devices connected to the Core IO's
PCIe Root Ports will use native MSI/MSI-X mechanisms.

Mitigations
===========

The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
added. [1]_

Intel® 6300ESB I/O Controller Hub
  Alternate Base Address Register:
   BIE: Boot Interrupt Enable

        ==  ===========================
        0   Boot interrupt is enabled.
        1   Boot interrupt is disabled.
        ==  ===========================

Intel® Sandy Bridge through Skylake based Xeon servers:
  Coherent Interface Protocol Interrupt Control
   dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
        When this bit is set, local INTx messages received from the
        Intel® Quick Data DMA/PCI Express ports are not routed to the
        legacy PCH. They are either converted into MSI via the
        integrated IO-APIC (if the IO-APIC mask bit is clear in the
        appropriate entries) or cause no further action (when the mask
        bit is set).
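
As a rough illustration of the dis_intx_route2pch description above, the
following user-space C sketch models the read-modify-write that such a quirk
performs and the routing decision that results. The function names, the enum
and the bit position ``1 << 25`` are assumptions made for this example; the
real register offsets and bit positions must be taken from the datasheets
and from drivers/pci/quirks.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position assumed for illustration only; check the datasheet. */
#define DIS_INTX_ROUTE2PCH (UINT32_C(1) << 25)

/* Possible fates of an inbound INTx message in this model. */
enum intx_dest {
    INTX_TO_MSI,   /* converted to MSI by the integrated IO-APIC */
    INTX_TO_PCH,   /* forwarded to the legacy PCH (a boot interrupt) */
    INTX_DROPPED   /* causes no further action */
};

/* Read-modify-write: set the disable bit and leave all other bits alone. */
uint32_t disable_boot_interrupt(uint32_t cipintrc)
{
    return cipintrc | DIS_INTX_ROUTE2PCH;
}

/* Decide where an INTx message ends up for a given register value. */
enum intx_dest route_intx(uint32_t cipintrc, bool ioapic_entry_masked)
{
    if (!ioapic_entry_masked)
        return INTX_TO_MSI;

    if (cipintrc & DIS_INTX_ROUTE2PCH)
        return INTX_DROPPED;

    return INTX_TO_PCH;
}
```

With the bit clear, a masked IO-APIC entry sends the message to the PCH.
After ``disable_boot_interrupt()`` has been applied to the register value,
the same message causes no further action, so no unhandled interrupt is
raised on the PCH side.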

In the absence of a way to directly disable the routing, another approach
has been to use the PCI Interrupt pin to INTx routing tables to redirect
the interrupt handler to the rerouted interrupt line by default. Therefore,
on chipsets where this INTx routing cannot be disabled, the Linux kernel
reroutes the valid interrupt to its legacy interrupt. Redirecting the
handler in this way prevents the spurious interrupt detection that would
otherwise disable the IRQ line due to excessive unhandled counts. [2]_

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute. [3]_

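The interaction between the build-time default and the command-line override
can be modelled as follows. This is only a sketch: ``reroute_enabled`` is a
hypothetical helper, not the kernel's option parser, and ``build_default``
stands in for whether X86_REROUTE_FOR_BROKEN_BOOT_IRQS was selected at build
time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns whether handler rerouting is active.
 * build_default models the Kconfig choice; pci_option is the value given
 * to "pci=" on the kernel command line, or NULL if none was given.
 */
bool reroute_enabled(bool build_default, const char *pci_option)
{
    if (pci_option != NULL) {
        if (strcmp(pci_option, "ioapicreroute") == 0)
            return true;    /* explicit request to reroute */
        if (strcmp(pci_option, "noioapicreroute") == 0)
            return false;   /* explicit request not to reroute */
    }
    /* No relevant option given: fall back to the build-time default. */
    return build_default;
}
```

In this model, a kernel built without the option but booted with
``pci=ioapicreroute`` has rerouting enabled, and a kernel built with the
option but booted with ``pci=noioapicreroute`` has it disabled.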

More Documentation
==================

There is an overview of the legacy interrupt handling in several datasheets
(6300ESB and 6700PXH below). While largely the same, they provide insight
into how its handling has evolved across chipsets.

Example of disabling of the boot interrupt
------------------------------------------

- Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
        5.7.3 Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf

- Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
  Datasheet - Volume 2: Registers (Document # 330784-003)
        6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
        https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Example of handler rerouting
----------------------------

- Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
        2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf


If you have any legacy PCI interrupt questions that aren't answered, email me.

Cheers,
    Sean V Kelley
    sean.v.kelley@linux.intel.com

.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/