.. SPDX-License-Identifier: GPL-2.0

===============
Boot Interrupts
===============

:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>

Overview
========

On PCI Express, interrupts are represented with either MSI or inbound
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
given Core IO converts the legacy interrupt messages from PCI Express to
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
IO-APIC table entries), the messages are routed to the legacy PCH. This
in-band interrupt mechanism was traditionally necessary for systems that
did not support the IO-APIC and for boot. Intel has in the past used the
term "boot interrupts" to describe this mechanism. Further, the PCI Express
protocol describes this in-band legacy wire-interrupt INTx mechanism for
I/O devices to signal PCI-style level interrupts. The following sections
describe problems with the Core IO handling of INTx message routing to the
PCH and the mitigations within BIOS and the OS.


Issue
=====

When in-band legacy INTx messages are forwarded to the PCH, they in turn
trigger a new interrupt for which the OS likely lacks a handler. When an
interrupt repeatedly goes unhandled, the Linux kernel tracks it as a
spurious interrupt. Once the unhandled count reaches a threshold, the
kernel disables the IRQ and reports the error "nobody cared". The disabled
IRQ then becomes unusable for any valid interrupt that happens to share
the IRQ line::

  irq 19: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
  Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
  Call Trace:

  <IRQ>
   ? dump_stack+0x46/0x5e
   ? __report_bad_irq+0x2e/0xb0
   ? note_interrupt+0x242/0x290
   ? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
   ? handle_irq_event_percpu+0x55/0x70
   ? handle_irq_event+0x4f/0x80
   ? handle_fasteoi_irq+0x81/0x180
   ? handle_irq+0x1c/0x30
   ? do_IRQ+0x41/0xd0
   ? common_interrupt+0x84/0x84
  </IRQ>

  handlers:
  irq_default_primary_handler threaded usb_hcd_irq
  Disabling IRQ #19

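The accounting behind the "nobody cared" report can be sketched in
userspace C. This is a simplified model, not the kernel's code:
``struct irq_stats`` and ``account_interrupt()`` are hypothetical names, and
the two thresholds are assumptions modelled on ``note_interrupt()`` in
kernel/irq/spurious.c, which may differ between kernel versions.

```c
#include <stdbool.h>

/* Assumed thresholds, modelled on note_interrupt(); illustrative only. */
#define IRQ_SAMPLE_WINDOW   100000u
#define IRQ_UNHANDLED_LIMIT  99900u

/* Hypothetical, simplified stand-in for the per-IRQ bookkeeping. */
struct irq_stats {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* set once the line has been shut off */
};

/* Account for one interrupt; returns true if this call disabled the line. */
static bool account_interrupt(struct irq_stats *s, bool handled)
{
	if (s->disabled)
		return false;
	if (!handled)
		s->irqs_unhandled++;
	if (++s->irq_count < IRQ_SAMPLE_WINDOW)
		return false;

	/* Window complete: judge it, then start a new one. */
	bool bad = s->irqs_unhandled > IRQ_UNHANDLED_LIMIT;

	s->irq_count = 0;
	s->irqs_unhandled = 0;
	if (bad)
		s->disabled = true; /* the "nobody cared" case */
	return bad;
}
```

In this model, a full window of unhandled interrupts disables the line,
whereas a line on which half of the interrupts are handled stays enabled.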

Conditions
==========

The use of threaded interrupts is the most likely condition to trigger
this problem today. With a threaded interrupt, the line may not be
re-enabled as soon as the hard IRQ handler has woken the handler thread.
In this "one shot" case the interrupt line must stay masked until the
threaded handler has run. While the line is masked at the IO-APIC, further
INTx messages from the device are forwarded to the PCH as boot interrupts.
Keeping the line masked matters especially for high data rate interrupts:
the thread needs to run to completion, otherwise some handlers will end up
in stack overflows because the interrupt of the issuing device is still
active.

Affected Chipsets
=================

The legacy interrupt forwarding mechanism exists today in a number of
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
Intel. The mitigations described below have been implemented in
drivers/pci/quirks.c.

Starting with ICX there are no longer any IO-APICs in the Core IO's
devices. The IO-APIC is only in the PCH. Devices connected to the Core IO's
PCIe Root Ports will use native MSI/MSI-X mechanisms.

Mitigations
===========

The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
added. [1]_

Intel® 6300ESB I/O Controller Hub
  Alternate Base Address Register:
   BIE: Boot Interrupt Enable

          ==  ===========================
          0   Boot interrupt is enabled.
          1   Boot interrupt is disabled.
          ==  ===========================

Intel® Sandy Bridge through Skylake based Xeon servers:
  Coherent Interface Protocol Interrupt Control
   dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
          When this bit is set, local INTx messages received from the
          Intel® Quick Data DMA/PCI Express ports are not routed to the
          legacy PCH. They are either converted into MSI via the integrated
          IO-APIC (if the IO-APIC mask bit is clear in the appropriate
          entries) or cause no further action (when the mask bit is set).

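The routing behaviour described above amounts to a small decision table.
The following userspace C sketch models it; ``enum intx_route`` and
``route_intx()`` are hypothetical names chosen for illustration, and the two
boolean inputs stand in for the IO-APIC mask bit and the
dis_intx_route2pch bit.

```c
#include <stdbool.h>

/* Where an inbound Assert_INTx message ends up in this model. */
enum intx_route {
	INTX_TO_MSI,  /* converted by the integrated IO-APIC */
	INTX_TO_PCH,  /* forwarded to the PCH as a boot interrupt */
	INTX_DROPPED, /* causes no further action */
};

/* Hypothetical helper: pick the route from the two control bits. */
static enum intx_route route_intx(bool ioapic_entry_masked,
				  bool dis_intx_route2pch)
{
	if (!ioapic_entry_masked)
		return INTX_TO_MSI;
	if (dis_intx_route2pch)
		return INTX_DROPPED;
	return INTX_TO_PCH;
}
```

Only the combination of a masked IO-APIC entry and a clear disable bit
yields ``INTX_TO_PCH``, which is the case the quirk is meant to eliminate.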
In the absence of a way to directly disable the routing, another approach
has been to use the PCI interrupt pin to INTx routing tables to redirect
the interrupt handler to the rerouted interrupt line by default.
Therefore, on chipsets where this INTx routing cannot be disabled, the
Linux kernel will reroute the valid interrupt to its legacy interrupt.
Because the handler now services the line on which the boot interrupt
arrives, the spurious interrupt detection no longer sees excessive
unhandled counts and does not disable the IRQ line. [2]_

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The built-in setting can be overridden on the kernel command line
with either pci=ioapicreroute or pci=noioapicreroute. [3]_

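How the build-time default and the command-line override interact can be
sketched as follows. This is a userspace illustration only:
``reroute_enabled()`` is a hypothetical helper, and it assumes the pci=
option string has already been split into individual tokens.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: config_default models whether
 * X86_REROUTE_FOR_BROKEN_BOOT_IRQS was selected at build time;
 * pci_option is one token from "pci=..." or NULL if none was given.
 */
static bool reroute_enabled(bool config_default, const char *pci_option)
{
	if (pci_option != NULL) {
		if (strcmp(pci_option, "ioapicreroute") == 0)
			return true;  /* force rerouting on */
		if (strcmp(pci_option, "noioapicreroute") == 0)
			return false; /* force rerouting off */
	}
	return config_default; /* no relevant override: keep the default */
}
```

For example, ``reroute_enabled(false, "ioapicreroute")`` evaluates to true,
so a kernel built without the option can still reroute when asked to at
boot.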

More Documentation
==================

There is an overview of the legacy interrupt handling in several datasheets
(6300ESB and 6700PXH below). While largely the same, they provide insight
into how its handling evolved across chipsets.

Example of disabling of the boot interrupt
------------------------------------------

      - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
        5.7.3 Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf

      - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
        Datasheet - Volume 2: Registers (Document # 330784-003)
        6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
        https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Example of handler rerouting
----------------------------

      - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
        2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
        https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf


If you have any legacy PCI interrupt questions that aren't answered, email me.

Cheers,
    Sean V Kelley
    sean.v.kelley@linux.intel.com

.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/