0001 .. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
0002 .. [see the bottom of this file for redistribution information]
0003 
0004 Reporting regressions
0005 +++++++++++++++++++++
0006 
0007 "*We don't cause regressions*" is the first rule of Linux kernel development;
0008 Linux founder and lead developer Linus Torvalds established it himself and
0009 ensures it's obeyed.
0010 
0011 This document describes what the rule means for users and how the Linux kernel's
0012 development model ensures all reported regressions are addressed; aspects relevant
0013 for kernel developers are left to Documentation/process/handling-regressions.rst.
0014 
0015 
0016 The important bits (aka "TL;DR")
0017 ================================
0018 
0019 #. It's a regression if something running fine with one Linux kernel works worse
0020    or not at all with a newer version. Note, the newer kernel has to be compiled
0021    using a similar configuration; the detailed explanations below describe this
0022    and other fine print.
0023 
0024 #. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst;
0025    it already covers all aspects important for regressions, which are repeated
0026    below for convenience. Two of them are important: start your report's subject
0027    with "[REGRESSION]" and CC or forward it to `the regression mailing list
0028    <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev).
0029 
0030 #. Optional, but recommended: when sending or forwarding your report, make the
0031    Linux kernel regression tracking bot "regzbot" track the issue by specifying
0032    when the regression started like this::
0033 
0034        #regzbot introduced: v5.13..v5.14-rc1
0035 
0036 
0037 All the details on Linux kernel regressions relevant for users
0038 ==============================================================
0039 
0040 
0041 The important basics
0042 --------------------
0043 
0044 
0045 What is a "regression" and what is the "no regressions rule"?
0046 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0047 
0048 It's a regression if some application or practical use case running fine with
0049 one Linux kernel works worse or not at all with a newer version compiled using a
0050 similar configuration. The "no regressions rule" forbids this from happening; if
0051 it happens by accident, the developers who caused it are expected to quickly fix
0052 the issue.
0053 
0054 It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with
0055 5.14 doesn't work at all, works significantly slower, or misbehaves somehow.
0056 It's also a regression if a perfectly working application suddenly shows erratic
0057 behavior with a newer kernel version; such issues can be caused by changes in
0058 procfs, sysfs, or one of the many other interfaces Linux provides to userland
0059 software. But keep in mind, as mentioned earlier: 5.14 in this example needs to
0060 be built from a configuration similar to the one from 5.13. This can be achieved
0061 using ``make olddefconfig``, as explained in more detail below.
0062 
0063 Note the "practical use case" in the first sentence of this section: developers
0064 despite the "no regressions" rule are free to change any aspect of the kernel
0065 and even APIs or ABIs to userland, as long as no existing application or use
0066 case breaks.
0067 
0068 Also be aware the "no regressions" rule covers only interfaces the kernel
0069 provides to the userland. It thus does not apply to kernel-internal interfaces
0070 like the module API, which some externally developed drivers use to hook into
0071 the kernel.
0072 
0073 How do I report a regression?
0074 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0075 
0076 Just report the issue as outlined in
0077 Documentation/admin-guide/reporting-issues.rst; it already describes the
0078 important points. The following aspects outlined there are especially relevant
0079 for regressions:
0080 
0081  * When checking for existing reports to join, also search the `archives of the
0082    Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and
0083    `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.
0084 
0085  * Start your report's subject with "[REGRESSION]".
0086 
0087  * In your report, clearly mention the last kernel version that worked fine and
0088    the first broken one. Ideally try to find the exact change causing the
0089    regression using a bisection, as explained below in more detail.
0090 
0091  * Remember to let the Linux regressions mailing list
0092    (regressions@lists.linux.dev) know about your report:
0093 
0094    * If you report the regression by mail, CC the regressions list.
0095 
0096    * If you report your regression to some bug tracker, forward the submitted
0097      report by mail to the regressions list while CCing the maintainer and the
0098      mailing list for the subsystem in question.
0099 
0100    If it's a regression within a stable or longterm series (e.g.
0101    v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list
0102    <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org).
0103 
0104    In case you performed a successful bisection, CC everyone the culprit's
0105    commit message mentions in lines starting with "Signed-off-by:".
0106 
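The addresses to CC after a successful bisection can be taken from the
culprit's commit message. The following is a minimal sketch, not an official
tool: the function name ``signoffs`` is made up for illustration, and it merely
filters a commit message passed on standard input::

       # Hypothetical helper: print the people named in "Signed-off-by:" lines
       # of a commit message read from standard input.
       signoffs() {
           sed -n 's/^Signed-off-by: *//p'
       }

       # Example usage inside a kernel git tree, with <commit-id> being the
       # culprit found by the bisection:
       #   git show -s --format=%B <commit-id> | signoffs
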
0107 When CCing or forwarding your report to the list, consider directly telling the
0108 aforementioned Linux kernel regression tracking bot about your report. To do
0109 that, include a paragraph like this in your mail::
0110 
0111        #regzbot introduced: v5.13..v5.14-rc1
0112 
0113 Regzbot will then consider your mail a report for a regression introduced in the
0114 specified version range. In the above case, Linux v5.13 still worked fine and Linux
0115 v5.14-rc1 was the first version where you encountered the issue. If you
0116 performed a bisection to find the commit that caused the regression, specify the
0117 culprit's commit-id instead::
0118 
0119        #regzbot introduced: 1f2e3d4c5d
0120 
0121 Placing such a "regzbot command" is in your interest, as it will ensure the
0122 report won't fall through the cracks unnoticed. If you omit this, the Linux
0123 kernel's regressions tracker will take care of telling regzbot about your
0124 regression, as long as you send a copy to the regressions mailing list. But the
0125 regression tracker is just one human who sometimes has to rest or occasionally
0126 might even enjoy some time away from computers (as crazy as that might sound).
0127 Relying on this person thus will result in an unnecessary delay before the
0128 regression is mentioned `on the list of tracked and unresolved Linux
0129 kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the
0130 weekly regression reports sent by regzbot. Such delays can result in Linus
0131 Torvalds being unaware of important regressions when deciding between "continue
0132 development or call this finished and release the final?".
0133 
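Put together, the points above lead to a mail roughly like the one printed by
the following sketch. The helper name ``regression_report`` and the placeholder
in angle brackets are assumptions made for illustration; only the subject
prefix, the list address, and the regzbot command are taken from this
document::

       # Hypothetical helper: print the skeleton of a regression report.
       # $1: last version that worked fine, $2: first broken version
       regression_report() {
           printf 'Subject: [REGRESSION] <short summary of what broke>\n'
           printf 'Cc: regressions@lists.linux.dev\n\n'
           printf 'Last kernel version that worked fine: %s\n' "$1"
           printf 'First broken kernel version: %s\n\n' "$2"
           printf '#regzbot introduced: %s..%s\n' "$1" "$2"
       }

       regression_report v5.13 v5.14-rc1

The regzbot command sits in a paragraph of its own, separated from the rest of
the text by a blank line.
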
0134 Are really all regressions fixed?
0135 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0136 
0137 Nearly all of them are, as long as the change causing the regression (the
0138 "culprit commit") is reliably identified. Some regressions can be fixed without
0139 this, but often it's required.
0140 
0141 Who needs to find the root cause of a regression?
0142 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0143 
0144 Developers of the affected code area should try to locate the culprit on their
0145 own. But for them that's often impossible to do with reasonable effort, as quite
0146 a lot of issues only occur in a particular environment outside the developer's
0147 reach -- for example, a specific hardware platform, firmware, Linux distro,
0148 system's configuration, or application. That's why in the end it's often up to
0149 the reporter to locate the culprit commit; sometimes users might even need to
0150 run additional tests afterwards to pinpoint the exact root cause. Developers
0151 should offer advice and reasonably help where they can, to make this process
0152 relatively easy and achievable for typical users.
0153 
0154 How can I find the culprit?
0155 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
0156 
0157 Perform a bisection, as roughly outlined in
0158 Documentation/admin-guide/reporting-issues.rst and described in more detail by
0159 Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but
0160 in many cases it finds the culprit relatively quickly. If it's hard or
0161 time-consuming to reliably reproduce the issue, consider teaming up with other
0162 affected users to narrow down the search range together.
0163 
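The mechanics of a bisection can be tried safely in a throwaway repository
before doing it with a kernel tree. The following is a toy sketch only: the
repository, its files, and the function name ``demo_bisect`` are made up, and a
simple ``grep`` stands in for "build, boot, and test the kernel"::

       demo_bisect() (
           # The subshell keeps the directory change local to this function.
           repo=$(mktemp -d) && cd "$repo" || exit 1
           git init -q .
           git config user.name "Bisect Demo"
           git config user.email "demo@example.com"

           # Create eight commits; the sixth one introduces the "regression".
           for i in 1 2 3 4 5 6 7 8; do
               echo "$i" > counter
               if [ "$i" -ge 6 ]; then echo broken > state; else echo fine > state; fi
               git add counter state
               git commit -q -m "change $i"
           done

           # The newest commit is bad, the oldest is good; git tests the rest.
           git bisect start HEAD HEAD~7 > /dev/null
           git bisect run grep -q fine state > /dev/null 2>&1

           # refs/bisect/bad now points to the first bad commit.
           git log -1 --format=%s refs/bisect/bad
       )

       demo_bisect

With a real kernel tree the known-good and known-bad versions (say v5.13 and
v5.14-rc1) take the place of the two commits given to ``git bisect start``, and
each step means building and booting the kernel git checked out.
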
0164 Who can I ask for advice when it comes to regressions?
0165 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0166 
0167 Send a mail to the regressions mailing list (regressions@lists.linux.dev) while
0168 CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the
0169 issue might better be dealt with in private, feel free to omit the list.
0170 
0171 
0172 Additional details about regressions
0173 ------------------------------------
0174 
0175 
0176 What is the goal of the "no regressions rule"?
0177 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0178 
0179 Users should feel safe when updating kernel versions and not have to worry
0180 something might break. It is in the interest of the kernel developers to make
0181 updating attractive: they don't want users to stay on stable or longterm Linux
0182 series that are either abandoned or more than one and a half years old. That's
0183 in everybody's interest, as `those series might have known bugs, security
0184 issues, or other problematic aspects already fixed in later versions
0185 <http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_.
0186 Additionally, the kernel developers want to make it simple and appealing for
0187 users to test the latest pre-release or regular release. That's also in
0188 everybody's interest, as it's a lot easier to track down and fix problems, if
0189 they are reported shortly after being introduced.
0190 
0191 Is the "no regressions" rule really adhered in practice?
0192 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0193 
0194 It's taken really seriously, as can be seen by many mailing list posts from
0195 Linux creator and lead developer Linus Torvalds, some of which are quoted in
0196 Documentation/process/handling-regressions.rst.
0197 
0198 Exceptions to this rule are extremely rare; in the past developers almost always
0199 turned out to be wrong when they assumed a particular situation was warranting
0200 an exception.
0201 
0202 Who ensures the "no regressions" is actually followed?
0203 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0204 
0205 The subsystem maintainers should take care of that; they are watched and
0206 supported by the tree maintainers -- e.g. Linus Torvalds for mainline and
0207 Greg Kroah-Hartman et al. for various stable/longterm series.
0208 
0209 All of them are helped by people trying to ensure no regression report falls
0210 through the cracks. One of them is Thorsten Leemhuis, who's currently acting as
0211 the Linux kernel's "regressions tracker"; to facilitate this work he relies on
0212 regzbot, the Linux kernel regression tracking bot. That's why you want to bring
0213 your report on the radar of these people by CCing or forwarding each report to
0214 the regressions mailing list, ideally with a "regzbot command" in your mail to
0215 get it tracked immediately.
0216 
0217 How quickly are regressions normally fixed?
0218 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0219 
0220 Developers should fix any reported regression as quickly as possible, to provide
0221 affected users with a solution in a timely manner and prevent more users from
0222 running into the issue; nevertheless developers need to take enough time and
0223 care to ensure regression fixes do not cause additional damage.
0224 
0225 The answer thus depends on various factors like the impact of a regression, its
0226 age, or the Linux series in which it occurs. In the end though, most regressions
0227 should be fixed within two weeks.
0228 
0229 Is it a regression, if the issue can be avoided by updating some software?
0230 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0231 
0232 Almost always: yes. If a developer tells you otherwise, ask the regression
0233 tracker for advice as outlined above.
0234 
0235 Is it a regression, if a newer kernel works slower or consumes more energy?
0236 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0237 
0238 Yes, but the difference has to be significant. A five percent slow-down in a
0239 micro-benchmark thus is unlikely to qualify as a regression, unless it also
0240 influences the results of a broad benchmark by more than one percent. If in
0241 doubt, ask for advice.
0242 
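To judge whether a difference is significant, it helps to express it as a
percentage. A minimal sketch using ``awk``; the function name
``slowdown_percent`` and the numbers are made up, and both values are assumed
to be run times (bigger means slower)::

       # Hypothetical helper: print by how many percent the newer kernel is
       # slower than the older one.
       # $1: run time with the older kernel, $2: run time with the newer kernel
       slowdown_percent() {
           awk -v old="$1" -v new="$2" \
               'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
       }

       slowdown_percent 200 210
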
0243 Is it a regression, if an external kernel module breaks when updating Linux?
0244 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0245 
0246 No, as the "no regression" rule is about interfaces and services the Linux
0247 kernel provides to the userland. It thus does not cover building or running
0248 externally developed kernel modules, as they run in kernel-space and hook into
0249 the kernel using internal interfaces that are occasionally changed.
0250 
0251 How are regressions handled that are caused by security fixes?
0252 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0253 
0254 In extremely rare situations security issues can't be fixed without causing
0255 regressions; those fixes are let through, as they are the lesser evil in the end.
0256 Luckily this dilemma almost always can be avoided, as key developers for the
0257 affected area and often Linus Torvalds himself try very hard to fix security
0258 issues without causing regressions.
0259 
0260 If you nevertheless face such a case, check the mailing list archives to see if people
0261 tried their best to avoid the regression. If not, report it; if in doubt, ask
0262 for advice as outlined above.
0263 
0264 What happens if fixing a regression is impossible without causing another?
0265 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0266 
0267 Sadly these things happen, but luckily not very often; if they occur, expert
0268 developers of the affected code area should look into the issue to find a fix
0269 that avoids regressions or at least minimizes their impact. If you run into such a
0270 situation, do what was outlined already for regressions caused by security
0271 fixes: check earlier discussions to see if people already tried their best and ask for
0272 advice if in doubt.
0273 
0274 A quick note while at it: these situations could be avoided, if people would
0275 regularly give mainline pre-releases (say v5.15-rc1 or -rc3) from each
0276 development cycle a test run. This is best explained by imagining a change
0277 integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at
0278 the same time is a hard requirement for some other improvement applied for
0279 5.15-rc1. All these changes often can simply be reverted and the regression thus
0280 solved, if someone finds and reports it before 5.15 is released. A few days or
0281 weeks later this solution can become impossible, as some software might have
0282 started to rely on aspects introduced by one of the follow-up changes: reverting
0283 all changes would then cause a regression for users of said software and thus is
0284 out of the question.
0285 
0286 Is it a regression, if some feature I relied on was removed months ago?
0287 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0288 
0289 It is, but often it's hard to fix such regressions due to the aspects outlined
0290 in the previous section. It hence needs to be dealt with on a case-by-case
0291 basis. This is another reason why it's in everybody's interest to regularly test
0292 mainline pre-releases.
0293 
0294 Does the "no regression" rule apply if I seem to be the only affected person?
0295 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0296 
0297 It does, but only for practical usage: the Linux developers want to be free to
0298 remove support for hardware that these days is only found in attics and museums.
0299 
0300 Note, sometimes regressions can't be avoided to make progress -- and the latter
0301 is needed to prevent Linux from stagnation. Hence, if only very few users seem
0302 to be affected by a regression, it for the greater good might be in their and
0303 everyone else's interest to let things pass. Especially if there is an
0304 easy way to circumvent the regression somehow, for example by updating some
0305 software or using a kernel parameter created just for this purpose.
0306 
0307 Does the regression rule apply for code in the staging tree as well?
0308 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0309 
0310 Not according to the `help text for the configuration option covering all
0311 staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_,
0312 which since its early days states::
0313 
0314        Please note that these drivers are under heavy development, may or
0315        may not work, and may contain userspace interfaces that most likely
0316        will be changed in the near future.
0317 
0318 The staging developers nevertheless often adhere to the "no regressions" rule,
0319 but sometimes bend it to make progress. That's for example why some users had to
0320 deal with (often negligible) regressions when a WiFi driver from the staging
0321 tree was replaced by a totally different one written from scratch.
0322 
0323 Why do later versions have to be "compiled with a similar configuration"?
0324 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0325 
0326 Because the Linux kernel developers sometimes integrate changes known to cause
0327 regressions, but make them optional and disable them in the kernel's default
0328 configuration. This trick allows progress, as the "no regressions" rule
0329 otherwise would lead to stagnation.
0330 
0331 Consider for example a new security feature blocking access to some kernel
0332 interfaces often abused by malware, which at the same time are required to run a
0333 few rarely used applications. The outlined approach makes both camps happy:
0334 people using these applications can leave the new security feature off, while
0335 everyone else can enable it without running into trouble.
0336 
0337 How to create a configuration similar to the one of an older kernel?
0338 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0339 
0340 Start your machine with a known-good kernel and configure the newer Linux
0341 version with ``make olddefconfig``. This makes the kernel's build scripts pick
0342 up the configuration file (the ".config" file) from the running kernel as base
0343 for the new one you are about to compile; afterwards they set all new
0344 configuration options to their default value, which should disable new features
0345 that might cause regressions.
0346 
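In practice this boils down to a single command run in the sources of the
newer kernel while the known-good kernel is booted. A minimal sketch; the
function name ``prepare_config`` and the path in the usage comment are
assumptions::

       prepare_config() (
           # $1: directory containing the sources of the newer kernel
           # (the subshell keeps the directory change local to this function)
           cd "$1" || exit 1
           # As described above, the build scripts then use the running
           # kernel's configuration as base and set new options to defaults.
           make olddefconfig
       )

       # Example usage:
       #   prepare_config ~/linux/
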
0347 Can I report a regression I found with pre-compiled vanilla kernels?
0348 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0349 
0350 You need to ensure the newer kernel was compiled with a similar configuration
0351 file as the older one (see above), as those that built them might have enabled
0352 some known-to-be incompatible feature for the newer kernel. If in doubt, report
0353 the matter to the kernel's provider and ask for advice.
0354 
0355 
0356 More about regression tracking with "regzbot"
0357 ---------------------------------------------
0358 
0359 What is regression tracking and why should I care about it?
0360 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0361 
0362 Rules like "no regressions" need someone to ensure they are followed, otherwise
0363 they are broken either accidentally or on purpose. History has shown this to be
0364 true for Linux kernel development as well. That's why Thorsten Leemhuis, the
0365 Linux kernel's regression tracker, and some other people try to ensure all
0366 regressions are fixed by keeping an eye on them until they are resolved. None of
0367 them is paid for this, which is why the work is done on a best effort basis.
0368 
0369 Why and how are Linux kernel regressions tracked using a bot?
0370 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0371 
0372 Tracking regressions completely manually has proven to be quite hard due to the
0373 distributed and loosely structured nature of the Linux kernel development process.
0374 That's why the Linux kernel's regression tracker developed regzbot to facilitate
0375 the work, with the long term goal to automate regression tracking as much as
0376 possible for everyone involved.
0377 
0378 Regzbot works by watching for replies to reports of tracked regressions.
0379 Additionally, it's looking out for posted or committed patches referencing such
0380 reports with "Link:" tags; replies to such patch postings are tracked as well.
0381 Combined, this data provides good insights into the current state of the fixing
0382 process.
0383 
0384 How to see which regressions regzbot tracks currently?
0385 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0386 
0387 Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.
0388 
0389 What kind of issues are supposed to be tracked by regzbot?
0390 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0391 
0392 The bot is meant to track regressions, hence please don't involve regzbot for
0393 regular issues. But it's okay for the Linux kernel's regression tracker if you
0394 involve regzbot to track severe issues, like reports about hangs, corrupted
0395 data, or internal errors (Panic, Oops, BUG(), warning, ...).
0396 
0397 How to change aspects of a tracked regression?
0398 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0399 
0400 By using a 'regzbot command' in a direct or indirect reply to the mail with the
0401 report. The easiest way to do that: find the report in your "Sent" folder or the
0402 mailing list archive and reply to it using your mailer's "Reply-all" function.
0403 In that mail, use one of the following commands in a stand-alone paragraph (IOW:
0404 use blank lines to separate one or multiple of these commands from the rest of
0405 the mail's text).
0406 
0407  * Update when the regression started to happen, for example after performing a
0408    bisection::
0409 
0410        #regzbot introduced: 1f2e3d4c5d
0411 
0412  * Set or update the title::
0413 
0414        #regzbot title: foo
0415 
0416  * Monitor a discussion or bugzilla.kernel.org ticket where additional aspects of
0417    the issue or a fix are discussed::
0418 
0419        #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
0420        #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
0421 
0422  * Point to a place with further details of interest, like a mailing list post
0423    or a ticket in a bug tracker that are slightly related, but about a different
0424    topic::
0425 
0426        #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
0427 
0428  * Mark a regression as invalid::
0429 
0430        #regzbot invalid: wasn't a regression, problem has always existed
0431 
0432 Regzbot supports a few other commands primarily used by developers or people
0433 tracking regressions. They and more details about the aforementioned regzbot
0434 commands can be found in the `getting started guide
0435 <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and
0436 the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_
0437 for regzbot.
0438 
0439 ..
0440    end-of-content
0441 ..
0442    This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
0443    of the file. If you want to distribute this text under CC-BY-4.0 only,
0444    please use "The Linux kernel developers" for author attribution and link
0445    this as source:
0446    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst
0447 ..
0448    Note: Only the content of this RST file as found in the Linux kernel sources
0449    is available under CC-BY-4.0, as versions of this text that were processed
0450    (for example by the kernel's build system) might contain content taken from
0451    files which use a more restrictive license.