.. SPDX-License-Identifier: GPL-2.0

=====================================
Handling messy pull-request diffstats
=====================================

Subsystem maintainers routinely use ``git request-pull`` as part of the
process of sending work upstream.  Normally, the result includes a nice
diffstat that shows which files will be touched and how much of each will
be changed.  Occasionally, though, a repository with a relatively
complicated development history will yield a massive diffstat containing a
great deal of unrelated work.  The result looks ugly and obscures what the
pull request is actually doing.  This document describes what is happening
and how to fix things up; it is derived from The Wisdom of Linus Torvalds,
found in Linus1_ and Linus2_.

.. _Linus1: https://lore.kernel.org/lkml/CAHk-=wg3wXH2JNxkQi+eLZkpuxqV+wPiHhw_Jf7ViH33Sw7PHA@mail.gmail.com/
.. _Linus2: https://lore.kernel.org/lkml/CAHk-=wgXbSa8yq8Dht8at+gxb_idnJ7X5qWZQWRBN4_CUPr=eQ@mail.gmail.com/

A Git development history proceeds as a series of commits.  In a simplified
manner, mainline kernel development looks like this::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN

If one wants to see what has changed between two points, a command like
this will do the job::

  $ git diff --stat --summary vN-rc2..vN-rc3

Here, there are two clear points in the history; Git will essentially
"subtract" the beginning point from the end point and display the resulting
differences.  The requested operation is unambiguous and easy enough to
understand.

When a subsystem maintainer creates a branch and commits changes to it, the
result in the simplest case is a history that looks like::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
                          |
                          +-- c1 --- c2 --- ... --- cN

If that maintainer now uses ``git diff`` to see what has changed between
the mainline branch (let's call it "linus") and cN, there are still two
clear endpoints, and the result is as expected.  So a pull request
generated with ``git request-pull`` will also be as expected.  But now
consider a slightly more complex development history::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
                |         |
                |         +-- c1 --- c2 --- ... --- cN
                |                   /
                +-- x1 --- x2 --- x3

Our maintainer has created one branch at vN-rc1 and another at vN-rc2; the
two were subsequently merged at c2.  Now a pull request generated for cN
may end up being messy indeed, and developers often end up wondering why.

What is happening here is that there are no longer two clear endpoints for
the ``git diff`` operation to use.  The development culminating in cN
started in two different places; to generate the diffstat, ``git diff``
ends up having to pick one of them and hope for the best.  If the diffstat
starts at vN-rc1, it may end up including all of the changes between there
and the second starting point (vN-rc2), which is certainly not what our
maintainer had in mind.  With all of that extra junk in the diffstat, it
may be impossible to tell what actually happened in the changes leading up
to cN.
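
One way to check whether a branch is in this situation is to ask Git for
every merge base it finds between the branch and the mainline.  The sketch
below is an illustration only: it builds a throwaway repository in which a
hypothetical ``topic`` branch and ``linus`` have each merged the other's
work (a criss-cross shape assumed here for the demonstration, not taken from
the diagrams above), then counts the merge bases.

```shell
#!/bin/sh
# Sketch only: every branch name and commit message here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/linus      # call the mainline "linus"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

git commit -q --allow-empty -m "base"
git branch topic
git commit -q --allow-empty -m "mainline work"
git branch linus-old                        # remember this mainline commit
git checkout -q topic
git commit -q --allow-empty -m "topic work"
git branch topic-old                        # remember this topic commit

# Each side merges the other's earlier commit, giving two starting points.
git merge -q --no-edit linus-old
git checkout -q linus
git merge -q --no-edit topic-old

# Count the merge bases; more than one means there is no single start.
bases=$(git merge-base --all linus topic | grep -c .)
echo "merge bases: $bases"
```

In this throwaway repository the count is two.  On a real tree, running
``git merge-base --all`` against the mainline and the branch to be pulled
gives the same information without any of the setup.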

Maintainers often try to resolve this problem by, for example, rebasing the
branch or performing another merge with the linus branch, then recreating
the pull request.  This approach tends not to lead to joy at the receiving
end of that pull request; rebasing and/or merging just before pushing
upstream is a well-known way to get a grumpy response.

So what is to be done?  The best response when confronted with this
situation is indeed to do a merge with the branch you intend your work
to be pulled into, but to do it privately, as if it were the source of
shame.  Create a new, throwaway branch and do the merge there::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
                |         |                                      |
                |         +-- c1 --- c2 --- ... --- cN           |
                |                   /               |            |
                +-- x1 --- x2 --- x3                +------------+-- TEMP

The merge operation resolves all of the complications resulting from the
multiple beginning points, yielding a coherent result that contains only
the differences from the mainline branch.  Now it will be possible to
generate a diffstat with the desired information::

  $ git diff -C --stat --summary linus..TEMP

Save the output from this command, then simply delete the TEMP branch;
definitely do not expose it to the outside world.  Take the saved diffstat
output and edit it into the messy pull request, yielding a result that
shows what is really going on.  That request can then be sent upstream.
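
The whole sequence can be sketched as a short script.  This is only an
illustration under stated assumptions: it runs in a throwaway repository,
the branch name ``for-linus`` and the file names are hypothetical, and the
history is far simpler than one that would produce a messy diffstat.  It
shows the order of operations: merge on TEMP, save the diffstat, delete
TEMP.

```shell
#!/bin/sh
# Sketch only: a throwaway repository standing in for a real kernel tree.
set -e
repo=$(mktemp -d)
out=$(mktemp)                               # where the diffstat is saved
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/linus      # call the mainline "linus"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

echo base > core.txt
git add core.txt
git commit -q -m "base"

# Subsystem work on a hypothetical branch named "for-linus".
git checkout -q -b for-linus
echo driver > driver.txt
git add driver.txt
git commit -q -m "add driver"

# Unrelated progress on the mainline.
git checkout -q linus
echo more >> core.txt
git commit -q -a -m "mainline work"

# Do the merge privately, on a temporary branch.
git checkout -q -b TEMP for-linus
git merge -q --no-edit linus

# Save the diffstat that will be edited into the pull request.
git diff -C --stat --summary linus..TEMP > "$out"

# Throw the temporary branch away; it is never pushed anywhere.
git checkout -q for-linus
git branch -q -D TEMP
cat "$out"
```

The saved diffstat lists only ``driver.txt``, the subsystem's own change;
the mainline edit to ``core.txt`` does not appear.  On a real tree only the
last three steps are needed: create TEMP from the branch to be pulled and
merge the mainline into it, save the diffstat, then delete TEMP.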