Git merge multiple copies preserving history
I have a project which has multiple copies of some files in different places. For example:
src/location1/foobar.h
src/location1/foobar.cpp
src/location2/foobar.h
src/location2/foobar.cpp
I am extracting these into their own library, so I wish to end up with:
src/location3/foobar.h combining multiple versions of foobar.h
src/location3/foobar.cpp combining multiple versions of foobar.cpp
I've passed the first hurdle of removing all unwanted files using:
git filter-repo --path-glob \*foobar\*
In the process I discovered that filter-branch has recently been superseded by the superior filter-repo (worth repeating, as filter-branch still appears in many top answers here).
I now want to combine the copies into one preserving all their histories.
The two candidates for this are git merge and git merge-file.
git merge-file requires the common ancestor of each file to be identified, which is a pain, as it was probably:
src/location3/foobar.h
which is somewhere unknown in the commit history.
We have git merge-base to find the best common ancestor.
I'm not clear how to specify the file version for git merge-file. I want to do something like:
git mv src/location1/foobar.h src/newlocation/foobar.h
git commit
git merge-file src/newlocation/foobar.h src/location3/foobar@<commitid> src/location2/foobar.h
...
git merge-file src/newlocation/foobar.h src/location3/foobar@<commitid> src/location3/foobar.h
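git merge-file only takes paths to files on disk and has no revision syntax of its own (the foobar@<commitid> notation above is only illustrative). One way to use a historic version as the base is to extract it first with git show <commit>:<path> into a temporary file. A minimal bash sketch; the helper name merge_historic_base is hypothetical, only git show and git merge-file are real commands:

```shell
# merge_historic_base OURS BASE_SPEC THEIRS
#   OURS       working-tree file that receives the merged result (overwritten in place)
#   BASE_SPEC  any <rev>:<path> accepted by git show, e.g. <commitid>:src/location3/foobar.h
#   THEIRS     working-tree file to merge into OURS
# Hypothetical helper, not part of git.
merge_historic_base() {
    local ours=$1 base_spec=$2 theirs=$3
    local base status
    base=$(mktemp) || return 1
    # git merge-file needs real files, so write the historic blob to a temp file.
    if ! git show "$base_spec" > "$base"; then
        rm -f "$base"
        return 1
    fi
    # Three-way merge into $ours; the exit status is the number of conflicts (0 = clean).
    git merge-file "$ours" "$base" "$theirs"
    status=$?
    rm -f "$base"
    return "$status"
}
```

Usage, with the same placeholder commit as above: merge_historic_base src/newlocation/foobar.h <commitid>:src/location3/foobar.h src/location2/foobar.h. Any conflicts are left as the usual conflict markers in the first file.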
This is quite painstaking and has to be repeated for each file. Another way is to create multiple temporary branches:
git checkout -b newlibbranch
git mv src/location1/foobar.h src/newlocation/foobar.h
git mv src/location1/foobar.cpp src/newlocation/foobar.cpp
git commit
git checkout oldversion
git checkout -b v2
git mv src/location2/foobar.h src/newlocation/foobar.h
git mv src/location2/foobar.cpp src/newlocation/foobar.cpp
git commit
git checkout newlibbranch
git merge --allow-unrelated-histories v2
This is also quite painstaking, though possibly scriptable. There is also a practical problem: the merge produces a "rename/rename" conflict rather than a merge of the actual files. This seems to be solved by adding --allow-unrelated-histories.
So my questions are:
Regarding the task:
- Is there a better way? Perhaps a merge tool I am unaware of, just as I was unaware of filter-repo?
- Am I correct in thinking the multiple-merge-branches approach is better than git merge-file?
Regarding merge-file:
- How do I specify a particular version of a file for git merge-file?
- Is there a command or script which finds the common ancestor automatically? Something like:
git merge-file-wrapper location1 location2 -->
base = `git merge-base location1 location2`
git merge-file location1 $base location2
Could it be that this does not exist because there are some hidden pitfalls?
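A wrapper along those lines can be sketched, but it shows one pitfall: git merge-base works on commits, not on file paths. When both copies live in the same commit (as src/location1 and src/location2 do here), git merge-base HEAD HEAD is just HEAD and says nothing about when the file was copied. So the wrapper only helps when the copies are reached through different commits (for example, separate branches), and the caller still has to name the path the file had in the base commit. A minimal bash sketch; merge_file_wrapper and its argument order are hypothetical:

```shell
# merge_file_wrapper REV_A PATH_A REV_B PATH_B BASE_PATH
#   Hypothetical helper: merges PATH_A as of REV_A with PATH_B as of REV_B,
#   using BASE_PATH as of the merge base of the two revisions as the ancestor.
#   The merged text is printed to stdout; the exit status is the conflict count.
merge_file_wrapper() {
    local rev_a=$1 path_a=$2 rev_b=$3 path_b=$4 base_path=$5
    local base_rev ours base theirs status
    # Best common ancestor *commit*; the path inside it must be supplied by the caller.
    base_rev=$(git merge-base "$rev_a" "$rev_b") || return 1
    ours=$(mktemp) && base=$(mktemp) && theirs=$(mktemp) || return 1
    if git show "$rev_a:$path_a" > "$ours" &&
       git show "$base_rev:$base_path" > "$base" &&
       git show "$rev_b:$path_b" > "$theirs"; then
        # -p prints the result instead of overwriting $ours; -L sets the conflict-marker labels.
        git merge-file -p -L "$rev_a:$path_a" -L "$base_rev:$base_path" -L "$rev_b:$path_b" \
            "$ours" "$base" "$theirs"
        status=$?
    else
        status=255
    fi
    rm -f "$ours" "$base" "$theirs"
    return "$status"
}
```

For example, merge_file_wrapper branch1 src/newlocation/foobar.h branch2 src/newlocation/foobar.h src/location3/foobar.h > merged.h, where branch1 and branch2 are hypothetical branch names.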
Solution 1:[1]
I haven't found any automated tool to do this, so there may be a gap in the ecosystem for one.
In my case I had multiple files to move, some of which had more copies than others. This adds some interesting complexity, but it is not uncommon when refactoring to remove duplication.
What I did in the end was:
- Write a script that creates one branch per variant, in which that variant of each file is moved to its new location. The script:
  - identifies the files to be moved;
  - finds the file with the most copies and creates that many branches;
  - on each branch, moves one copy of each file to its new location.
- Merge each branch manually.
Most of these merges were trivial things such as changing a namespace for each sub-project.
The result is a single set of files which have all the changes I wanted and all the change history from each of them.
To make this a bit more concrete:
Step 1: use filter-repo to create a project containing just the files of interest
(note this should be done on a fresh clone of the project)
git filter-repo --path-glob \*ThingIWant1\* --path-glob \*AnotherThingIWant\*
git filter-repo --invert-paths --path-glob \*ThingIDontWant\*
Step 2: create branches
#!/bin/bash
# Run from the top level of the filtered repository created in step 1.
# Find the unique file names, skipping hidden paths (such as .git) and this script.
FILES=$(find . -not -path '*/.*' -type f | grep -v makebranch | xargs -I{} basename {} | sort -u)

# The number of branches required is the largest number of copies of any one file.
MAXLOCS=0
for FILE in $FILES; do
    NUMLOCS=$(find . -not -path '*/.*' -type f -name "$FILE" | wc -l)
    echo "FILE=$FILE copies=$((NUMLOCS))"
    if (( NUMLOCS > MAXLOCS )); then
        MAXLOCS=$((NUMLOCS))
    fi
done
echo "$MAXLOCS branches required"

# For each branch, move one location of each file to its final destination.
L=0
while (( L < MAXLOCS )); do
    git checkout develop
    git checkout -b "ps$L"
    for FILE in $FILES; do
        # Sort so that every branch numbers the copies in the same order.
        LOCS=( $(find . -not -path '*/.*' -type f -name "$FILE" | sort) )
        if (( L < ${#LOCS[@]} )); then
            LOC=${LOCS[$L]}
            # Move source files to one place and test files to another.
            # In my case we have src and test.
            if [[ $LOC == */src/* ]]; then
                DEST=FinalDestinationForSource
            else
                DEST=FinalDestinationForTests
            fi
            echo "mv $LOC $DEST/$FILE"
            mkdir -p "$DEST"
            if ! git mv "$LOC" "$DEST/$FILE"; then
                echo "BAD: git mv $LOC $DEST/$FILE"
            fi
        fi
    done
    git add -u
    git status
    git commit -m "#Ticket: move Things to new location $L"
    ((L = L + 1))
done
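Step 3: merge each branch
git checkout ps0
git merge -X rename-threshold=5% ps1
# resolve conflicts manually and git add the results, then
git commit
git merge -X rename-threshold=5% ps2
# resolve conflicts manually and git add the results, then
git commit
# ...and so on for each remaining psN branch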
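Lowering the rename threshold helps convince git that the files share the same origin: with -X rename-threshold=5%, a deleted file and an added file only need a 5% similarity index to be treated as a rename, instead of the default 50% (current git documents rename-threshold as a deprecated synonym for find-renames). Otherwise one version may simply replace the other without retaining the change history linking them. I think the result is equivalent to linking multiple commits using git commit-tree, which would be another way to solve this problem.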
You can verify the history using git blame to see where each line in each file came from, and git log to see the actual commits.
Raymond Chen has a series of blog posts on this which may be of interest. He approaches this task using commit-tree. I think that would work, but it's a little too low-level an approach for my case.
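For comparison, the commit-tree idea can be sketched as follows. git commit-tree creates a commit object from a tree and any number of -p parents, so a single commit can record several branches as its parents. This is a minimal bash sketch, not necessarily how those blog posts do it; it assumes the combined files have already been resolved and staged (git add / git rm) on ps0, and the helper name link_histories is hypothetical:

```shell
# link_histories BRANCH PARENT...
#   Hypothetical helper: records the currently staged tree (the index) as one
#   commit whose parents are all the given PARENT revisions, then moves BRANCH
#   to point at that commit. Requires bash (uses arrays).
link_histories() {
    local branch=$1
    shift
    local tree commit parent
    local -a parent_args=()
    # Write the index out as a tree object and capture its ID.
    tree=$(git write-tree) || return 1
    # One "-p <parent>" pair per parent; the first one becomes the first parent.
    for parent in "$@"; do
        parent_args+=(-p "$parent")
    done
    commit=$(git commit-tree "${parent_args[@]}" -m "Combine copies into one location" "$tree") || return 1
    # Move the branch; if it is checked out, the index already matches the new commit's tree.
    git update-ref "refs/heads/$branch" "$commit"
}
```

For example, after staging the combined result on ps0: link_histories ps0 ps0 ps1 ps2. Unlike git merge, nothing here checks the contents, so the staged tree must already be the final combined version.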
Step 4: merge your library into the project it belongs in
This is included for completeness, as you may be moving files to another project. See "How do you merge two Git repositories?" for more details.
cd targetProject
git remote add sourceProject /path/to/sourceProject
git fetch sourceProject
git merge --allow-unrelated-histories sourceProject/ps0
I think this area is ripe for contributing a script to add a new merge facility to git.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Bruce Adams |