Multiple Git pushes in one command
I have an application that needs to run git add/commit/push on every single file I'd like to push, in order to trigger a GitLab job for each one.
My problem is that Git takes a long time to run the git push command.
Here are the commands I'm using:
git add myFile.json
git commit myFile.json -m "commitMessage"
variable1=$(git rev-parse HEAD)  # Store the last commit hash in a variable
# Push only this specific commit, to (maybe) make it faster;
# "origin" is assumed to be the name of the GitLab remote
git push origin "$variable1":master
I'd like to make the whole process faster. Here is what I've thought about:
- Triggering multiple pipelines with only one git push (maybe by running the pipeline on each commit instead of on each push), but that doesn't seem possible.
- Doing multiple pushes in one git push command, so that Git doesn't have to redo some of the git push initialization before each file is pushed (I have no idea what happens during the git push process, so that idea may be wrong).
Does anyone have an idea how to make this process faster, using one of my ideas or a brand-new one?
Note: I'm using HTTPS, so SSH solutions probably won't fit here.
Output of the push command:
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 521 bytes | 521.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0)
To https://gitlab/root/xxx.git
bd0a7c1..9671b26 master -> master
Thanks in advance!
Edit 1:
Here is the output of GIT_TRACE=true GIT_TRACE_PACKET=true git push
12:47:38.387505 git.c:418 trace: built-in: git push https://root:password@gitlab/root/xxx.git
12:47:38.405394 run-command.c:643 trace: run_command: GIT_DIR=.git git-remote-https https://root:password@gitlab/root/xxx.git https://root:password@gitlab/root/xxx.git
12:47:39.926614 pkt-line.c:80 packet: git< # service=git-receive-pack
12:47:39.926654 pkt-line.c:80 packet: git< 0000
12:47:39.926664 pkt-line.c:80 packet: git< 4a902e3cdd3c06ba7fe9aa0345e510ce7c7ebb73 refs/heads/master\0report-status report-status-v2 delete-refs side-band-64k quiet atomic ofs-delta push-options object-format=sha1 agent=git/2.33.1.gl3
12:47:39.926678 pkt-line.c:80 packet: git< 0000
12:47:39.931365 pkt-line.c:80 packet: git> refs/heads/master:refs/heads/master
12:47:39.934089 pkt-line.c:80 packet: git> 0000
12:47:39.934131 run-command.c:643 trace: run_command: git send-pack --stateless-rpc --helper-status --thin --progress https://root:password@gitlab/root/xxx.git/ --stdin
12:47:39.954005 git.c:418 trace: built-in: git send-pack --stateless-rpc --helper-status --thin --progress https://root:password@gitlab/root/xxx.git/ --stdin
12:47:39.962814 pkt-line.c:80 packet: git< refs/heads/master:refs/heads/master
12:47:39.962841 pkt-line.c:80 packet: git< 0000
12:47:39.965723 pkt-line.c:80 packet: git< 4a902e3cdd3c06ba7fe9aa0345e510ce7c7ebb73 refs/heads/master\0report-status report-status-v2 delete-refs side-band-64k quiet atomic ofs-delta push-options object-format=sha1 agent=git/2.33.1.gl3
12:47:39.965792 pkt-line.c:80 packet: git< 0000
12:47:39.998180 pkt-line.c:80 packet: git> shallow 67769ed9405583783a372dc7731d5c1be8801a63
12:47:39.998324 pkt-line.c:80 packet: git> 4a902e3cdd3c06ba7fe9aa0345e510ce7c7ebb73 4f72bdb2e340f297c5c88a3433bf1700a010f721 refs/heads/master\0 report-status side-band-64k agent=git/2.20.1
12:47:39.998420 pkt-line.c:80 packet: git> 0000
12:47:39.998841 pkt-line.c:80 packet: git< 0035shallow 67769ed9405583783a372dc7731d5c1be8801a6300954a902e3cdd3c06ba7fe9aa0345e510ce7c7ebb73 4f72bdb2e340f297c5c88a3433bf1700a010f721 refs/heads/master\0 report-status side-band-64k agent=git/2.20.10000
12:47:39.998986 run-command.c:643 trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress --shallow
12:47:40.007547 git.c:418 trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress --shallow
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 530 bytes | 530.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0)
12:47:40.202197 pkt-line.c:80 packet: git> 0000
12:47:40.202196 pkt-line.c:80 packet: git< PACK ...
12:47:41.680994 pkt-line.c:80 packet: sideband< \1000eunpack ok0019ok refs/heads/master0000
12:47:41.681027 pkt-line.c:80 packet: sideband< 0000
12:47:41.681108 pkt-line.c:80 packet: git< unpack ok
12:47:41.681132 pkt-line.c:80 packet: git< ok refs/heads/master
12:47:41.681148 pkt-line.c:80 packet: git< 0000
12:47:41.681153 pkt-line.c:80 packet: git> 0000
To https://gitlab/root/xxx.git
4a902e3..4f72bdb master -> master
The slow parts seem to be run_command: GIT_DIR=.git git-remote-https and git< PACK ....
This command was run in a depth=2 git clone.
Note: The first push is always fast (it seems to take between 1 and 2 seconds), and the subsequent ones mostly take between 3 and 4 seconds. I tried putting a sleep between my pushes, but the duration doesn't change, so I guess the issue isn't in my code.
Here is the output of my first push:
12:46:16.834461 git.c:418 trace: built-in: git push https://root:password@gitlab/root/xxx.git
12:46:16.836817 run-command.c:643 trace: run_command: GIT_DIR=.git git-remote-https https://root:password@gitlab/root/xxx.git https://root:password@gitlab/root/xxx.git
12:46:17.307690 pkt-line.c:80 packet: git< # service=git-receive-pack
12:46:17.307951 pkt-line.c:80 packet: git< 0000
12:46:17.308003 pkt-line.c:80 packet: git< 6725c964319de48c963746e976a97634f0ee0e7c refs/heads/master\0report-status report-status-v2 delete-refs side-band-64k quiet atomic ofs-delta push-options object-format=sha1 agent=git/2.33.1.gl3
12:46:17.308024 pkt-line.c:80 packet: git< 0000
12:46:17.309205 pkt-line.c:80 packet: git> refs/heads/master:refs/heads/master
12:46:17.309223 pkt-line.c:80 packet: git> 0000
12:46:17.309271 run-command.c:643 trace: run_command: git send-pack --stateless-rpc --helper-status --thin --progress https://root:password@gitlab/root/xxx.git/ --stdin
12:46:17.312085 git.c:418 trace: built-in: git send-pack --stateless-rpc --helper-status --thin --progress https://root:password@gitlab/root/xxx.git/ --stdin
12:46:17.313673 pkt-line.c:80 packet: git< refs/heads/master:refs/heads/master
12:46:17.313707 pkt-line.c:80 packet: git< 0000
12:46:17.313722 pkt-line.c:80 packet: git< 6725c964319de48c963746e976a97634f0ee0e7c refs/heads/master\0report-status report-status-v2 delete-refs side-band-64k quiet atomic ofs-delta push-options object-format=sha1 agent=git/2.33.1.gl3
12:46:17.314002 pkt-line.c:80 packet: git< 0000
12:46:17.316186 pkt-line.c:80 packet: git> shallow 67769ed9405583783a372dc7731d5c1be8801a63
12:46:17.316209 pkt-line.c:80 packet: git> 6725c964319de48c963746e976a97634f0ee0e7c 6665f070f909ed946bf8223516f69691c15e9c29 refs/heads/master\0 report-status side-band-64k agent=git/2.20.1
12:46:17.316316 pkt-line.c:80 packet: git> 0000
12:46:17.316423 pkt-line.c:80 packet: git< 0035shallow 67769ed9405583783a372dc7731d5c1be8801a6300956725c964319de48c963746e976a97634f0ee0e7c 6665f070f909ed946bf8223516f69691c15e9c29 refs/heads/master\0 report-status side-band-64k agent=git/2.20.10000
12:46:17.316509 run-command.c:643 trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress --shallow
12:46:17.318598 git.c:418 trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset --progress --shallow
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 521 bytes | 173.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0)
12:46:17.339570 pkt-line.c:80 packet: git< PACK ...
12:46:17.339830 pkt-line.c:80 packet: git> 0000
[03/May/2022 12:46:17] "POST /api/getStatus/ HTTP/1.1" 200 854
12:46:18.029134 pkt-line.c:80 packet: sideband< \1000eunpack ok0019ok refs/heads/master0000
12:46:18.029164 pkt-line.c:80 packet: sideband< 0000
12:46:18.029277 pkt-line.c:80 packet: git< unpack ok
12:46:18.029285 pkt-line.c:80 packet: git< ok refs/heads/master
12:46:18.029293 pkt-line.c:80 packet: git< 0000
12:46:18.029297 pkt-line.c:80 packet: git> 0000
To https://gitlab/root/xxx.git
6725c96..6665f07 master -> master
Here the GIT_DIR part seems to take less time, and git< PACK ... is almost instant.
Edit 2:
It turned out to be simply a performance issue on my server, but I think @torek's answer can help people searching for this subject, so I'm marking it as accepted.
Solution 1:[1]
TL;DR
My guess is that you're letting GitLab use a shallow clone, which normally makes things faster, but in this case it's making things much slower.
Long
This is probably the key comment:
The long parts seems to be before Enumerating Objects, and during Writing Objects. The rest is almost instant.
A lot of this gets into the weeds of Git internals, which may change at any time without warning. It's therefore unwise to depend too much on these. Still, here's what's going on:
A Git repository is mostly made up of two databases. One database holds Git objects, and the other holds names: refs or references. The objects are numbered (by hash ID or object ID, OID). The refs turn human-usable names, such as refs/heads/main for branch main, into OIDs, with just one OID being stored per name.
The OIDs are universal: each object has a unique OID, and all Gits everywhere number identical objects identically. This means that any two Git repositories can meet (whether for the first time, or for the nth time with a large value of n) and yet one can nearly instantly find out whether the other repository has some object, just by handing over the object's ID. The sending Git lists out certain key OIDs, and the receiving Git responds with an "I have that" or "I want that" response.
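As a small illustration of why identical objects get identical OIDs everywhere, here is a minimal Python sketch for a SHA-1 repository: a blob's ID is the SHA-1 of a short header (the word blob, a space, the content length in bytes, and a NUL byte) followed by the raw content. This is the value git hash-object reports for a file. Trees, commits, and tags are hashed the same way with their own type word; SHA-256 repositories are not covered. The function name blob_oid is made up for this example.

```python
import hashlib


def blob_oid(content: bytes) -> str:
    """Return the SHA-1 object ID a SHA-1 Git repository gives this blob."""
    # Git hashes "<type> <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same content yields the same ID in every repository, anywhere.
print(blob_oid(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_oid(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```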
The next thing we need is complicated technically, but possibly easy enough to visualize or understand. A commit object contains known metadata values, including a list of parent commits and one (single) tree object. A tree object consists of repeated tuples giving component names, object types (or "modes"), and object IDs. The objects listed in a tree are generally either another tree, or a blob object. A blob object represents a file's content: the name of that file is produced by stringing together the names of tree objects that led from a commit to that blob.
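A toy model may help with the commit/tree/blob picture. In this Python sketch the object IDs (c1, t1, b1, and so on) are made up, and each tree is a plain list of (mode, type, oid, name) tuples rather than Git's real binary tree format. list_files is a hypothetical helper that strings tree-entry names together to produce each file's path, roughly the way git ls-tree -r lists a commit's files.

```python
# Toy object store: made-up short IDs map to (type, payload).
# A tree payload is a list of (mode, type, oid, name) tuples, a
# simplification of Git's real binary tree format.
OBJECTS = {
    "c1": ("commit", {"tree": "t1", "parents": []}),
    "t1": ("tree", [("100644", "blob", "b1", "README"),
                    ("040000", "tree", "t2", "src")]),
    "t2": ("tree", [("100644", "blob", "b2", "main.py")]),
    "b1": ("blob", b"readme text\n"),
    "b2": ("blob", b"print('hi')\n"),
}


def list_files(objects, commit_oid):
    """Yield (path, blob_oid) for every file in a commit's snapshot."""
    _, commit = objects[commit_oid]
    stack = [("", commit["tree"])]
    while stack:
        prefix, tree_oid = stack.pop()
        _, entries = objects[tree_oid]
        for _mode, kind, oid, name in entries:
            path = prefix + name
            if kind == "tree":
                # A file's full name is the chain of tree-entry names leading to it.
                stack.append((path + "/", oid))
            else:
                yield path, oid


print(sorted(list_files(OBJECTS, "c1")))
# [('README', 'b1'), ('src/main.py', 'b2')]
```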
The parent(s) of any given commit are commits that existed at the time the commit itself was made. Commits are made one at a time. This means there cannot be cycles in the connections from commit to commit: if commit H links backwards to commit G, commit G cannot link forwards to commit H because H did not exist when G was made. G can link back to still-earlier commits, but those cannot link forwards to G or H.
Similarly, a tree object may not refer to itself, nor may any sub-tree within the tree object refer back to the tree object or any sub-tree that's "above" the sub-tree that is doing the referring. That is, if tree OID 9876543 exists, none of its entries can refer to object 9876543, and none of its subtrees (say, 5566778) can refer to 9876543 either. So there cannot be loops in any set of trees found by starting at a commit. These rules mean that a tree is literally a tree, which is a subset of a DAG: see What's the difference between the data structure Tree and Graph? (Blob objects represent file contents, which are opaque to Git at this level: Git does not have to examine them, and does not do so.)
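A rough sketch of why the no-cycles rule falls out naturally. In this toy Python model, make_commit and is_acyclic are hypothetical helpers, and the 7-character IDs are truncated SHA-1s of a made-up layout, not Git's commit format. Because a commit's ID is computed from its parents' IDs, a commit can only name parents that already exist. The is_acyclic check is an ordinary three-colour depth-first search, included to verify the property; it is not something Git runs.

```python
import hashlib


def make_commit(store, message, parents):
    """Create a toy commit whose ID depends on its parents' IDs."""
    for p in parents:
        # A parent must already exist: its ID has to be known to be hashed in.
        if p not in store:
            raise KeyError(f"unknown parent {p}")
    data = ("\n".join(parents) + "\n" + message).encode()
    oid = hashlib.sha1(data).hexdigest()[:7]
    store[oid] = list(parents)
    return oid


def is_acyclic(graph):
    """Three-colour DFS over a {node: [successors]} mapping."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph.get(n, []):
            if color.get(m, WHITE) == GREY:
                return False  # back edge: a cycle
            if color.get(m, WHITE) == WHITE and not visit(m):
                return False
        color[n] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in list(graph))


store = {}
g = make_commit(store, "G", [])
h = make_commit(store, "H", [g])  # H links back to G; G cannot name H
print(is_acyclic(store))  # True
```

A cycle can only be produced by writing one in by hand, for example is_acyclic({"x": ["y"], "y": ["x"]}), which returns False.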
The end result of all of this is that the commits themselves form a Directed Acyclic Graph or DAG. Meanwhile, the top level tree object within each commit forms a tree of tree objects. So we have a DAG of trees, or DAG of DAGs, however you would like to refer to it; such a composition is itself a DAG. (Note that commits can re-use top level or sub-level trees from earlier commits: that's perfectly fine here as it does not break the DAG rules.)
(Atop all of this, we can have annotated tag objects, which store the hash ID of one target object. Because they're limited to a single target, and Git's hash ID computation rules forbid loops, these just add a few leads-into-an-object nodes to the overall DAG-of-DAGs. They add a little bit of complexity to a visualization, but do nothing to mess with the overall DAGginess.)
What all this boils down to, in the end, is that we have this overall graph structure with constraints: directionality and a lack of cycles. Any such DAG has a reachability property: that is, starting from some node in the graph, there may be other nodes in the graph that we can reach, and there may be other nodes in the graph that we cannot reach, by following the one-way connections: commit b789abc has parent a123456, so a123456 is reachable from b789abc. As there are no cycles, this by definition means that b789abc is not reachable from a123456. (You cannot, however, infer the reverse: if node X is not reachable from node Y, that does not mean that Y is reachable from X. Perhaps W or Z reaches both X and Y, but X and Y are merely siblings in a tree, for instance.)
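Reachability is just graph search along the one-way parent links. This minimal Python sketch does a breadth-first walk; the reachable helper and the PARENTS map are illustrative, with b789abc and a123456 as the made-up IDs from the paragraph above and X and Y as siblings under W. It counts the starting commit as reachable from itself, which matches how git rev-list includes its starting point.

```python
from collections import deque


def reachable(parents, start):
    """Return every commit reachable from start by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


# b789abc's parent is a123456; X and Y are siblings whose parent is W.
PARENTS = {"b789abc": ["a123456"], "a123456": [], "X": ["W"], "Y": ["W"], "W": []}

print("a123456" in reachable(PARENTS, "b789abc"))  # True
print("b789abc" in reachable(PARENTS, "a123456"))  # False
print("Y" in reachable(PARENTS, "X"), "X" in reachable(PARENTS, "Y"))  # False False
```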
To this, we normally add one more constraint: a Git repository never has a "hole" in it. By this, I mean that if we have some node in the graph, we must always have every node reachable from that node. If a123456 is the parent of b789abc, and we have b789abc, we must also have a123456. This in turn means that we must have the entire snapshot of a123456. If a123456 has a parent commit, we must have the entire snapshot of that commit too, and so on.
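The "no holes" rule can be phrased as a simple check over the commit graph. In this Python sketch, find_holes is hypothetical and 9f00001 is a made-up ID. A repository is modelled as a map from each stored commit to its parent IDs. The sketch only looks at commits, whereas the real rule also covers every tree and blob in each snapshot.

```python
def find_holes(commits):
    """Return parent IDs named by stored commits but not stored themselves.

    commits maps each stored commit ID to its list of parent IDs. A complete
    (non-shallow) repository should give back an empty set.
    """
    return {p for ps in commits.values() for p in ps if p not in commits}


complete = {"b789abc": ["a123456"], "a123456": ["9f00001"], "9f00001": []}
holey = {"b789abc": ["a123456"], "a123456": ["9f00001"]}

print(find_holes(complete))  # set()
print(find_holes(holey))     # {'9f00001'}
```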
Note the emphasis on the word normally above. When this is the case, if we are the sender and we're doing a git push, we can often tell everything about the latest commits in the receiving Git repository just by knowing which commits they are. That is, if we have new commit b789abc and they have its parent a123456, we already have a123456 ourselves. We also have everything reachable from a123456. So we know everything about every file they have, at least as far as a123456 and all of its ancestry is concerned.
This gives a sending Git a huge leg up: it tells the receiving Git "I can send you commit b789abc; would you like it?" The receiving Git might answer with "I already have b789abc", in which case we know everything we need to know about the receiving Git, or it might say "Yes, I'd like that". If it says the latter, we, as the sending Git, must now offer the parent a123456. They will either respond with "I already have it" or "please send it", after which we'll offer its parent(s), and so on.
At some point, we either run out of commits to send—they have nothing and we must send every object—or we hit some commit that they have, which means that they have that commit and every earlier commit and we, the sender, now know precisely what files they have as well. So we can do a great job of sending them just the commits they need, and just the files they need for those commits, and we can compress those files knowing what earlier versions of those files that they already have.
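The walk described above can be sketched in Python as follows. This is a toy model of the reasoning, not the wire protocol: in the traces in the question, the receiver simply advertises its refs/heads/master tip up front, and the sender then appears to work out locally what to send (the git pack-objects --revs step). The negotiate function and the c0..c3 IDs are made up.

```python
from collections import deque


def negotiate(parents, tip, receiver_has):
    """Toy have/want walk: return (commits_to_send, common_commits).

    parents: the sender's {commit: [parent, ...]} map.
    receiver_has: set of commit IDs the receiver already holds.
    """
    to_send, common = [], set()
    seen = {tip}
    queue = deque([tip])
    while queue:
        c = queue.popleft()
        if c in receiver_has:
            # They have c, so (no holes) they have everything behind it too.
            common.add(c)
            continue
        to_send.append(c)
        for p in parents.get(c, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return to_send, common


PARENTS = {"c3": ["c2"], "c2": ["c1"], "c1": ["c0"], "c0": []}
print(negotiate(PARENTS, "c3", {"c1", "c0"}))  # (['c3', 'c2'], {'c1'})
print(negotiate(PARENTS, "c3", set()))         # (['c3', 'c2', 'c1', 'c0'], set())
```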
Note that there's a big overall assumption here, that CPU time is cheap, but network bandwidth is expensive. We use this OID-exchange process to find what they already have, then we prepare the new objects and compress them against the known old objects. This ("compressing objects") part can take a lot of time, depending on how fast our own computation is. But it's usually pretty quick because typically we're sending just one or a few commits, with just one or a few new or modified files each, so there's not much to compress. We then send those objects, and that part is as slow as the network is, but if we did a good job of compressing, we don't have to send many objects and we've compressed them very well against other objects they already have.
Note, though, that if we git push the current (HEAD) commit, we must send them all parent commits back to the point where our history and their history join up. This maintains the "completeness" or "lack of holes" property. So your code here:
variable1=$(git rev-parse HEAD)  # Store the last commit hash in a variable
# Push only this specific commit, to (maybe) make it faster
git push origin "$variable1":master
does no good; you could just run git push origin HEAD:master or, if your current branch on your (sender's) side is named master, just git push origin master (assuming, in both cases, that your remote is named origin).
I mentioned above that a repository is made up of two databases. The process I've described so far is all about updating the object database. That's the really important one, because the names database is mainly there to help puny humans: all the machine needs is the raw hash IDs. Git won't go nuts trying to tell the difference between 720100ac47678fa31f0844a413f05bd0305d179f and 720100ac47678fae1f0844a413f05bd0305d179f, the way humans would (I made one tiny change here: can you spot it?). But we need to update the names database too, so following the above, the sending Git will send either a polite request, or a forceful command (or perhaps more than one of one or both, with different names in the blanks):
- Please, if it's OK, create or update your name _______ to hold ID _______; or
- Set your name _______ to hold _______!
(There's a third kind, a conditionally forceful update, where the sender says "I think your name _______ holds _______. If so, set it to _______!" That's the --force-with-lease option, and it just enables a safer way to do the command.)
The receiving Git obeys, or doesn't, at its pleasure according to various rules (hosting servers generally add a bunch of control rules atop the super-simple ones that out-of-the-box Git provides). The sender's job is just to provide the names and hash IDs, along with the polite requests or forceful commands, and take back the reply from the receiving Git: "OK" or "rejected", along with any side messages that might come from a hosting server, e.g., telling a user that he does not have access rights to set that name. The sender then reports the result(s) to the person or process that ran git push, and updates any remote-tracking names for the git push operations that actually succeeded.
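To make the three kinds of request concrete, here is a toy Python model of one name update. It lumps every check into a single function for simplicity, whereas real Git splits them between the sending and receiving sides, and hosting servers add their own rules on top. The function and parameter names are made up, the IDs are hypothetical, and the rejection strings only imitate the wording of git push's messages.

```python
def is_ancestor(parents, old, new):
    """True if old is reachable from new, i.e. moving old -> new is a fast-forward."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return False


def update_ref(refs, parents, name, new, force=False, expect=None):
    """Toy receiver: apply one name update, returning "ok" or a rejection.

    force=False, expect=None: polite request (create, or fast-forward only).
    force=True: forceful command (always set the name).
    expect=<oid>: conditional force, in the spirit of --force-with-lease
                  (set the name only if it currently holds expect).
    """
    current = refs.get(name)
    if expect is not None:
        if current != expect:
            return "rejected (stale info)"
    elif not force and current is not None and not is_ancestor(parents, current, new):
        return "rejected (non-fast-forward)"
    refs[name] = new
    return "ok"


PARENTS = {"n2": ["n1"], "n1": ["n0"], "x1": ["n0"], "n0": []}
refs = {"refs/heads/master": "n1"}
print(update_ref(refs, PARENTS, "refs/heads/master", "n2"))               # ok
print(update_ref(refs, PARENTS, "refs/heads/master", "x1"))               # rejected (non-fast-forward)
print(update_ref(refs, PARENTS, "refs/heads/master", "x1", expect="n1"))  # rejected (stale info)
print(update_ref(refs, PARENTS, "refs/heads/master", "x1", force=True))   # ok
```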
So, if you have several things to send, you can send them all together:
git push origin master mybranch:theirbranch
for instance. This has your sending Git collect the OIDs for master and mybranch on your side, send any commits and supporting objects required to their Git, then ask (politely) that they set their master and theirbranch to the OIDs your Git found for your master and mybranch.
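Here is a toy Python sketch of what batching buys you: with several refspecs in one push, the sender does a single walk, sends each needed commit once even when two names share history, and then asks for all the name updates together. plan_push and the IDs are hypothetical. Refspecs are simplified to short names, and "master" alone is treated as "master:master".

```python
def plan_push(local_refs, parents, receiver_has, refspecs):
    """Toy plan for one push carrying several "src:dst" refspecs.

    Returns (set of commits to send once, {remote name: new oid}).
    """
    updates = {}
    for spec in refspecs:
        src, _, dst = spec.partition(":")
        dst = dst or src  # "master" alone means "master:master"
        updates[dst] = local_refs[src]
    needed, stack = set(), list(updates.values())
    while stack:
        c = stack.pop()
        if c in needed or c in receiver_has:
            continue  # already planned, or the receiver has it
        needed.add(c)
        stack.extend(parents.get(c, []))
    return needed, updates


local_refs = {"master": "m2", "mybranch": "f1"}
parents = {"m2": ["m1"], "m1": ["base"], "f1": ["m1"], "base": []}
needed, updates = plan_push(local_refs, parents, {"base"},
                            ["master", "mybranch:theirbranch"])
print(sorted(needed))  # ['f1', 'm1', 'm2']  (m1 is planned only once)
print(updates)         # {'master': 'm2', 'theirbranch': 'f1'}
```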
Things that go wrong: too many names and shallow clones
That's the normal process. Now let's see what might work to confound it.
First, some (not all) existing sending and receiving processes will sometimes go through every name they have in their names database. For repositories with tens of thousands of tag names, for instance, this can take a lot of time. This happens well before the "counting objects" phase even starts. If you have a network monitor or tracer, you will see a lot of data coming from the receiver, listing out all their branch and tag names and corresponding hash IDs, even though the sender doesn't really need all of that. There are some technical improvements here that are being worked on in the C version of Git, but they've been in progress for a long time (multiple years now, I think) and aren't there yet. If this is a problem, the simplest solution is to prune back a lot of the names, but this generally requires archiving, or at least renaming, the existing repository and making a new one (because some hash IDs will be findable only through the old names, and you probably don't want to lose those forever).
More importantly, go back to the word normally I put in bold-italics above. We can make a kind of Git clone, which Git calls a shallow clone, in which the usual constraints—that there are no "holes" in the graph—are deliberately violated. To implement this, Git writes certain commit hash IDs into a file, saying this commit is assumed to exist, but we don't have it, so we don't know anything about it.
When a receiving Git is shallow, the sending Git has a problem, and when a sending Git is shallow, the sending Git has a problem. This means shallow clones are OK as one-time things but are a bad idea for ongoing work, where you're going to run git fetch ("get more objects from some other repository") or git push ("send objects to some other repository"). In particular, the normal assumption, that if someone has commit b789abc whose parent is a123456, then they not only have a123456 but also every earlier commit and therefore every version of every file up to that point, just breaks down entirely.
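A sketch of how a shallow repository squares this with the "no holes" check: Git records the boundary commits in a shallow file (.git/shallow in a non-bare repository), and those commits are treated as if they had no parents. In this toy Python model, unexpected_holes and the IDs are made up; the commit map plays the role of the object database and the shallow set plays the role of that file. The hole is still there, it is just declared.

```python
def unexpected_holes(commits, shallow):
    """Return missing parents that are NOT excused by the shallow list.

    commits: {stored commit ID: [parent IDs]}
    shallow: set of boundary commit IDs (the contents of a shallow file)
    """
    holes = set()
    for c, parents in commits.items():
        if c in shallow:
            continue  # treated as if it had no parents
        holes.update(p for p in parents if p not in commits)
    return holes


# A depth-2 style clone: E and D are stored, D's parent C is not.
clone = {"E": ["D"], "D": ["C"]}
print(unexpected_holes(clone, shallow={"D"}))  # set()
print(unexpected_holes(clone, shallow=set()))  # {'C'}
```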
Because of the have/want protocol that senders use when sending to receivers, though, a depth-two shallow clone is much better for doing git push to a full (non-shallow) clone than is a depth-1 clone. The extra commit in the "deeper" clone allows the sender to understand better what the receiver has, even if the sender doesn't have everything needed to do the best possible compression.
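A toy Python model of the depth point, under an assumed scenario: you cloned tip T, committed N on top of it, and the repository you push to still has its branch at T1, one commit behind T. The sender's walk stops at its shallow boundary, so a depth-1 clone never sees T1 and cannot recognize what the receiver has, while a depth-2 clone can. walk_until_common and the IDs are hypothetical, and real Git's decisions about what to send are more involved than this.

```python
def walk_until_common(parents, shallow, tip, receiver_tips):
    """Toy sender walk that stops at the shallow boundary.

    Returns the first commit found that the receiver has, or None if the
    walk ran into the shallow boundary without finding one.
    """
    stack, seen = [tip], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        if c in receiver_tips:
            return c
        if c not in shallow:  # parents of a shallow commit are unknown
            stack.extend(parents.get(c, []))
    return None


FULL = {"N": ["T"], "T": ["T1"], "T1": ["T2"], "T2": []}
# Depth-1 clone of T plus new commit N: only N and T are known; T is shallow.
depth1 = ({k: FULL[k] for k in ("N", "T")}, {"T"})
# Depth-2 clone of T plus N: T1 is also known and is the shallow boundary.
depth2 = ({k: FULL[k] for k in ("N", "T", "T1")}, {"T1"})

# The receiver's branch still points at T1.
print(walk_until_common(*depth1, "N", {"T1"}))  # None
print(walk_until_common(*depth2, "N", {"T1"}))  # T1
```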
The best sending result is achieved when you have a full clone in the first place, but if you're going to make a shallow clone, then commit and push, start with at least a depth-2 clone. CI systems often use a depth-1 shallow clone on purpose, but usually have knobs to adjust the depth, or to make a full (non-shallow) clone. Some depth compromise (other than just 2) may work best for your case; it's hard to say without actual testing.
Solution 2:[2]
Edit: This answer works for SSH connections. Since the question is about HTTPS, @torek's answer is more appropriate.
One way to increase the speed is to use the ssh ControlMaster option. It keeps ssh from reopening a connection each time.
The ControlMaster option, from the ssh_config man page:
Enables the sharing of multiple sessions over a single network connection. When set to yes, ssh(1) will listen for connections on a control socket specified using the ControlPath argument. Additional sessions can connect to this socket using the same ControlPath with ControlMaster set to no (the default). These sessions will try to reuse the master instance's network connection rather than initiating new ones, but will fall back to connecting normally if the control socket does not exist, or is not listening.
For example, in my ~/.ssh/config I have:
ControlMaster auto
ControlPath ~/.ssh/mux-%r@%h_%p
ControlPersist 15m
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |