How to automatically get a certain file (>1MB) from git

I want to grab a certain file from a private git repository daily on Linux. Files under 1MB are no problem: I fetch them through the Get contents API with the following curl command.

curl -H "Content-Type: application/json" -H "Authorization: token $TOKEN" -H 'Accept: application/vnd.github.v3.raw' -O $FILEPATH

Now that the file has grown beyond 1MB, I have no idea how to fetch it.

GitHub tells me to use the Git Data API to get a blob (up to 100MB, more than enough for me).

I've been trying to find a way to get the SHA1 of this frequently updated file, but I haven't come across any applicable method yet. Any suggestions?

Or is there maybe a method that doesn't use the GitHub API?

Thanks in advance.



Solution 1:[1]

If the file's path in the repository is known, you can get its SHA using the Contents API. For example:

$ curl -H "Content-Type: application/json" \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3" \
    https://api.github.com/repos/smt116/dotfiles/contents/README.md

{
  "name": "README.md",
  "path": "README.md",
  "sha": "36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
  "size": 16,
  "url": "https://api.github.com/repos/smt116/dotfiles/contents/README.md?ref=master",
  "html_url": "https://github.com/smt116/dotfiles/blob/master/README.md",
  "git_url": "https://api.github.com/repos/smt116/dotfiles/git/blobs/36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
  "download_url": "https://raw.githubusercontent.com/smt116/dotfiles/master/README.md",
  "type": "file",
  "content": "IyMgTXkgZG90ZmlsZXMuCg==\n",
  "encoding": "base64",
  "_links": {
    "self": "https://api.github.com/repos/smt116/dotfiles/contents/README.md?ref=master",
    "git": "https://api.github.com/repos/smt116/dotfiles/git/blobs/36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
    "html": "https://github.com/smt116/dotfiles/blob/master/README.md"
  }
}

Now you can download the file with the Git Data API, using the git_url link included in the JSON response.
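As a rough sketch of those two steps, the shell functions below read git_url from a Contents API response and then request that blob with the raw media type. It assumes jq is installed and that TOKEN is set as in the question; the function names and the output-file argument are hypothetical and not part of the original answer.

```shell
# Read a Contents API JSON response on stdin and print its git_url field.
# jq -e makes the command fail if the field is missing or null.
git_url_from_contents() {
  jq -er '.git_url'
}

# $1 = Contents API URL of the file, $2 = local output file (both hypothetical).
fetch_via_blob() {
  blob=$(curl -fsSL \
           -H "Authorization: token $TOKEN" \
           -H "Accept: application/vnd.github.v3+json" \
           "$1" | git_url_from_contents) || return 1

  # The raw media type asks for the file bytes instead of base64-wrapped JSON.
  curl -fsSL \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3.raw" \
    -o "$2" "$blob"
}

# Example call, not executed here:
#   fetch_via_blob https://api.github.com/repos/smt116/dotfiles/contents/README.md README.md
```

If the Contents API rejects the file itself because of its size, the listing of its parent directory may still report a git_url for each entry, although the jq filter would then have to select the entry by name.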

However, if you want to download all blobs from a given repository, you can use the Git Trees API to fetch the list first. You need to specify a commit SHA, but you can use HEAD if the most recent commit is okay. For example:

$ curl -H "Content-Type: application/json" \
      -H "Authorization: token $TOKEN" \
      -H "Accept: application/vnd.github.v3.raw" \
      https://api.github.com/repos/smt116/dotfiles/git/trees/HEAD

{
  "sha": "0fc96d75ff4182913cec229978bb10ad338012fd",
  "url": "https://api.github.com/repos/smt116/dotfiles/git/trees/0fc96d75ff4182913cec229978bb10ad338012fd",
  "tree": [
    {
      "path": ".agignore",
      "mode": "100644",
      "type": "blob",
      "sha": "e2ca571728887bce8255ab3f66061dde53ffae4f",
      "size": 21,
      "url": "https://api.github.com/repos/smt116/dotfiles/git/blobs/e2ca571728887bce8255ab3f66061dde53ffae4f"
    },
    {
      "path": ".bundle",
      "mode": "040000",
      "type": "tree",
      "sha": "4148d567286de6aa47047672b1f2f73d7bea349b",
      "url": "https://api.github.com/repos/smt116/dotfiles/git/trees/4148d567286de6aa47047672b1f2f73d7bea349b"
    },
    ...

To get details of all files, including those in subdirectories, add the recursive=1 query parameter to the URL.

Then parse the JSON response, keep the items whose type is blob, and download each file using its url attribute.
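The parse, filter and download steps could look roughly like this in shell. This is only a sketch: it assumes jq is available and TOKEN is set, and the function names and the destination-directory argument are made up for illustration.

```shell
# Read a Git Trees JSON response on stdin and print "path<TAB>url"
# for every entry whose type is "blob" (directories are type "tree").
list_blobs() {
  jq -r '.tree[] | select(.type == "blob") | [.path, .url] | @tsv'
}

# $1 = owner/repo, $2 = commit SHA or HEAD, $3 = destination directory.
download_all_blobs() {
  curl -fsSL \
    -H "Authorization: token $TOKEN" \
    "https://api.github.com/repos/$1/git/trees/$2?recursive=1" |
  list_blobs |
  while IFS="$(printf '\t')" read -r path url; do
    # Recreate the repository's directory layout under the destination.
    mkdir -p "$3/$(dirname "$path")"
    curl -fsSL \
      -H "Authorization: token $TOKEN" \
      -H "Accept: application/vnd.github.v3.raw" \
      -o "$3/$path" "$url" || echo "failed: $path" >&2
  done
}

# Example call, not executed here:
#   download_all_blobs smt116/dotfiles HEAD ./dotfiles-copy
```

Splitting on a tab keeps paths with spaces intact. Paths that themselves contain tabs or newlines are not handled by this sketch.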

Solution 2:[2]

This should be easier now (May 2022) using just the file path, since the Get repository content API finally supports raw content up to 100MB instead of 1MB.

Increased file size limit when retrieving file contents via REST API

Previously, the Get repository content REST API endpoint had a file size limit of 1 MB.
That didn’t correspond to the Create or update file contents endpoint which has a file size limit of 100 MB.

Now, both endpoints have a file size limit of 100 MB.

However, requests for file contents larger than 1 MB must include the .raw custom media type in the Accept HTTP header, as shown here:

 Accept: application/vnd.github.v3.raw

Read more about GitHub's REST API endpoints for repository contents.

curl -H "Accept: application/vnd.github.v3+json" \
     https://api.github.com/repos/OWNER/REPO/contents/PATH

Between 1-100 MB: Only the raw or object custom media types are supported.
Both will work as normal, except that when using the object media type, the content field will be an empty string and the encoding field will be "none". To get the contents of these larger files, use the raw media type.

Greater than 100 MB: This endpoint is not supported.
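For the daily download described in the question, this could come down to a single request with the raw media type. The sketch below assumes TOKEN is set as in the question; the function names, OWNER/REPO/PATH values and output file are placeholders.

```shell
# Build the Contents API URL from owner, repository and file path.
# Paths with special characters would need URL-encoding, which this sketch skips.
contents_url() {
  printf 'https://api.github.com/repos/%s/%s/contents/%s\n' "$1" "$2" "$3"
}

# $1 = Contents API URL, $2 = local output file.
# Download to a temporary name first so a failed request
# does not overwrite the previous day's copy.
fetch_raw_contents() {
  curl -fsSL \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3.raw" \
    -o "$2.tmp" "$1" && mv "$2.tmp" "$2"
}

# Example call, not executed here:
#   fetch_raw_contents "$(contents_url OWNER REPO PATH)" latest.bin
# A crontab entry such as "0 3 * * * /path/to/fetch.sh" (hypothetical path)
# could run a script containing that call once a day.
```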

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Maciej Małecki
Solution 2 VonC