How to use curl -Z (--parallel) effectively?
I need to download thousands of files with curl. I know how to parallelize with xargs -Pn (or GNU parallel), but I've just discovered that curl itself can parallelize downloads with the -Z|--parallel option introduced in curl 7.66 (see "curl goez parallel"), which might be cleaner or easier to share.
I need to use the -o|--output option together with --create-dirs. URLs need to be percent-encoded, and the URL path becomes the folder path, which also needs to be escaped, since paths can contain single quotes, spaces, and the usual suspects (hence the -O option is not safe and the -OJ option doesn't help).
If I understand correctly, the curl command should be built like so:
curl -Z -o path/to/file1 http://site/path/to/file1 -o path/to/file2 http://site/path/to/file2 [-o path/to/file3 http://site/path/to/file3, etc.]
This does work, but what's the best way to deal with thousands of URLs? Can a config file used with -K|--config help? What if each -o path/to/file_x http://site/path/to/file_x pair is the output of another program? I haven't found any way to record commands in a file, one command per line, say.
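One avenue I'm considering: per the manpage, -K|--config accepts - to read the config from stdin, so a generator program could pipe its output straight into curl. A rough sketch, with a printf-based gen_config standing in for the real generator (the URLs and paths are made up):

```shell
# gen_config stands in for any program that emits url=/output= pairs,
# one option per line, in curl's config-file syntax.
gen_config() {
  printf 'url="%s"\noutput="%s"\n' \
    "http://site/path/to/file1" "./path/to/file1" \
    "http://site/path/to/file2" "./path/to/file2"
}
gen_config
# Then pipe it in (not executed here):
#   gen_config | curl --parallel --create-dirs --config -
```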
Thanks
Solution 1:[1]
I ended up asking on the curl-users mailing list.
The solution that worked for me was to programmatically build a plain-text config file like so:
config.txt:
url=http://site/path/to/file1
output="./path/to/file1"
url=http://site/path/to/file2
output="./path/to/file2"
...
and use this curl command:
$ curl --parallel --parallel-immediate --parallel-max 60 --config config.txt
The command I actually use is a bit richer than this and requires a recent curl version (7.81+):
$ curl --parallel --parallel-immediate --parallel-max 60 --config config.txt \
--fail-with-body --retry 5 --create-dirs \
--write-out "code %{response_code} url %{url} type %{content_type}\n"
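For reference, a minimal sketch of building such a config file from a URL list with plain shell (urls.txt and its contents are invented for the example; the URLs are assumed to be percent-encoded already, and percent-decoding of the output path is left out):

```shell
# Build config.txt from urls.txt (one URL per line). The output path is
# derived from the URL path; embedded double quotes are escaped because
# the config values are quoted strings.
cat > urls.txt <<'EOF'
http://site/path/to/file1
http://site/other/file%202
EOF

: > config.txt
while IFS= read -r url; do
  path=${url#*://*/}                            # strip scheme and host
  esc=$(printf '%s' "$path" | sed 's/"/\\"/g')  # escape embedded quotes
  printf 'url="%s"\noutput="./%s"\n' "$url" "$esc" >> config.txt
done < urls.txt
```

The resulting config.txt is then fed to the curl command above with --config config.txt.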
For options like -K, --config, see https://curl.se/docs/manpage.html
HTH
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jgran |