S3DistCp - Split source into multiple jobs
I have to copy data from S3 to HDFS on an EMR cluster, and I'm trying to reduce the execution time of my job. Looking at the logs, the map input of the job is 1,000,000 files. I need to split this into about 100,000 files per job. Is it possible to define this behavior in the command used to add the step to EMR?
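For illustration, a minimal sketch of one way this is commonly done: submit several s3-dist-cp steps, each restricted to a subset of the source keys with the `--srcPattern` regex, so each step's map input only covers that batch. The cluster ID, bucket, paths, and key pattern below are placeholders, not values from the question.

```bash
# Hypothetical cluster ID and S3/HDFS paths; repeat this with a different
# --srcPattern per batch (e.g. one step per key prefix) so each s3-dist-cp
# run only sees a subset of the 1,000,000 source files.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=CopyBatch0,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[s3-dist-cp,--src=s3://my-bucket/input/,--dest=hdfs:///data/input/,--srcPattern=.*/part-0.*]'
```

Whether this helps depends on how the keys are named: `--srcPattern` filters by a regular expression over the source paths, so the files need some prefix or naming scheme that lets them be partitioned into batches.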
Source: Stack Overflow (CC BY-SA 3.0)