DistCP - Even simple copies result in CRC Exceptions
I'm running into an issue using distcp to copy files: every copy fails with an IOException (checksum mismatch), even for a simple copy within the cluster (e.g. hadoop distcp -pbugctrx /foo/bar /foo/baz).
If I force the copy to complete using -skipcrccheck, I can see that the HDFS checksums differ (hdfs dfs -checksum), but this isn't caused by a difference in the actual data (hdfs dfs -cat | md5sum returns matching checksums for source and destination).
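The content comparison described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names content_md5 and same_content are hypothetical, the example path in the comment is made up, and the helpers assume each argument is a single HDFS file rather than a directory.

```shell
# Hypothetical helper: MD5 of a file's bytes as read through the HDFS client.
content_md5() {
  hdfs dfs -cat "$1" | md5sum | cut -d' ' -f1
}

# Hypothetical helper: returns 0 when both files hold identical bytes,
# 1 when they differ, and 2 when either path is not a regular file.
same_content() {
  hdfs dfs -test -f "$1" || return 2
  hdfs dfs -test -f "$2" || return 2
  [ "$(content_md5 "$1")" = "$(content_md5 "$2")" ]
}

# Example (file names are illustrative only):
#   same_content /foo/bar/part-00000 /foo/baz/part-00000 && echo "contents match"
```

The existence check comes first because a failed hdfs dfs -cat would otherwise hash empty input on both sides and report a false match.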
I'm leery of disabling a data integrity check if I don't need to. Is there a better way to address this failing check than just ignoring it?
Solution 1:[1]
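The source and target may be in different encryption zones. In that case the checksum comparison will also fail: HDFS computes file checksums over the stored (encrypted) block data, so identical file contents can still produce different checksums.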
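One way to check whether encryption zones are involved is to look each path up in the output of hdfs crypto -listZones. The helper below is a rough sketch, not a definitive tool: the name zone_of is hypothetical, and it assumes that the command prints one row per zone with the zone path in the first column and that zone paths contain no whitespace.

```shell
# Hypothetical helper: print the encryption zone that contains the given path,
# or print nothing when the path lies outside every zone.
# Assumption: each output row of -listZones is "<zone path>  <key name>".
zone_of() {
  hdfs crypto -listZones | awk -v p="$1" '
    NF { z = $1; if (z == "/" || p == z || index(p, z "/") == 1) print z }'
}

# Example (listing zones needs HDFS superuser rights; paths are from the question):
#   [ "$(zone_of /foo/bar)" = "$(zone_of /foo/baz)" ] || echo "different encryption zones"
#
# If encryption explains the mismatch, the Hadoop transparent encryption
# documentation suggests skipping the CRC comparison together with -update:
#   hadoop distcp -update -skipcrccheck -pbugctrx /foo/bar /foo/baz
```

Note that -update changes how DistCp interprets the target path (the contents of the source directory are copied into the target), so the resulting layout is worth checking. Comparing the file contents after the copy, as the question already does with md5sum, keeps an integrity check in place when the CRC comparison is skipped.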
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ayush Saxena |