'DistCP - Even simple copies result in CRC Exceptions

I'm running into an issue using distcp to copy files - every copy fails with an IO Exception (Checksum mismatch), even if performing a simple copy within the cluster (i.e. hadoop distcp -pbugctrx /foo/bar /foo/baz).

If forced to complete the copy using -skipcrccheck, I can see that the checksum is different ( hdfs dfs -checksum ), but that this isn't being caused by a difference in the actual source data (hdfs dfs -cat | md5sum returns matching checksums for source and destination).

I'm leery of disabling a data integrity check if I don't need to. Is there a better way to address this failing check than just ignoring it.



Solution 1:[1]

Both the source and target may be in different encryption zones. In that case also the checksum will fail

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ayush Saxena