Loading a TensorFlow dataset gives NonMatchingChecksumError

My goal is to use the following dataset from tensorflow-datasets for machine learning:

https://www.tensorflow.org/datasets/catalog/wider_face

import tensorflow as tf
import tensorflow_datasets as tfds

data, info = tfds.load(name='wider_face', as_supervised=True, with_info=True)

This leads to the following error:

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1HIfDbVEWKmsYKJZm4lchTBDLW5N7dY5T, downloaded to C:\Users\user\tensorflow_datasets\downloads\ucexport_download_id_1HIfDbVEWKmsYKJZm4lchTBDLfr62-cNGXcnoWarcPOgb67igMZT4ssm73xhXi-__9lo.tmp.830989d7c2724803b592ac9c747a8300\uc, has wrong checksum:
* Expected: UrlInfo(size=1.72 GiB, checksum='3b0313e11ea292ec58894b47ac4c0503b230e12540330845d70a7798241f88d3', filename='WIDER_test.zip')
* Got: UrlInfo(size=2.15 KiB, checksum='3fe166f3882f9f3b1fb287ca88ec2d39655b74381d54c78b993377663c0f5bb3', filename='uc')

Does anyone have an idea how to solve this problem? I am using the latest version of tensorflow_datasets, 4.5.2.



Solution 1:[1]

According to the TensorFlow Datasets documentation:

TFDS ensures determinism by validating the checksums of downloaded URLs. If NonMatchingChecksumError is raised, it might indicate one of the following:

  • The website may be down (e.g. 503 status code). Please check the url.

  • For Google Drive URLs, try again later, as Drive sometimes rejects downloads when too many people access the same URL.

  • The original dataset files may have been updated. In this case the TFDS dataset builder should be updated. Please open a new GitHub issue or PR:

    1. Register the new checksums with tfds build --register_checksums

    2. If necessary, update the dataset generation code.

    3. Update the dataset VERSION.

    4. Update the dataset RELEASE_NOTES: What caused the checksums to change? Did some examples change?

    5. Make sure the dataset can still be built.

    6. Send us a PR.
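
To tell these cases apart, it can help to look at the file that was actually downloaded. In the error above, the "Got" size is 2.15 KiB against an expected 1.72 GiB, which suggests that Google Drive returned something other than the archive (possibly an HTML page, though that is an assumption). The sketch below is not part of TFDS: inspect_download is a hypothetical helper that uses only the Python standard library to compute a file's size and SHA-256 and to check for the "PK" signature that zip files start with. The expected checksum is copied from the "Expected:" line of the error message.

```python
import hashlib
from pathlib import Path

# SHA-256 copied from the "Expected:" line of the error message in the question.
EXPECTED_SHA256 = "3b0313e11ea292ec58894b47ac4c0503b230e12540330845d70a7798241f88d3"


def inspect_download(path, expected_sha256=EXPECTED_SHA256, chunk_size=1 << 20):
    """Hypothetical helper: report size, SHA-256 and a rough type guess for a file."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Keep the first bytes so we can check for the zip signature.
        head = f.read(4)
        digest.update(head)
        # Hash the rest in chunks so a multi-GiB archive is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    sha256 = digest.hexdigest()
    return {
        "size_bytes": path.stat().st_size,
        "sha256": sha256,
        "matches_expected": sha256 == expected_sha256.lower(),
        # Zip archives begin with the bytes "PK"; an HTML page will not.
        "looks_like_zip": head.startswith(b"PK"),
    }
```

Calling inspect_download on the temporary file named in the error message, or on a manually downloaded copy of WIDER_test.zip, shows whether the file is the real archive. A small file that does not look like a zip points to the Google Drive case (retry later); a full-size zip with a different checksum points to the dataset files having changed upstream.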

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1