PyTorch - RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Problem

I'm trying to load a file using PyTorch, but the error states archive/data.pkl does not exist.

Code

import torch
cachefile = 'cacheddata.pth'
torch.load(cachefile)

Output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-8edf1f27a4bd> in <module>
      1 import torch
      2 cachefile = 'cacheddata.pth'
----> 3 torch.load(cachefile)

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    582                     opened_file.seek(orig_position)
    583                     return torch.jit.load(opened_file)
--> 584                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    585         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    586 

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, **pickle_load_args)
    837 
    838     # Load the data (which may in turn use `persistent_load` to load tensors)
--> 839     data_file = io.BytesIO(zip_file.get_record('data.pkl'))
    840     unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    841     unpickler.persistent_load = persistent_load

RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Hypothesis

I'm guessing this has something to do with pickle, from the docs:

This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.
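
To illustrate what the docs describe, here is a minimal sketch (the TheModelClass model below is made up for illustration only; it is not the code that produced the error): saving a whole module pickles a reference to its class, whereas saving only the state_dict avoids that dependency on the class's location.

import torch
import torch.nn as nn

class TheModelClass(nn.Module):  # hypothetical model, just for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

model = TheModelClass()

# Saving the whole module: pickle stores a reference to TheModelClass,
# so loading requires the same class to be importable from the same location.
torch.save(model, "whole_model.pth")

# Saving only the state_dict: just tensors, no dependency on the class path.
torch.save(model.state_dict(), "state_dict.pth")

model2 = TheModelClass()
model2.load_state_dict(torch.load("state_dict.pth"))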

Versions

  • PyTorch version: 1.6.0
  • Python version: 3.8.0


Solution 1:[1]

It turned out the file was somehow corrupted. After regenerating it, it loaded without issue.
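
If you suspect the same kind of corruption, one quick check (my own addition, relying on the fact that PyTorch 1.6's default save format is a zip archive) is to open the file with Python's zipfile module and see whether archive/data.pkl is actually there:

import zipfile

cachefile = 'cacheddata.pth'

if not zipfile.is_zipfile(cachefile):
    print("Not a valid zip archive - the file is likely truncated or corrupted")
else:
    with zipfile.ZipFile(cachefile) as zf:
        # A healthy file lists entries such as 'archive/data.pkl', 'archive/version', ...
        print(zf.namelist())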

Solution 2:[2]

I was facing the same problem. I had downloaded the model (.pt), trained with a GPU, directly from a notebook on GCP AI Platform. When I loaded it locally with torch.load('models/model.pt', map_location=device), I got this error:

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

I noticed that the size of the downloaded file was much smaller than expected. So, same as @Ian, it turned out the file was corrupted when downloading it from the notebook. In the end I had to first transfer the file from the notebook into a bucket on Google Cloud Storage (GCS) instead of downloading it directly, and then download the file from GCS. It works now.
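
For reference, a rough sketch of that GCS round trip using the google-cloud-storage Python client (the bucket name and paths are placeholders of mine; gsutil from the command line works just as well):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-model-bucket")   # placeholder bucket name
blob = bucket.blob("models/model.pt")

# On the notebook VM: upload the finished checkpoint to GCS.
blob.upload_from_filename("models/model.pt")

# On the local machine: download it from GCS instead of from the notebook.
blob.download_to_filename("models/model.pt")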

Solution 3:[3]

I encountered this issue not for a single file, but consistently on every file I was dealing with. Looking at the file sizes, you could say the files were corrupted, in the sense that they were too small and incomplete, but why were they always created that way?

I think the issue was that I had made a seemingly harmless modification to a simple class I was saving: I defined a class Foo, kept its data the same but added a method, and then tried to save an older instance when only the newer class definition of Foo was available.

Here is an example of what I think happened, though it doesn't reproduce the error exactly:

import torch

class Foo(object):
    def __init__(self):
        self.contents = [1, 2, 3]

torch.save(Foo(), "foo1.pth")

foo1 = torch.load("foo1.pth")  # saved with class version 1 of Foo

# Some days later the code looks like this:
class Foo(object):
    def __init__(self):
        self.contents = [1, 2, 3]

    def __len__(self):
        return len(self.contents)

foo1 = torch.load("foo1.pth")  # still works
torch.save(foo1, "foo2.pth")   # try to save a version-1 object when the class is no longer known

The first time around I got an error like PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo, but when using Jupyter Notebook's autoreload feature it's hard to tell exactly what happened. Normally, older instances can be loaded into newer class definitions without problems.

Whatever really happened, my solution was to load the old object and manually copy its data fields into a freshly instantiated Foo, like so:

old = torch.load("foo1.pth")
new = Foo()
# new = old # this was the code that caused issues
new.contents = old.contents
torch.save(new, "foo2.pth")

Solution 4:[4]

In my case, the main reason for this error was that the .pt file was corrupted: I had started downloading the file while it was still being written.

So, to avoid the error, copy the .pt file to another directory once it has finished being written, and download the .pt file from that directory; see the sketch below.
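
A rough sketch of that idea (the paths and the wait-until-the-size-stops-changing check are my own additions, not part of the original answer):

import os
import shutil
import time

src = "model.pt"          # checkpoint that may still be getting written (placeholder path)
dst = "export/model.pt"   # copy to download from (placeholder path)

# Crude check that writing has finished: wait until the size stops changing.
size = -1
while True:
    new_size = os.path.getsize(src)
    if new_size == size:
        break
    size = new_size
    time.sleep(5)

os.makedirs("export", exist_ok=True)
shutil.copy2(src, dst)  # download dst, not src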

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ian
Solution 2
Solution 3 Caranown
Solution 4