Having Issues Loading the CelebA Dataset on Google Colab Using PyTorch

I need to load the CelebA dataset for a Python (PyTorch) implementation of the following paper: https://arxiv.org/pdf/1908.10578.pdf The original code for loading the CelebA dataset was written in MATLAB using MatConvNet with autonn (source 15 in the paper). I have the source code, but I'm not sure whether I can share it.

This is my first time using PyTorch (version 1.9.0+cu102) and my first time implementing a computer vision paper.

I looked at the following relevant question: How do I load the CelebA dataset on Google Colab, using torch vision, without running out of memory?

and tested out the solution suggested by user anurag: https://stackoverflow.com/a/65528710/15087536

Unfortunately, I'm still getting a syntax error.

Here's the code below:

import torchvision
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torchvision import transforms

# Root directory for the dataset
data_root = 'data/celeba'
# Spatial size of training images, images are resized to this size.
image_size = 64
# batch size
batch_size = 50000

transform=transforms.Compose([transforms.Resize(image_size),
transforms.CenterCrop(image_size),transforms.ToTensor(),transforms.Normalize(mean= 
[0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5])

dataset = ImageFolder(data_root,transform)   **syntax error**


Solution 1:[1]

Since we do not know the syntax error in your case, I cannot comment on it.
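
That said, the snippet as posted never closes the list and the call opened by `transforms.Compose([`. Python then keeps reading into the next statement, which is consistent with the error being flagged on the `dataset = ...` line. Below is a minimal sketch that checks this with the standard-library `ast` module only, so nothing from torchvision is imported or run. The `BROKEN` and `FIXED` strings and the `parses` helper are names made up for this illustration, and the layout of `FIXED` is just one way to balance the brackets.

```python
import ast

# The transform from the question, reproduced with its brackets left open.
BROKEN = '''\
transform = transforms.Compose([transforms.Resize(image_size),
    transforms.CenterCrop(image_size), transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

dataset = ImageFolder(data_root, transform)
'''

# The same code with the list and the Compose call closed by "])".
FIXED = '''\
transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

dataset = ImageFolder(data_root, transform)
'''


def parses(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)  # parses only; nothing is imported or executed
        return True
    except SyntaxError:
        return False


print(parses(BROKEN))
print(parses(FIXED))
```

If the balanced version still fails in your notebook, the problem is somewhere other than the brackets, and the full error message would be needed to say more.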

Below I will share one possible way to do it.

  1. You can download the CelebA dataset from Kaggle using this link. Alternatively, you can create a Kaggle kernel that uses this data, in which case there is no need to download it.

  2. If you are using Google Colab, upload the data to a location that is accessible from your notebook.

  3. Next you can write a PyTorch dataset which will load the images based on the partition (train, valid, test).

  4. I am pasting an example below. You can always customize this to suit your needs.


    import os

    import pandas as pd
    from skimage import io
    from torch.utils.data import Dataset, DataLoader


    class CelebDataset(Dataset):
        """Loads the CelebA images that belong to the requested partition(s)."""

        def __init__(self, data_dir, partition_file_path, split, transform=None):
            partition_file = pd.read_csv(partition_file_path)
            # Filter once here so __getitem__ does not depend on __len__ having run first.
            self.partition_file_sub = partition_file[partition_file["partition"].isin(split)]
            self.data_dir = data_dir
            self.split = split
            self.transform = transform

        def __len__(self):
            return len(self.partition_file_sub)

        def __getitem__(self, idx):
            # Column 0 of the partition file holds the image file name.
            img_name = os.path.join(self.data_dir, self.partition_file_sub.iloc[idx, 0])
            image = io.imread(img_name)
            if self.transform:
                image = self.transform(image)
            return image

  5. Next, you can create your train and test loaders. Change IMAGE_PATH to the directory that contains your images.

    from torchvision import transforms

    # celeba_config is assumed to be a dict defined elsewhere that holds the batch size.
    batch_size = celeba_config['batch_size']

    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    IMAGE_PATH = '../input/celeba-dataset/img_align_celeba/img_align_celeba'

    # Partitions 0 and 1 (train and valid) go into the training set.
    trainset = CelebDataset(data_dir=IMAGE_PATH,
                            partition_file_path='../input/celeba-dataset/list_eval_partition.csv',
                            split=[0, 1],
                            transform=transform)
    trainloader = DataLoader(trainset, batch_size=batch_size,
                             shuffle=True, num_workers=2)

    # Partition 2 (test) goes into the test set.
    testset = CelebDataset(data_dir=IMAGE_PATH,
                           partition_file_path='../input/celeba-dataset/list_eval_partition.csv',
                           split=[2],
                           transform=transform)
    testloader = DataLoader(testset, batch_size=batch_size,
                            shuffle=True, num_workers=2)
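
To see what the `split` argument selects, here is a small sketch of the same `isin` filter run on an in-memory stand-in for `list_eval_partition.csv`. The five rows are made up, and the `image_id` column name is an assumption about the Kaggle file; only the `partition` column and its codes (0 for train, 1 for valid, 2 for test) come from the code above. The `select` helper is a hypothetical name used only for this illustration.

```python
import io

import pandas as pd

# Made-up stand-in for list_eval_partition.csv; the real file lists every image.
csv_text = """image_id,partition
000001.jpg,0
000002.jpg,0
000003.jpg,1
000004.jpg,2
000005.jpg,2
"""
partition = pd.read_csv(io.StringIO(csv_text))


def select(split):
    """Return the file names whose partition code appears in `split`."""
    rows = partition[partition["partition"].isin(split)]
    return rows["image_id"].tolist()


train_files = select([0, 1])  # train and valid rows together
test_files = select([2])      # test rows only

print(train_files)
print(test_files)
```

Passing `split=[0]` and `split=[1]` to two separate datasets instead would keep the validation images out of training, if you need a held-out validation set.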

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ekansh