Having Issues Loading the CelebA Dataset on Google Colab Using PyTorch
I need to load the CelebA dataset for a Python (PyTorch) implementation of the following paper: https://arxiv.org/pdf/1908.10578.pdf. The original code for loading the CelebA dataset was written in MATLAB using MatConvNet with autonn (source 15 in the paper). I have the source code, but I'm not sure whether I can share it.
This is my first time using PyTorch (version 1.9.0+cu102) and my first time implementing a computer vision paper.
I looked at the following related question: "How do I load the CelebA dataset on Google Colab, using torch vision, without running out of memory?"
and tried the solution suggested by user anurag: https://stackoverflow.com/a/65528710/15087536
Unfortunately, I'm still getting a syntax error.
Here's the code:
import torchvision
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from torchvision import transforms
# Root directory for the dataset
data_root = 'data/celeba'
# Spatial size of training images, images are resized to this size.
image_size = 64
# batch size
batch_size = 50000
transform=transforms.Compose([transforms.Resize(image_size),
    transforms.CenterCrop(image_size),transforms.ToTensor(),transforms.Normalize(mean=
    [0.5, 0.5, 0.5],std=[0.5, 0.5, 0.5])
dataset = ImageFolder(data_root,transform)  # <-- syntax error
Solution 1:[1]
Since you have not included the exact error message, I cannot say for certain what is causing the syntax error.
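One likely cause, judging only from the snippet posted in the question: the `transforms.Compose([` call opens a parenthesis and a list bracket that are never closed, so Python is still inside that expression when it reaches the `dataset = ...` line. Depending on the Python version, the error is reported either on that following line or at the unclosed bracket. The sketch below checks this with the standard-library `ast` module, so it runs without torchvision installed; the helper name `syntax_error_in` and the two string constants are illustrative only.

```python
import ast

# The transform lines as posted in the question: "Compose([" is never closed.
BROKEN = """
transform = transforms.Compose([transforms.Resize(image_size),
    transforms.CenterCrop(image_size), transforms.ToTensor(), transforms.Normalize(mean=
    [0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
dataset = ImageFolder(data_root, transform)
"""

# Same lines with the missing "])" added after Normalize(...).
FIXED = """
transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
dataset = ImageFolder(data_root, transform)
"""


def syntax_error_in(source):
    """Return the SyntaxError message for `source`, or None if it parses."""
    try:
        ast.parse(source)  # parses only; nothing is imported or executed
    except SyntaxError as err:
        return err.msg
    return None


print("broken:", syntax_error_in(BROKEN))
print("fixed: ", syntax_error_in(FIXED))
```

If the fixed version parses but loading still fails, the problem is no longer a syntax error and is more likely related to the directory layout that `ImageFolder` expects.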
Below I will share one possible way to do it.
You can download the CelebA dataset from Kaggle. Alternatively, you can create a Kaggle kernel that uses this dataset, in which case there is no need to download it.
If you are using Google Colab, upload the data to a location that your notebook can access.
Next, you can write a PyTorch dataset that loads the images based on the partition (train, valid, test).
I am pasting an example below. You can always customize it to suit your needs.
import os

import pandas as pd
from skimage import io
from torch.utils.data import Dataset, DataLoader


class CelebDataset(Dataset):
    def __init__(self, data_dir, partition_file_path, split, transform):
        self.partition_file = pd.read_csv(partition_file_path)
        self.data_dir = data_dir
        self.split = split
        self.transform = transform
        # Filter once here so __getitem__ never depends on __len__ having run first
        self.partition_file_sub = self.partition_file[
            self.partition_file["partition"].isin(self.split)]

    def __len__(self):
        return len(self.partition_file_sub)

    def __getitem__(self, idx):
        # Column 0 of the partition file holds the image file name
        img_name = os.path.join(self.data_dir,
                                self.partition_file_sub.iloc[idx, 0])
        image = io.imread(img_name)
        if self.transform:
            image = self.transform(image)
        return image
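To see what the `split` argument selects without loading any images, here is a minimal standard-library sketch. It assumes the partition CSV has an `image_id` column followed by a `partition` column with 0 = train, 1 = validation, 2 = test, which matches how the class above indexes the file; the column name `image_id`, the sample rows, and the helper `files_for_split` are assumptions made for illustration.

```python
import csv
import io

# Made-up rows in the assumed layout of list_eval_partition.csv
SAMPLE_CSV = """image_id,partition
000001.jpg,0
000002.jpg,0
000003.jpg,1
000004.jpg,2
"""


def files_for_split(csv_text, split):
    """Return file names whose partition value is in `split`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = set(split)
    return [row["image_id"] for row in reader if int(row["partition"]) in wanted]


train_files = files_for_split(SAMPLE_CSV, [0, 1])  # train + validation
test_files = files_for_split(SAMPLE_CSV, [2])      # test only
print(train_files)
print(test_files)
```

Passing `split=[0, 1]` therefore merges the train and validation partitions into one set, while `split=[2]` keeps only the test partition.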
- Next, you can create your train and test loaders. Change IMAGE_PATH to the directory that contains your images.
from torchvision import transforms

# celeba_config is assumed to be a dict defined elsewhere; it is not shown in this answer
batch_size = celeba_config['batch_size']

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

IMAGE_PATH = '../input/celeba-dataset/img_align_celeba/img_align_celeba'

# Partitions 0 and 1 (train and valid) are combined into the training set
trainset = CelebDataset(data_dir=IMAGE_PATH,
                        partition_file_path='../input/celeba-dataset/list_eval_partition.csv',
                        split=[0, 1],
                        transform=transform)
trainloader = DataLoader(trainset, batch_size=batch_size,
                         shuffle=True, num_workers=2)

# Partition 2 is the test set
testset = CelebDataset(data_dir=IMAGE_PATH,
                       partition_file_path='../input/celeba-dataset/list_eval_partition.csv',
                       split=[2],
                       transform=transform)
testloader = DataLoader(testset, batch_size=batch_size,
                        shuffle=True, num_workers=2)
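When choosing `batch_size`, a rough estimate of how much memory one batch of image tensors occupies can help. The sketch below assumes 3-channel float32 tensors (what `transforms.ToTensor()` produces for RGB images) and plugs in the question's values of 64x64 images and a batch of 50000. The function name `batch_bytes` and the comparison batch size of 128 are illustrative, and the estimate ignores model activations and copies held by loader workers.

```python
def batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Approximate size in bytes of one batch of float32 image tensors."""
    return batch_size * channels * height * width * bytes_per_value


# Values from the question: 64x64 images, batch_size = 50000
question_batch = batch_bytes(50000, 64, 64)
print(f"{question_batch / 2**30:.2f} GiB per batch")

# A smaller, hypothetical batch size for comparison
small_batch = batch_bytes(128, 64, 64)
print(f"{small_batch / 2**20:.2f} MiB per batch")
```

With the question's settings a single batch comes to roughly 2.3 GiB before any model memory is counted, which is one reason to start with a much smaller batch size on Colab.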
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ekansh |