PyTorch distributed DataLoader

Are there any recommended ways to make the PyTorch DataLoader (torch.utils.data.DataLoader) work in a distributed environment, both on a single machine and across multiple machines? Can it be done without DistributedDataParallel?



Solution 1:[1]

You may need to make your question clearer. DistributedDataParallel (abbreviated DDP) is what you use to train a model in a distributed environment; this question seems to be asking how to arrange the data-loading process for distributed training.

First of all,

torch.utils.data.DataLoader works for both distributed and non-distributed training; usually there is nothing special you need to change about it.

However, the sampling strategy differs between the two modes: for distributed training you need to pass a sampler to the DataLoader (via its sampler argument), and using torch.utils.data.distributed.DistributedSampler is the simplest way to do that.
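To illustrate, here is a minimal sketch (not from the original answer) of wiring DistributedSampler into a DataLoader. The dataset, batch size, and the explicit num_replicas/rank values are placeholders; in a real launch the replica count and rank would come from the initialized process group (torch.distributed.init_process_group) rather than being hard-coded.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    # Placeholder dataset; replace with your own Dataset.
    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

    # Each process gets a non-overlapping shard of the dataset.
    # num_replicas/rank are given explicitly here only so the sketch runs
    # in a single process; normally they are inferred from the process group.
    sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)

    # Note: do not also pass shuffle=True to the DataLoader;
    # shuffling is delegated to the sampler.
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(3):
        # Reseed the sampler each epoch so the shuffling differs per epoch
        # but stays consistent across processes.
        sampler.set_epoch(epoch)
        for inputs, labels in loader:
            pass  # forward/backward pass goes here

Each process then iterates only over its own shard, so together the processes cover the full dataset once per epoch.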

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Florin