PyTorch distributed DataLoader
Any recommended ways to make the PyTorch DataLoader (torch.utils.data.DataLoader) work in a distributed environment, on a single machine and across multiple machines? Can it be done without DistributedDataParallel?
Solution 1:[1]
Maybe you need to make your question clearer. DistributedDataParallel (abbreviated DDP) is what you use to train a model in a distributed environment; this question seems to ask how to arrange the dataset loading process for distributed training.
First of all, torch.utils.data.DataLoader works for both distributed and non-distributed training, so usually nothing needs to change there. What differs between the two modes is the sampling strategy: for distributed training you need to specify a sampler for the DataLoader (the sampler argument of DataLoader), and using torch.utils.data.distributed.DistributedSampler is the simplest way to do that.
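A minimal sketch of what this looks like, assuming the process group has already been initialized with torch.distributed.init_process_group and using a hypothetical MyDataset for illustration:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical dataset used only for illustration.
class MyDataset(Dataset):
    def __init__(self, size=1000):
        self.data = torch.randn(size, 10)
        self.labels = torch.randint(0, 2, (size,))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Assumes torch.distributed.init_process_group(...) has already been called,
# so the sampler can read the world size and rank from the process group.
dataset = MyDataset()
sampler = DistributedSampler(dataset, shuffle=True)

# Pass the sampler instead of shuffle=True; the sampler gives each process
# a disjoint shard of the dataset in every epoch.
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    # Needed so the shuffling order changes across epochs on all processes.
    sampler.set_epoch(epoch)
    for batch, labels in loader:
        pass  # training step here; wrapping the model in DDP is optional
```

Note that DistributedSampler only shards and shuffles the data per process; it does not depend on whether the model itself is wrapped in DistributedDataParallel, so the loading setup above works either way.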
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Florin |