Reproducibility issue with PyTorch

I'm running a script with a fixed seed. The results are reproduced across consecutive runs, but somehow running the same script with the same seed a few days later changes the output, so I'm only getting short-term reproducibility, which is weird. For reproducibility my script already includes the following statements:

import random
import numpy as np
import torch

torch.backends.cudnn.benchmark = False     # stop cuDNN from autotuning kernels per run
torch.backends.cudnn.deterministic = True  # restrict cuDNN to deterministic kernels
torch.use_deterministic_algorithms(True)   # error out on nondeterministic ops

random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)

I also checked the sequence of instance ids produced by the RandomSampler for the training DataLoader, and it is maintained across runs. I have also set num_workers=0 in the DataLoader. What could be causing the output to change?
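For context, a minimal sketch of how the sampler order can be pinned explicitly by passing a seeded torch.Generator to the DataLoader (the dataset here is a dummy TensorDataset; the generator argument is the documented way RandomSampler draws its shuffle indices):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))

# A dedicated generator pins the RandomSampler's shuffle order.
g = torch.Generator()
g.manual_seed(0)

loader = DataLoader(dataset, batch_size=2, shuffle=True,
                    num_workers=0, generator=g)

for (batch,) in loader:
    print(batch)  # identical order on every run with the same generator seed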



Solution 1:[1]

PyTorch is actually not fully deterministic by default. Even with a set seed, some PyTorch operations will simply behave differently and diverge from previous runs, because algorithm selection, CUDA kernels, and backward passes can be nondeterministic.

This is a good read: https://pytorch.org/docs/stable/notes/randomness.html

The page above lists which operations are nondeterministic. Disabling them is generally discouraged for performance reasons, but it can be done with:

torch.use_deterministic_algorithms(True)

Note that with this setting enabled, operations that have no deterministic implementation will raise a RuntimeError instead of running, so some operations effectively become unavailable.
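As a minimal sketch of what fully deterministic mode looks like in practice (the CUBLAS_WORKSPACE_CONFIG environment variable is the documented requirement for deterministic cuBLAS on CUDA 10.2 and later; the shapes and seed here are arbitrary):

import os
# Must be set before any CUDA work for deterministic cuBLAS (CUDA >= 10.2).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.use_deterministic_algorithms(True)

torch.manual_seed(0)
a = torch.randn(8, 8)
torch.manual_seed(0)
b = torch.randn(8, 8)
assert torch.equal(a, b)  # same seed, same RNG state -> identical tensors

# Ops without a deterministic implementation now raise RuntimeError
# instead of silently producing run-to-run differences.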

Solution 2:[2]

torch.cuda.manual_seed(args.seed)      # seed the current CUDA device
torch.cuda.manual_seed_all(args.seed)  # seed every CUDA device (relevant for multi-GPU)

Try adding these to your current reproducibility settings.
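For convenience, these calls are often collected into a single helper invoked once at the top of the script. A minimal sketch, assuming the helper name set_seed (it is not a PyTorch API) and a fixed seed of 42:

import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed every RNG the training script touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)      # silently ignored when CUDA is unavailable
    torch.cuda.manual_seed_all(seed)

set_seed(42)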

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Zoom
[2] Solution 2: starriet