sklearn.model_selection.train_test_split random state

I am training a computer vision model. I divide the images into 3 datasets: training, validation and testing.

So that I always get the same images in training, validation and testing, I use the random_state parameter of the train_test_split function.

However, I have a problem:

I am training and testing on two different computers (Linux and Windows). I thought that the results for a given random state would be the same, but they aren't.

Is there a way to get the same results on both computers?

I can't simply divide the images into 3 folders (training, validation and testing), because I want to change the test size and validation size between experiments.
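For reference, a three-way split like the one described here is typically made with two successive calls to train_test_split, both given the same integer random_state. A minimal sketch, assuming the images are identified by a list of file paths; the function split_three_ways and its parameters val_size, test_size and seed are hypothetical names, with both sizes expressed as fractions of the whole dataset.

```python
from sklearn.model_selection import train_test_split


def split_three_ways(paths, val_size=0.2, test_size=0.2, seed=42):
    """Split a list of image paths into train/validation/test lists.

    val_size and test_size are fractions of the whole dataset
    (hypothetical parameter names chosen for this sketch).
    """
    # First carve the test set off the full list.
    train_val, test = train_test_split(
        paths, test_size=test_size, random_state=seed
    )
    # Then split the remainder; rescale val_size so it is still a
    # fraction of the original dataset, not of train_val.
    val_fraction = val_size / (1.0 - test_size)
    train, val = train_test_split(
        train_val, test_size=val_fraction, random_state=seed
    )
    return train, val, test


paths = [f"img_{i:03d}.png" for i in range(100)]
train, val, test = split_three_ways(paths, val_size=0.2, test_size=0.2)
print(len(train), len(val), len(test))
```

Because val_size is rescaled by 1 - test_size, the two fractions can be changed independently from one experiment to the next.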



Solution 1:[1]

On a practical note, training the models may require a remote computer or server (e.g. Microsoft Azure, Google Colaboratory), and it is important to be aware that a fixed random seed does not guarantee the same split on every machine: the result can change between Python or library versions, and between operating systems when the list of input files is built in a different order.
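One cause worth ruling out (an assumption, since the question does not say how the image list is built): train_test_split shuffles positions, so the same random_state applied to the same files in a different order selects different files. Directory listings such as os.listdir are returned in arbitrary order, which can differ between Linux and Windows. The sketch below simulates two listing orders with hypothetical file names and shows that sorting the names first makes the split identical.

```python
from sklearn.model_selection import train_test_split

# Ten hypothetical file names, listed in two different orders, as two
# machines might return them from an unsorted directory listing.
files_machine_a = [f"img_{i}.png" for i in range(10)]
files_machine_b = list(reversed(files_machine_a))

# Same seed, different input order: the selected test images differ.
_, test_a = train_test_split(files_machine_a, test_size=3, random_state=0)
_, test_b = train_test_split(files_machine_b, test_size=3, random_state=0)
same_unsorted = sorted(test_a) == sorted(test_b)

# Sorting the names first gives both machines the same input order,
# so the same seed now selects the same test images.
_, test_a = train_test_split(sorted(files_machine_a), test_size=3, random_state=0)
_, test_b = train_test_split(sorted(files_machine_b), test_size=3, random_state=0)
same_sorted = sorted(test_a) == sorted(test_b)

print(same_unsorted, same_sorted)  # False True
```

In practice this means building the list with something like sorted(os.listdir(folder)) on both machines. Sorting removes one source of difference, but it does not protect against changes between library versions, which is why saving the split itself is more robust.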

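Thus, when dividing the original dataset into training, validation and testing datasets, you should not rely on a splitting function with a random seed to recreate the same split on each machine: if the split comes out differently on the second computer, images used for training on one computer can end up in the test set on the other, so the training and testing datasets overlap. A way to avoid this is to make the split once and keep separate .csv files listing the images to be used for training, validation and testing.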
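A minimal sketch of that approach, using Python's standard csv module; the functions save_split and load_split, the file names train.csv, val.csv and test.csv, and the column name filename are all hypothetical. The split is made once on one machine and saved; every machine then reads the saved lists instead of re-splitting. Keeping one folder of CSV files per experiment still allows the test and validation sizes to change between experiments without moving any images.

```python
import csv
import os
import tempfile

from sklearn.model_selection import train_test_split


def save_split(paths, out_dir, val_size=0.2, test_size=0.2, seed=42):
    """Split the paths once and write train.csv, val.csv and test.csv."""
    train_val, test = train_test_split(paths, test_size=test_size, random_state=seed)
    # Rescale so val_size is a fraction of the whole dataset.
    train, val = train_test_split(
        train_val, test_size=val_size / (1.0 - test_size), random_state=seed
    )
    os.makedirs(out_dir, exist_ok=True)
    for name, subset in (("train", train), ("val", val), ("test", test)):
        with open(os.path.join(out_dir, f"{name}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["filename"])  # header row
            writer.writerows([p] for p in subset)


def load_split(out_dir):
    """Read the saved lists back; no random numbers are involved here."""
    split = {}
    for name in ("train", "val", "test"):
        with open(os.path.join(out_dir, f"{name}.csv"), newline="") as f:
            split[name] = [row["filename"] for row in csv.DictReader(f)]
    return split


# Run save_split once on one machine, then copy the folder to the other.
paths = [f"img_{i:03d}.png" for i in range(50)]
out_dir = os.path.join(tempfile.mkdtemp(), "experiment_1")
save_split(paths, out_dir)
split = load_split(out_dir)
print({name: len(files) for name, files in split.items()})
```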

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nicolas