sklearn train test split by year

I have a dataset that covers 2016 to 2020 and has a 'Year' column. I would like to use 2016-2017 as training data and 2018-2020 as test data. Is there an easy way to do this split?



Solution 1:[1]

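You could group the data by year with groupby, but a simpler approach is to filter the rows on the 'Year' column with boolean indexing: keep the 2016-2017 rows as training data and the 2018-2020 rows as test data. Unlike sklearn's train_test_split, which shuffles rows by default, this gives a deterministic split that respects the time order. For example: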

df_train = df[df['Year'].isin([2016, 2017])]
df_test = df[df['Year'].isin([2018, 2019, 2020])]
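A minimal, self-contained sketch of the same idea. The sample data and the column names 'x' (feature) and 'y' (target) are hypothetical and only there for illustration. Instead of listing every year, it uses a single cutoff comparison and its complement (~), so every row lands in exactly one of the two sets. Note that rows with a missing Year compare as False and would therefore end up in the test set.

```python
import pandas as pd

# Hypothetical sample data: a 'Year' column plus an assumed feature 'x'
# and an assumed target 'y'.
df = pd.DataFrame({
    "Year": [2016, 2016, 2017, 2017, 2018, 2019, 2019, 2020],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "y": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Boolean mask: True for rows up to and including the cutoff year 2017.
train_mask = df["Year"] <= 2017

df_train = df[train_mask]   # 2016-2017
df_test = df[~train_mask]   # everything else, here 2018-2020

# Separate features and target so they can be passed to an sklearn estimator.
feature_cols = ["x"]
X_train, y_train = df_train[feature_cols], df_train["y"]
X_test, y_test = df_test[feature_cols], df_test["y"]

print(len(df_train), len(df_test))  # 4 4
```

X_train/y_train and X_test/y_test can then be used with any estimator's fit and score methods as usual.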

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 desertnaut