sklearn train test split by year
I have a dataset that covers 2016 to 2020 and has a 'Year' column. I would like to use 2016-2017 as train data and 2018-2020 as test data. Is there an easy way to perform this split?
Solution 1:[1]
You can use the groupby function to group the rows by year, then take the 2016-2017 groups as training data and the 2018-2020 groups as test data. Alternatively, you can filter on the year column directly with isin, which expects a list of values:
df_train = df[df['Year'].isin([2016, 2017])]
df_test = df[df['Year'].isin([2018, 2019, 2020])]
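For completeness, here is a minimal end-to-end sketch of the filtering approach. The toy DataFrame and its `value` and `target` columns are made up for illustration, and the column is assumed to be named `Year` as in the question; adapt the names to your own data.

```python
import pandas as pd

# Hypothetical toy data: 'value' and 'target' are placeholder columns.
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020, 2016],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "target": [0, 1, 0, 1, 0, 1],
})

train_years = [2016, 2017]
test_years = [2018, 2019, 2020]

# isin takes a list-like of values and returns a boolean mask over the rows.
df_train = df[df["Year"].isin(train_years)]
df_test = df[df["Year"].isin(test_years)]

# Separate features from the label, as you would before fitting a model.
X_train, y_train = df_train.drop(columns=["target"]), df_train["target"]
X_test, y_test = df_test.drop(columns=["target"]), df_test["target"]
```

Because the split is decided by the year rather than by random sampling, sklearn's `train_test_split` is not needed here; the resulting frames can be passed straight to an estimator's `fit` and `predict`.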
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | desertnaut |