Input contains NaN, infinity or a value too large for dtype('float64')
I am trying to train a model, but I am getting this error:
Input contains NaN, infinity or a value too large for dtype('float64').
Here is part of my code. How can I fix this?
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Features: every column except the target
a = clean_df.drop('AQI_calculated', axis=1).values
# Target: the AQI_calculated column
b = clean_df.loc[:, 'AQI_calculated'].values

a_train, a_test, b_train, b_test = train_test_split(a, b, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(a_train, b_train)  # the error is raised here
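A minimal reproduction, assuming a tiny made-up dataset (the feature names `pm25` and `no2` are hypothetical), shows that a single missing value is enough for `LinearRegression.fit` to raise a `ValueError`:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature value is missing (NaN)
clean_df = pd.DataFrame({
    "pm25": [10.0, 20.0, np.nan, 40.0],
    "no2": [5.0, 6.0, 7.0, 8.0],
    "AQI_calculated": [50.0, 60.0, 70.0, 80.0],
})

a = clean_df.drop("AQI_calculated", axis=1).values
b = clean_df["AQI_calculated"].values

try:
    LinearRegression().fit(a, b)
    error_message = None
except ValueError as exc:
    # scikit-learn validates its input and refuses non-finite values
    error_message = str(exc)

print(error_message)
```

The exact wording of the message depends on the scikit-learn version (newer releases say something like "Input X contains NaN"), but the cause is the same.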
Solution 1:[1]
Basically, you have to check whether your data contains NaN values. A scikit-learn model can't be trained on data that contains NaN, infinity or a value too large for float64 (as the error says).
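Since the model receives plain NumPy arrays, one way to see exactly what it would reject is NumPy's `np.isfinite`, which is `False` for NaN, `+inf` and `-inf` alike. A sketch, assuming a small hypothetical array in place of `a_train`:

```python
import numpy as np

# Hypothetical feature matrix standing in for a_train:
# one NaN and one infinity
a_train = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.inf],
])

# True wherever a value is NaN, +inf or -inf
bad = ~np.isfinite(a_train)

print(bad.any())            # is anything non-finite?
print(np.argwhere(bad))     # (row, column) positions of the offending values
```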
To check, I recommend using this code:
df.isnull().any().any()  # tells you whether there is any NaN value in your DataFrame
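The chained call first reduces each column to one boolean, then reduces those to a single answer. One caveat worth checking on hypothetical data: with default pandas settings `isnull()` does not treat infinity as missing, so infinities need a separate test such as `np.isinf`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "x" has a NaN, column "y" has an infinity
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.inf]})

has_nan = df.isnull().any().any()          # detects the NaN in "x"
has_inf = np.isinf(df.to_numpy()).any()    # detects the inf in "y"

print(has_nan, has_inf)
```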
If you want to know which columns contain these NaN values, you can do it this way:
df.isnull().any()
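To also see how many bad values each column has, `isnull().sum()` counts NaNs per column, and applying `np.isinf` to the DataFrame gives the matching count of infinities. A sketch on hypothetical data, assuming every column is numeric (which `LinearRegression` requires anyway):

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric data
df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": [4.0, 5.0, np.inf],
    "z": [7.0, 8.0, 9.0],
})

nan_counts = df.isnull().sum()     # NaN count per column
inf_counts = np.isinf(df).sum()    # +/-inf count per column

report = pd.DataFrame({"nan": nan_counts, "inf": inf_counts})

# Keep only the columns that need attention
affected = report[(report["nan"] > 0) | (report["inf"] > 0)]
print(affected)
```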
Once you know where the NaN values are, you have to deal with them. You can simply remove, fill or replace them, as @kelvt suggested in the comments.
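A sketch of those options on hypothetical data standing in for `clean_df`: infinities are first converted to NaN with `replace`, so both kinds of bad value are handled together, and then rows are either dropped with `dropna` or filled with `fillna`. Filling with the column mean is just one possible choice. Dropping rows should happen on the whole DataFrame, before splitting into `a` and `b`, so that features and target stay aligned:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for clean_df
clean_df = pd.DataFrame({
    "pm25": [10.0, np.nan, 30.0, 40.0],
    "no2": [5.0, 6.0, np.inf, 8.0],
    "AQI_calculated": [50.0, 60.0, 70.0, 80.0],
})

# Treat +inf / -inf like missing values
cleaned = clean_df.replace([np.inf, -np.inf], np.nan)

# Option 1: drop every row that still has a missing value
dropped = cleaned.dropna()

# Option 2: fill missing values with each column's mean
filled = cleaned.fillna(cleaned.mean())

print(len(dropped))                   # rows 1 and 2 are removed
print(filled.isnull().any().any())    # nothing missing any more
```

If you fill values, scikit-learn's `SimpleImputer` is an alternative that can be fitted on `a_train` only and then applied to `a_test`, so statistics from the test set do not leak into training.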
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Alex Serra Marrugat |