Basic KNN code returning "Input contains NaN" despite using df.replace

This code was working before, but has stopped working for no reason I can identify. I am running it in JupyterLab and following sentdex's Machine Learning with Python series (currently on part 14).

Unlike sentdex, I am using train_test_split rather than cross_validation, because cross_validation has since been deprecated.

I am using df.replace to replace the '?' values in the dataset (Wisconsin breast cancer, from the UCI repository) with -99999. I pass the arguments to df.replace() as keywords to rule out the pandas FutureWarning as the cause.

However, I still get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Code:

import numpy as np
import pandas as pd
from sklearn import preprocessing, neighbors
from sklearn.model_selection import train_test_split

df=pd.read_csv('breast-cancer-wisconsin.data')
df.replace(to_replace='?', value=-99999, inplace=True)
df.drop(['id'], axis =1, inplace=True)

X=np.array(df.drop(['class'], axis=1))
y=np.array(df['class'])
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

clf=neighbors.KNeighborsClassifier()

clf.fit(X_train,y_train)
pred=clf.predict(X_test)
accuracy=clf.score(X_test,y_test)

print(accuracy)
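One way to narrow this down is to check whether the frame holds missing values that were never a literal '?', since df.replace only touches cells that match '?' exactly. The following is a minimal sketch, not a confirmed diagnosis: the three-row sample and its column names (clump_thickness, bare_nuclei) are made up for illustration and stand in for the real file, and coercing with pd.to_numeric followed by fillna is swapped in for df.replace as one possible workaround.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for breast-cancer-wisconsin.data; the real file's
# layout and column names may differ.
sample = io.StringIO(
    "id,clump_thickness,bare_nuclei,class\n"
    "1000025,5,1,2\n"
    "1057013,8,?,4\n"
    "1096800,6,,2\n"
)

df = pd.read_csv(sample)
df = df.drop(columns=['id'])

# Count cells that pandas already treats as missing. A '?' cell is a string,
# not NaN, so only the empty cell is counted here.
nan_counts = df.isna().sum()
print(nan_counts[nan_counts > 0])

# Coerce every column to numeric: strings such as '1' become numbers and
# anything unparseable (including '?') becomes NaN. Then fill every NaN
# with the same sentinel value the question uses.
df = df.apply(pd.to_numeric, errors='coerce').fillna(-99999)

X = df.drop(columns=['class']).to_numpy(dtype=float)
y = df['class'].to_numpy()

# The check scikit-learn performs before fitting is roughly this one.
print(np.isfinite(X).all())
```

If the count printed for the real file is non-zero, the NaN values come from cells other than '?', which would explain why df.replace alone does not remove the error.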

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
