Issue fitting an SGD Classifier
I'm following the book Hands-on Machine Learning by Aurelien Geron, specifically the part where it starts to cover classifiers. I'm using the code from the book, but I get this error:
ValueError: The number of classes has to be greater than one; got 1 class
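A minimal sketch that reproduces this error with synthetic data. The random `X` and the all-False target `y_one_class` are hypothetical stand-ins, not the book's data: any target with only one distinct value triggers it.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical synthetic data: 20 samples with 4 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# A target in which every value is the same, i.e. only one class
y_one_class = np.zeros(20, dtype=bool)

message = None
try:
    SGDClassifier(random_state=42).fit(X, y_one_class)
except ValueError as exc:
    # SGDClassifier refuses to fit when it sees fewer than two classes
    message = str(exc)
    print(message)
```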
My Code:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = .20, random_state = 42)
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
When I looked up the error online, one suggested fix was to use np.unique(y_train_5), but this did not work either.
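Note that np.unique only reports the distinct values in an array; it does not modify the target that is passed to fit, so it cannot fix the error by itself. It is useful as a diagnostic, though. A minimal sketch, using a hypothetical all-False stand-in for y_train_5:

```python
import numpy as np

# Hypothetical stand-in for y_train_5 when no label ever equals 5
y_train_5 = np.array([False, False, False, False])

# np.unique returns the distinct values and, optionally, how often each occurs
values, counts = np.unique(y_train_5, return_counts=True)
print(values, counts)  # [False] [4]

# A classifier needs at least two distinct target values
if len(values) < 2:
    print("Only one class present; fit() will raise ValueError")
```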
Solution 1:[1]
The problem is that every value in the y_train_5 you passed is the same. If you run
print(set(y_train_5))
you will see just one value. Consider doing a stratified train/test split, which makes sure that each class ends up in both the train and the test set. Alternatively, your y_train may not contain any "5"s at all, in which case every value in both y_train_5 and y_test_5 is False.
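A minimal sketch of the stratified split suggested above, using hypothetical integer labels (10 fives and 90 zeros) rather than the book's data. The stratify parameter of train_test_split keeps the label proportions the same in both splits. Stratifying does not help, however, if the labels are strings being compared with the integer 5; that comparison is False for every sample regardless of how the data is split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 10 of which have the label 5
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([5] * 10 + [0] * 90)

# stratify=y preserves the 10%/90% label ratio in both train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# Both splits now contain both classes
print(np.unique(y_train_5), np.unique(y_test_5))  # [False  True] [False  True]
```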
Solution 2:[2]
Your target vectors compare the labels to the integer 5, but the labels are strings. Hence, every value of y_train_5 evaluates to False, and .fit() raises the "one class" error.
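A minimal sketch of why every value ends up False, using a hypothetical array of string labels. The comparison is written element by element with a list comprehension to sidestep version-dependent behaviour in NumPy's own `==` between a string array and an integer; the outcome it illustrates is the same: a string never equals an integer.

```python
import numpy as np

# Hypothetical labels stored as strings, like the labels in the question
y_train = np.array(['5', '0', '4', '1', '9', '5'])
print(y_train.dtype)  # <U1: Unicode strings of length 1

# A str never equals an int, so each element-wise test is False
y_train_5 = np.array([label == 5 for label in y_train])
print(y_train_5)  # [False False False False False False]
print(np.unique(y_train_5))  # [False] -> only one class
```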
Change the two vector definitions to:
y_train_5 = (y_train == '5')
y_test_5 = (y_test == '5')
and the classifier's .fit() method will run without error.
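An end-to-end sketch of the fix, with hypothetical synthetic data standing in for the book's dataset (random features and string labels '0' to '9'). Casting the labels to integers before comparing is shown as an alternative; that part is an assumption, not part of the answer above.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 samples, string labels '0'..'9' (20 of each)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([str(d) for d in range(10)] * 20)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Compare against the string '5', matching the labels' dtype
y_train_5 = (y_train == '5')
y_test_5 = (y_test == '5')

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)  # two classes are present, so no ValueError
print(sgd_clf.classes_)  # [False  True]

# Alternative (assumption, not from the answer): cast labels to integers first
y_train_int = y_train.astype(np.uint8)
print(np.array_equal(y_train_int == 5, y_train_5))  # True
```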
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | lejlot |
Solution 2 | Pete K |