Issue fitting an SGD Classifier

I'm following the book Hands-On Machine Learning by Aurélien Géron, specifically the chapter where it begins to cover classifiers. I'm running the code from the book, but I get the following error:

ValueError: The number of classes has to be greater than one; got 1 class

My Code:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

# X and y are the MNIST features and labels loaded earlier in the chapter
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# binary target: "is this digit a 5?"
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

When I looked the error up online, a suggested fix was to check np.unique(y_train_5), but that did not solve it either.
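
For reference, a quick check of the class counts (just a diagnostic sketch, not a fix) only confirms that a single class is present in the target:

import numpy as np

# with the setup above, every entry of y_train_5 is False,
# so only one class (False) is reported
print(np.unique(y_train_5, return_counts=True))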



Solution 1:[1]

The problem is that every value you passed in y_train_5 is the same. If you run

print(set(y_train_5))

you will see just one value. Consider doing a stratified train/test split, which ensures that each class ends up in both the train and the test set. Alternatively, your y_train may not contain any "5"s at all, in which case every value in both y_train_5 and y_test_5 is False.
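
As a minimal sketch (assuming X and y are the MNIST features and labels from the question), a stratified split can be requested via the stratify argument of train_test_split:

from sklearn.model_selection import train_test_split

# stratify=y preserves the class proportions of y in both splits,
# so every class appears in the train set and in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)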

Solution 2:[2]

Your target vectors compare the labels to the integer 5, but the labels are strings. Hence, every value of y_train_5 evaluates to False and .fit() raises the "one class" error.

Change the two vector definitions to:

y_train_5 = (y_train == '5')
y_test_5 = (y_test == '5')

and the classifier's .fit() method will run without error.
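
Alternatively (a sketch only, assuming the labels were loaded as strings via fetch_openml, as in the book), you can cast the labels to integers once and keep the original integer comparison:

import numpy as np

# fetch_openml returns the MNIST labels as strings; casting them once
# lets integer comparisons such as (y_train == 5) behave as expected
y_train = y_train.astype(np.uint8)
y_test = y_test.astype(np.uint8)

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)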

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: lejlot
[2] Solution 2: Pete K