Invalid labels error for a logistic regression classification model in PySpark on Databricks
I am using the Spark ML library to solve a classification problem with logistic regression.
I have vectorized the input features and created a training dataset and a test dataset.
When fitting the model, I get an invalid labels error.
In the training dataset, the input features are in the Independent_features column and the target is in the Category_con column.
Solution 1:[1]
Name your columns features and label instead of Independent_features and Category_con (respectively) when creating your vectors. These are the column names Spark ML's LogisticRegression looks for by default.
Solution 2:[2]
For the labels, you need to map them to just 3 categories. Judging from the error message, you may have 6. Use conditional replacement to group (bin) the categories, as below:
from pyspark.sql.functions import col, when
train_df = train_df.withColumn('label', when(col('Category_con') == firstCondition, 0).when(col('Category_con') == secondCondition, 1).otherwise(2))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | MichiganMadeLearner |