Error: PerfectSeparationError: Perfect separation detected, results not available

This is the head of my training data set (image: head of X_train).

I am running the code below:

import statsmodels.api as sm

logit = sm.GLM(Y_train, X_train, family=sm.families.Binomial())
result = logit.fit()

Can you please help?

The call to fit() raises the PerfectSeparationError quoted in the title.



Solution 1:[1]

statsmodels has detected complete or quasi-complete separation between one or more of your predictors and the outcome variable.

This happens when all or nearly all of the values in one predictor category (or one combination of predictors) are associated with only one of the two outcome values. (I'm assuming you're attempting a logistic regression.) When this happens, the maximum likelihood estimate for that predictor's coefficient does not exist: the likelihood keeps improving as the coefficient grows without bound, so the fit cannot settle on a finite value.
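To make the point about the coefficient concrete, here is a minimal NumPy sketch. The toy data and the no-intercept model are invented for illustration and are not the asker's data. On perfectly separated data, the log-likelihood keeps rising toward zero as the coefficient grows, so no finite maximum exists.

```python
import numpy as np

# Toy data, invented for illustration: the sign of x perfectly predicts y.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])


def log_likelihood(beta):
    """Bernoulli log-likelihood of a logistic model with no intercept."""
    z = beta * x
    log_p = -np.logaddexp(0.0, -z)        # log(sigmoid(z))
    log_one_minus_p = -np.logaddexp(0.0, z)  # log(1 - sigmoid(z))
    return float(np.sum(y * log_p + (1 - y) * log_one_minus_p))


betas = [1.0, 5.0, 10.0, 20.0]
values = [log_likelihood(b) for b in betas]

# Each larger beta gives a strictly better fit, with no finite optimum.
for b, v in zip(betas, values):
    print(f"beta={b:5.1f}  log-likelihood={v:.3e}")
```

An unpenalized fit chases this ever-increasing likelihood, which is why the estimation routine gives up instead of returning a coefficient.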

There are several possible solutions. If the number of variables in your analysis is manageable, run two-way crosstabs of the outcome against each predictor to locate any cells with zero observations, then either drop that variable from the analysis or collapse it into fewer categories. Another option is to fit a Firth logistic regression or another penalized regression.
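A rough sketch of both suggestions follows, under stated assumptions: the data frame, the column names `group` and `outcome`, and the helper `fit_ridge_logit` are all hypothetical. The penalized fit is a hand-written ridge (L2) logistic regression in NumPy, used here as a stand-in to show the idea. It is neither Firth's method nor a statsmodels API.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row in group "c" has outcome 1.
df = pd.DataFrame({
    "group":   ["a", "a", "b", "b", "c", "c", "a", "b"],
    "outcome": [0,   1,   0,   1,   1,   1,   0,   1],
})

# Two-way crosstab of predictor vs. outcome; a zero cell flags separation.
table = pd.crosstab(df["group"], df["outcome"])
print(table)
empty = table[(table == 0).any(axis=1)].index.tolist()
print("categories with a zero cell:", empty)


def fit_ridge_logit(X, y, alpha=1.0, n_iter=50):
    """Newton iterations for logistic regression with an L2 penalty.

    Maximizes loglik(beta) - (alpha / 2) * ||beta||^2. For simplicity the
    intercept is penalized too, which a real analysis might not want.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p) - alpha * beta
        hess = (X.T * (p * (1.0 - p))) @ X + alpha * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Design matrix: intercept plus a dummy for the separating category "c".
X = np.column_stack([np.ones(len(df)), (df["group"] == "c").astype(float)])
y = df["outcome"].to_numpy(dtype=float)
beta = fit_ridge_logit(X, y, alpha=1.0)
print("penalized coefficients:", beta)
```

The penalty keeps the coefficient on the separating dummy finite, at the cost of shrinking all estimates toward zero. How large to make `alpha` is a modelling decision this sketch does not address.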

Solution 2:[2]

When a perfect separation error occurs in logistic regression:
1. Compute the correlation between the target variable and each predictor, and plot it as a heat map.
2. Examine how strongly each predictor is correlated with the target variable, and drop the predictor with the lowest correlation from the data frame.
3. Rebuild the model.
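The correlation step above could be sketched as follows. The data frame and the column names `x1`, `x2` and `y` are made up for illustration, and plotting is left out. The resulting matrix could be passed to any heat map function.

```python
import pandas as pd

# Hypothetical toy data with two numeric predictors and a binary target.
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "y":  [0, 0, 0, 1, 1, 1],
})

# Full correlation matrix (the input a heat map would visualize).
corr = df.corr()

# Absolute correlation of each predictor with the target, strongest first.
target_corr = corr["y"].drop("y").abs().sort_values(ascending=False)
print(target_corr)
```

This only ranks the predictors. Which column to drop remains a judgment call, and it is worth noting that a predictor whose absolute correlation with the target is close to 1 is itself a likely source of the separation.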

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 RobertF
Solution 2 Radhakrishna Naik