Only two distinct values of probabilities in logistic regression output
I am running a logistic regression in R and extracting the predicted probabilities for a test set of about 15,000 rows using
predict(modelglm, test_data, type = "prob")
I was expecting a range of probabilities between 0 and 1, but there were only two distinct values. Every probability was either 1 or 2.220446e-16 (which is practically zero). In effect, I am getting a binary classification instead of probabilities.
Why is this happening?
Solution 1:[1]
You did not provide a reproducible example, but I believe the type
parameter is wrong. Use "response" instead of "prob".
predict(modelglm, test_data, type="response")
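To illustrate the difference between the prediction types, here is a minimal sketch, assuming modelglm was fitted with glm(..., family = binomial). The toy data and the names train, fit, and new_data are hypothetical and only stand in for the asker's objects. For a glm fit, the documented values of type are "link", "response", and "terms", where "link" is the default and returns log-odds.

```r
# Hypothetical toy data, not the asker's data
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 * x))
train <- data.frame(x = x, y = y)

# Plain logistic regression
fit <- glm(y ~ x, data = train, family = binomial)

new_data <- data.frame(x = c(-2, 0, 2))

# type = "link" (the default) returns log-odds, not probabilities
log_odds <- predict(fit, newdata = new_data, type = "link")

# type = "response" returns fitted probabilities strictly between 0 and 1
probs <- predict(fit, newdata = new_data, type = "response")

# The two scales are related through the inverse logit
all.equal(unname(probs), unname(plogis(log_odds)))
```

If type = "response" still returns only values near 0 and 1, the type argument is probably not the cause and the data themselves are worth inspecting.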
Solution 2:[2]
It is possible that the explanatory variables predict the outcome perfectly, which is usually called complete (or perfect) separation. For example, if B2 is 1 in every case where the dependent variable is true and 0 in every case where it is false, the model fits the data too well and predicts only values at the extremes (R usually gives 2.220446e-16 and 1, as you stated).
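A small sketch of this situation, using made-up data in which a predictor named B2 (borrowed from the example above) equals the outcome in every row. The data frame sep_data and the object fit_sep are hypothetical. The exact extreme values depend on where the fitting iterations stop, so this toy fit may not land on 2.220446e-16 itself, which is the value of .Machine$double.eps on typical platforms.

```r
# Hypothetical toy data: B2 equals the outcome in every row
sep_data <- data.frame(
  y  = rep(c(0, 1), each = 50),
  B2 = rep(c(0, 1), each = 50)
)

# glm() typically warns that fitted probabilities numerically 0 or 1 occurred;
# the warnings are suppressed here only to keep the example quiet
fit_sep <- suppressWarnings(
  glm(y ~ B2, data = sep_data, family = binomial)
)

p <- predict(fit_sep, newdata = sep_data, type = "response")

# Only two distinct predicted values, both at the extremes
unique(round(p, 6))

# Very large coefficients are another symptom of separation
coef(fit_sep)

# Machine epsilon, for comparison with the value reported in the question
.Machine$double.eps
```

Checking a cross-tabulation such as table(sep_data$B2, sep_data$y) for empty cells is one simple way to spot a predictor that separates the outcome.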
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | svess |