'Hosmer-Lemeshow statistic in R
I have run the Hosmer Lemeshow statistic in R, but I have obtained an p-value of 1. This seems strange to me. I know that a high p-valvalue means that we do not reject the null hypothesis that observed and expected are the same, but is it possible i have an error somewhere?
How do i interpret such p-value?
Below is the code i have used to run the test. I also attach how my model looks like. Response variable is a count variable, while all regressors are continous. I have run a negative binomial model, due to detected overdispersion in my initial poisson model.
> hosmerlem <- function(y, yhat, g=10)
+ {cutyhat <- cut(yhat, breaks = quantile(yhat, probs=seq(0,1, 1/g)), include.lowest=TRUE)
+ obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+ expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+ chisq <- sum((obs - expect)^2/expect)
+ P <- 1 - pchisq(chisq, g - 2)
+ return(list(chisq=chisq,p.value=P))}
> hosmerlem(y=TOT.N, yhat=fitted(final.model))
$chisq
[1] -2.529054
$p.value
[1] 1
> final.model <-glm.nb(TOT.N ~ D.PARK + OPEN.L + L.WAT.C + sqrt(L.P.ROAD))
> summary(final.model)
Call:
glm.nb(formula = TOT.N ~ D.PARK + OPEN.L + L.WAT.C + sqrt(L.P.ROAD),
init.theta = 4.979895131, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.08218 -0.70494 -0.09268 0.55575 1.67860
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.032e+00 3.363e-01 11.989 < 2e-16 ***
D.PARK -1.154e-04 1.061e-05 -10.878 < 2e-16 ***
OPEN.L -1.085e-02 3.122e-03 -3.475 0.00051 ***
L.WAT.C 1.597e-01 7.852e-02 2.034 0.04195 *
sqrt(L.P.ROAD) 4.924e-01 3.101e-01 1.588 0.11231
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
Null deviance: 197.574 on 51 degrees of freedom
Residual deviance: 51.329 on 47 degrees of freedom
AIC: 383.54
Number of Fisher Scoring iterations: 1
Theta: 4.98
Std. Err.: 1.22
2 x log-likelihood: -371.542
Solution 1:[1]
As correctly pointed out by @BenBolker, Hosmer-Lemeshow is a test for logistic regression, not for a negative binomial generalized linear model.
If we consider to apply the test to a logistic regression,
the inputs of the function hosmerlem
(a copy of the hoslem.test
function in the package ResourceSelection
) should be:
- y
= a numeric vector of observations, binary (0/1)
- yhat
= expected values (probabilities)
Here is an illustrative example that shows how to get the correct inputs:
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.1 + 0.5*x))
logmod <- glm(y ~ x, family=binomial)
# Important: use the type="response" option
yhat <- predict(logmod, type="response")
hosmerlem(y, yhat)
########
$chisq
[1] 4.522719
$p.value
[1] 0.8071559
The same result is given by the function hoslem.test
:
library(ResourceSelection)
hoslem.test(y, yhat)
########
Hosmer and Lemeshow goodness of fit (GOF) test
data: y, yhat
X-squared = 4.5227, df = 8, p-value = 0.8072
Solution 2:[2]
As already mentioned, HL-test is not appropriate for the specified model. It is also important to know that a large p-value doesn't necessarily mean a good fit. It could also be that there isn't enough evidence to prove it's a poor fit.
Meanwhile, the gofcat package implementation of the HL-test provides for passing model objects directly to the function without necessarily supplying the observed and predicted values. For the simulated data one has:
library(gofcat)
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.1 + 0.5*x))
logmod <- glm(y ~ x, family=binomial)
hosmerlem(logmod, group = 10)
Hosmer-Lemeshow Test:
Chi-sq df pr(>chi)
binary(Hosmerlem) 4.5227 8 0.8072
H0: No lack of fit dictated
rho: 100%
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |