'Hosmer-Lemeshow statistic in R

I have run the Hosmer Lemeshow statistic in R, but I have obtained an p-value of 1. This seems strange to me. I know that a high p-valvalue means that we do not reject the null hypothesis that observed and expected are the same, but is it possible i have an error somewhere?

How do i interpret such p-value?

Below is the code i have used to run the test. I also attach how my model looks like. Response variable is a count variable, while all regressors are continous. I have run a negative binomial model, due to detected overdispersion in my initial poisson model.

> hosmerlem <- function(y, yhat, g=10)
+     {cutyhat <- cut(yhat, breaks = quantile(yhat, probs=seq(0,1, 1/g)), include.lowest=TRUE)
+     obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+     expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+     chisq <- sum((obs - expect)^2/expect)
+     P <- 1 - pchisq(chisq, g - 2)
+     return(list(chisq=chisq,p.value=P))}
>   hosmerlem(y=TOT.N, yhat=fitted(final.model))
$chisq
[1] -2.529054

$p.value
[1] 1

> final.model <-glm.nb(TOT.N ~ D.PARK + OPEN.L + L.WAT.C + sqrt(L.P.ROAD))
> summary(final.model)

Call:
glm.nb(formula = TOT.N ~ D.PARK + OPEN.L + L.WAT.C + sqrt(L.P.ROAD), 
    init.theta = 4.979895131, link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-3.08218  -0.70494  -0.09268   0.55575   1.67860  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)     4.032e+00  3.363e-01  11.989  < 2e-16 ***
D.PARK         -1.154e-04  1.061e-05 -10.878  < 2e-16 ***
OPEN.L         -1.085e-02  3.122e-03  -3.475  0.00051 ***
L.WAT.C         1.597e-01  7.852e-02   2.034  0.04195 *  
sqrt(L.P.ROAD)  4.924e-01  3.101e-01   1.588  0.11231    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)

    Null deviance: 197.574  on 51  degrees of freedom
Residual deviance:  51.329  on 47  degrees of freedom
AIC: 383.54

Number of Fisher Scoring iterations: 1


              Theta:  4.98 
          Std. Err.:  1.22 

 2 x log-likelihood:  -371.542

r poisson p-value goodness-of-fit

Solution 1:^[1]

As correctly pointed out by @BenBolker, Hosmer-Lemeshow is a test for logistic regression, not for a negative binomial generalized linear model.

If we consider to apply the test to a logistic regression, the inputs of the function hosmerlem (a copy of the hoslem.test function in the package ResourceSelection) should be:
- y = a numeric vector of observations, binary (0/1)
- yhat = expected values (probabilities)

Here is an illustrative example that shows how to get the correct inputs:

set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.1 + 0.5*x))
logmod <- glm(y ~ x, family=binomial)

# Important: use the type="response" option
yhat <- predict(logmod, type="response")
hosmerlem(y, yhat)

########
$chisq
[1] 4.522719

$p.value
[1] 0.8071559

The same result is given by the function hoslem.test:

library(ResourceSelection)
hoslem.test(y, yhat)

########
Hosmer and Lemeshow goodness of fit (GOF) test

data:  y, yhat
X-squared = 4.5227, df = 8, p-value = 0.8072

Solution 2:^[2]

As already mentioned, HL-test is not appropriate for the specified model. It is also important to know that a large p-value doesn't necessarily mean a good fit. It could also be that there isn't enough evidence to prove it's a poor fit.

Meanwhile, the gofcat package implementation of the HL-test provides for passing model objects directly to the function without necessarily supplying the observed and predicted values. For the simulated data one has:

library(gofcat)

set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.1 + 0.5*x))
logmod <- glm(y ~ x, family=binomial)

hosmerlem(logmod, group = 10)

Hosmer-Lemeshow Test:
  
                   Chi-sq  df  pr(>chi)
binary(Hosmerlem)  4.5227   8    0.8072

H0: No lack of fit dictated

rho: 100%

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2

'Hosmer-Lemeshow statistic in R

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]