'Predicted values for conditional logistic regression greater than 1

I have a multivariate conditional logistic regression model. Case and controls are matched on a 1 to many basis.

I want to make predictions using the model. However, the predicted values I keep getting are between 0 and 3 when they should be binary (0 or 1). Why don't I get binary values?

This is my data: survival1 is binary. IC is also binary. Test_intensity-cat is categorical with 4 levels. Herd_size_cat is also categorical with 4 levels. Group has groups in the original data set. I've just included 11 here. The subset of data doesn't converge but the original set does.

n   survival1   IC   test_intensity_cat herd_size_cat group
1          0 none  1.2 < test/yr <= 1.5        medium   628
2          0 none       <= 1.2 test/yr        medium   629
3          0 none        >= 2 tests/yr    very large   627
4          1   IC        >= 2 tests/yr    very large   628
5          0 none       <= 1.2 test/yr         large   627
6          1   IC        >= 2 tests/yr    very large   627
7          1 none        >= 2 tests/yr    very large   627
8          0 none   1.5 < test/yr <= 2         large   629
9          0   IC   1.5 < test/yr <= 2    very large   629
10         0 none   1.5 < test/yr <= 2         large   628
11         0 none       <= 1.2 test/yr         large   628
12         0 none   1.5 < test/yr <= 2         small   231
13         0 none   1.5 < test/yr <= 2    very large   231
14         0 none 1.2 < test/yr <= 1.5    very large   231
15         0   IC   1.5 < test/yr <= 2    very large   231
16         1 none        >= 2 tests/yr    very large   170
17         0 none 1.2 < test/yr <= 1.5    very large   170
18         0 none        >= 2 tests/yr    very large   170
19         1 none        >= 2 tests/yr        medium   582
20         0 none   1.5 < test/yr <= 2         small   583
21         0   IC   1.5 < test/yr <= 2         large   582
22         1 none        >= 2 tests/yr         large   583
23         0 none 1.2 < test/yr <= 1.5    very large   134
24         0 none 1.2 < test/yr <= 1.5    very large   134
25         0 none       <= 1.2 test/yr         small   134
26         0   IC   1.5 < test/yr <= 2    very large   134
27         0 none 1.2 < test/yr <= 1.5    very large   484
28         0 none        >= 2 tests/yr    very large   485
29         0   IC   1.5 < test/yr <= 2        medium   484
30         0 none   1.5 < test/yr <= 2         large   485
31         0 none   1.5 < test/yr <= 2         small   484
32         0   IC       <= 1.2 test/yr    very large   485
33         0 none 1.2 < test/yr <= 1.5    very large   484
34         0 none   1.5 < test/yr <= 2    very large   485
35         0 none       <= 1.2 test/yr        medium    76
36         0 none        >= 2 tests/yr    very large    76
37         0 none        >= 2 tests/yr         large    76
38         0   IC        >= 2 tests/yr        medium    76
39         0 none       <= 1.2 test/yr    very large   629
40         0 none   1.5 < test/yr <= 2        medium   582
41         0   IC        >= 2 tests/yr         large   170
42         1   IC 1.2 < test/yr <= 1.5         small   583
43         0 none   1.5 < test/yr <= 2         small   582
44         0 none       <= 1.2 test/yr         small   583

This is my code is in R studio:

library(tidyverse)
library(broom)
library(survival)

model_IC_intensity_size <-
clogit(survival1 ~ IC + test_intensity_cat + herd_size_cat + strata(group),
method = "exact",
data = LCT_herd_matched)

actual <- LCT_herd_matched$survival1
predicted <- round(predict(model_IC_intensity_size, type = "expected"))
table(predicted, actual)

This is the output with the original dataset. The subset gives a smaller version that includes an aberrant 2.

           actual
predicted    0    1
        0 9271  641
        1  185  434
        2    6   42
        3    0    2

I'm also want to calculate leverage, delta chi squared and the delta beta statistics (p 425 Veterinary Epidemiologic Research by Dohoo et al). How would I go about determining these diagnostics for a conditional logistic regression model in R?

dput(mini)
structure(list(survival1 = c(0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), IC = structure(c(1L, 1L, 
1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("none", "IC"
), class = "factor"), test_intensity_cat = structure(c(3L, 1L, 
2L, 2L, 1L, 2L, 2L, 4L, 4L, 4L, 1L, 4L, 4L, 3L, 4L, 2L, 3L, 2L, 
2L, 4L, 4L, 2L, 3L, 3L, 1L, 4L, 3L, 2L, 4L, 4L, 4L, 1L, 3L, 4L, 
1L, 2L, 2L, 2L, 1L, 4L, 2L, 3L, 4L, 1L), .Label = c("<= 1.2 test/yr", 
">= 2 tests/yr", "1.2 < test/yr <= 1.5", "1.5 < test/yr <= 2"
), class = "factor"), herd_size_cat = structure(c(2L, 2L, 4L, 
4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 
1L, 3L, 3L, 4L, 4L, 1L, 4L, 4L, 4L, 2L, 3L, 1L, 4L, 4L, 4L, 2L, 
4L, 3L, 2L, 4L, 2L, 3L, 1L, 1L, 1L), .Label = c("small", "medium", 
"large", "very large"), class = "factor"), group = structure(c(627L, 
628L, 626L, 627L, 626L, 626L, 626L, 628L, 628L, 627L, 627L, 231L, 
231L, 231L, 231L, 170L, 170L, 170L, 581L, 582L, 581L, 582L, 134L, 
134L, 134L, 134L, 483L, 484L, 483L, 484L, 483L, 484L, 483L, 484L, 
76L, 76L, 76L, 76L, 628L, 581L, 170L, 582L, 581L, 582L), .Label = 
c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", 
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", 
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", 
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100", 
"101", "102", "103", "104", "105", "106", "107", "108", "109", 
"110", "111", "112", "113", "114", "115", "116", "117", "118", 
"119", "120", "121", "122", "123", "124", "125", "126", "127", 
"128", "129", "130", "131", "132", "133", "134", "135", "136", 
"137", "138", "139", "140", "141", "142", "143", "144", "145", 
"146", "147", "148", "149", "150", "151", "152", "153", "154", 
"155", "156", "157", "158", "159", "160", "161", "162", "163", 
"164", "165", "166", "167", "168", "169", "170", "171", "172", 
"173", "174", "175", "176", "177", "178", "179", "180", "181", 
"182", "183", "184", "185", "186", "187", "188", "189", "190", 
"191", "192", "193", "194", "195", "196", "197", "198", "199", 
"200", "201", "202", "203", "204", "205", "206", "207", "208", 
"209", "210", "211", "212", "213", "214", "215", "216", "217", 
"218", "219", "220", "221", "222", "223", "224", "225", "226", 
"227", "228", "229", "230", "231", "232", "233", "234", "235", 
"236", "237", "238", "239", "240", "241", "242", "243", "244", 
"245", "246", "247", "248", "249", "250", "251", "252", "253", 
"254", "255", "256", "257", "258", "259", "260", "261", "262", 
"263", "264", "265", "266", "267", "268", "269", "270", "271", 
"272", "273", "274", "275", "276", "277", "278", "280", "281", 
"282", "283", "284", "285", "286", "287", "288", "289", "290", 
"291", "292", "293", "294", "295", "296", "297", "298", "299", 
"300", "301", "302", "303", "304", "305", "306", "307", "308", 
"309", "310", "311", "312", "313", "314", "315", "316", "317", 
"318", "319", "320", "321", "322", "323", "324", "325", "326", 
"327", "328", "329", "330", "331", "332", "333", "334", "335", 
"336", "337", "338", "339", "340", "341", "342", "343", "344", 
"345", "346", "347", "348", "349", "350", "351", "352", "353", 
"354", "355", "356", "357", "358", "359", "360", "361", "362", 
"363", "364", "365", "366", "367", "368", "369", "370", "371", 
"372", "373", "374", "375", "376", "377", "378", "379", "380", 
"381", "382", "383", "384", "385", "386", "387", "388", "389", 
"390", "391", "392", "393", "394", "395", "396", "397", "398", 
"399", "400", "401", "402", "403", "404", "405", "406", "407", 
"408", "409", "410", "411", "412", "413", "414", "415", "416", 
"417", "418", "419", "420", "421", "422", "423", "424", "425", 
"426", "427", "428", "429", "430", "431", "432", "433", "434", 
"435", "436", "437", "438", "439", "440", "441", "442", "443", 
"444", "445", "446", "447", "448", "449", "450", "451", "452", 
"453", "454", "455", "456", "457", "458", "459", "460", "461", 
"462", "463", "464", "465", "466", "467", "468", "469", "470", 
"471", "472", "473", "474", "475", "476", "477", "478", "479", 
"480", "481", "482", "483", "484", "485", "486", "487", "488", 
"489", "490", "491", "492", "493", "494", "495", "496", "497", 
"498", "499", "500", "501", "502", "503", "504", "505", "506", 
"507", "508", "509", "510", "511", "512", "513", "514", "515", 
"516", "517", "518", "519", "520", "521", "522", "523", "524", 
"525", "526", "527", "528", "529", "530", "531", "532", "533", 
"534", "535", "536", "537", "538", "539", "540", "541", "542", 
"543", "544", "545", "546", "547", "548", "549", "550", "551", 
"552", "553", "554", "555", "556", "557", "558", "559", "560", 
"561", "562", "563", "564", "565", "566", "567", "568", "569", 
"570", "571", "572", "573", "574", "575", "576", "577", "578", 
"579", "580", "581", "582", "583", "584", "585", "586", "587", 
"588", "589", "590", "591", "592", "593", "594", "595", "596", 
"597", "598", "599", "600", "601", "602", "603", "604", "605", 
"606", "607", "608", "609", "610", "611", "612", "613", "614", 
"615", "616", "617", "618", "619", "620", "621", "622", "623", 
"624", "625", "626", "627", "628", "629", "630", "631", "632", 
"633", "635", "636", "637", "638", "639", "640", "641", "642", 
"643", "644", "645", "646", "647", "648", "649", "650", "651", 
"652", "653", "654", "655", "656", "657", "658", "659", "660", 
"661", "662", "663", "664", "665", "666", "667", "668", "669", 
"670", "671", "672", "673", "674", "675", "676", "677", "678", 
"679", "680", "681", "682", "683", "684", "685", "686", "687", 
"688", "689", "690", "691", "692", "693", "694", "695", "696", 
"697", "698", "699", "700", "701", "702", "703", "704", "705", 
"706", "707", "708", "709", "710", "711", "712", "713", "714", 
"715", "716", "717", "718", "719", "720", "721", "722", "723", 
"724", "725", "726", "727", "728", "729", "730", "731", "732", 
"733", "734", "745", "746", "747", "748", "749", "750", "751", 
"752", "753", "754", "755", "756", "757", "758", "759", "760", 
"761", "762", "763", "764", "765", "766", "767", "986", "987"
), class = "factor")), row.names = c(NA, -44L), class = "data.frame")


Solution 1:[1]

I work in a different field but encountered similar issues when making predictions from clogit models. It appears there's different schools of thought on whether making predictions is even valid from these models (see comments in this thread How to get fitted values from clogit model). Nevertheless, we attempted Terry Therneu's workaround presented in the above link. Although I can't reproduce you're data, below is a hypothetical and generic example of what we did. Essentially, we averaged predictions across strata. Would be interested in commentary from experts of which I am not one.

library(tidyverse)
library(ggplot2)
library(survival)

dat <- data.frame(event = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1),
                  strata = factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)),
                  cont_var1 = c(100, 90, 10, 20, 60, 95, 85, 15, 25, 65, 90, 80,
                                20, 30, 70, 100, 90, 10, 20, 60, 90, 80, 20, 30, 70),
                  cat_var = factor(c("a","a","a","a","a","a","a","a","a","a","b","b","b","b",
                                     "b","b","b","b","b","b","a","a","a","a","a")))

m1 <- clogit(event ~ cont_var1 + cat_var + strata(strata), data = dat, iter.max = 100)

# new dataframe 
newdat <- expand_grid(cont_var1 = seq(20, 100, 1), 
                      cat_var = factor(c("a"),
                                       levels = c("a", "b")),
                      strata = strata(unique(dat$strata)))
# predict
pred_dat <- newdat %>%
  bind_cols(predict(m1,
                    newdata = newdat,
                    type = "risk",
                    se.fit = TRUE))

# plot (note first step calculating probability from predictions)
pred_dat %>%
  mutate(prob = fit / (1 + fit),
         low = (fit - 1.96 * se.fit) / (1 + (fit - 1.96 * se.fit)),
         up = (fit + 1.96 * se.fit) / (1 + (fit + 1.96 * se.fit)),
         low = ifelse(low < 0, 0, low),
         up = ifelse(up > 1, 1, up)) %>%
  group_by(cont_var1) %>%
  summarise(prob = mean(prob), low = mean(low), up = mean(up)) %>%
  ggplot()+
  geom_line(aes(x = cont_var1, y = prob)) +
  geom_ribbon(aes(x = cont_var1, y = prob, ymin = low, ymax = up), alpha = 0.1) +
  theme_classic(base_size = 16) + 
  labs(y = "Probability (level 'a')", x = "Continuous Variable")

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DharmanBot