Predicted values for conditional logistic regression greater than 1
I have a multivariable conditional logistic regression model. Cases and controls are matched on a 1-to-many basis.
I want to make predictions using the model. However, the predicted values I keep getting range from 0 to 3 when they should be binary (0 or 1). Why don't I get binary values?
This is my data: survival1 is binary. IC is also binary. test_intensity_cat is categorical with 4 levels. herd_size_cat is also categorical with 4 levels. group identifies the matched sets in the original data set; I've included just 11 of them here. The subset of data doesn't converge, but the original set does.
n survival1 IC test_intensity_cat herd_size_cat group
1 0 none 1.2 < test/yr <= 1.5 medium 628
2 0 none <= 1.2 test/yr medium 629
3 0 none >= 2 tests/yr very large 627
4 1 IC >= 2 tests/yr very large 628
5 0 none <= 1.2 test/yr large 627
6 1 IC >= 2 tests/yr very large 627
7 1 none >= 2 tests/yr very large 627
8 0 none 1.5 < test/yr <= 2 large 629
9 0 IC 1.5 < test/yr <= 2 very large 629
10 0 none 1.5 < test/yr <= 2 large 628
11 0 none <= 1.2 test/yr large 628
12 0 none 1.5 < test/yr <= 2 small 231
13 0 none 1.5 < test/yr <= 2 very large 231
14 0 none 1.2 < test/yr <= 1.5 very large 231
15 0 IC 1.5 < test/yr <= 2 very large 231
16 1 none >= 2 tests/yr very large 170
17 0 none 1.2 < test/yr <= 1.5 very large 170
18 0 none >= 2 tests/yr very large 170
19 1 none >= 2 tests/yr medium 582
20 0 none 1.5 < test/yr <= 2 small 583
21 0 IC 1.5 < test/yr <= 2 large 582
22 1 none >= 2 tests/yr large 583
23 0 none 1.2 < test/yr <= 1.5 very large 134
24 0 none 1.2 < test/yr <= 1.5 very large 134
25 0 none <= 1.2 test/yr small 134
26 0 IC 1.5 < test/yr <= 2 very large 134
27 0 none 1.2 < test/yr <= 1.5 very large 484
28 0 none >= 2 tests/yr very large 485
29 0 IC 1.5 < test/yr <= 2 medium 484
30 0 none 1.5 < test/yr <= 2 large 485
31 0 none 1.5 < test/yr <= 2 small 484
32 0 IC <= 1.2 test/yr very large 485
33 0 none 1.2 < test/yr <= 1.5 very large 484
34 0 none 1.5 < test/yr <= 2 very large 485
35 0 none <= 1.2 test/yr medium 76
36 0 none >= 2 tests/yr very large 76
37 0 none >= 2 tests/yr large 76
38 0 IC >= 2 tests/yr medium 76
39 0 none <= 1.2 test/yr very large 629
40 0 none 1.5 < test/yr <= 2 medium 582
41 0 IC >= 2 tests/yr large 170
42 1 IC 1.2 < test/yr <= 1.5 small 583
43 0 none 1.5 < test/yr <= 2 small 582
44 0 none <= 1.2 test/yr small 583
This is my code, run in RStudio:
library(tidyverse)
library(broom)
library(survival)
model_IC_intensity_size <-
  clogit(survival1 ~ IC + test_intensity_cat + herd_size_cat + strata(group),
         method = "exact",
         data = LCT_herd_matched)
actual <- LCT_herd_matched$survival1
predicted <- round(predict(model_IC_intensity_size, type = "expected"))
table(predicted, actual)
This is the output with the original dataset; the subset gives a smaller version of the same table that also includes an aberrant 2.
actual
predicted 0 1
0 9271 641
1 185 434
2 6 42
3 0 2
I also want to calculate the leverage, delta chi-square, and delta-beta statistics (p. 425 of Veterinary Epidemiologic Research by Dohoo et al.). How would I go about determining these diagnostics for a conditional logistic regression model in R?
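From reading the survival package documentation, `residuals.coxph()` looks like it supplies the raw ingredients (a clogit fit is a coxph object), though I'm not sure these match Dohoo et al.'s definitions exactly. A sketch on invented toy data (not my real dataset):

```r
library(survival)

# Invented toy data, just to show the residual types.
toy <- data.frame(
  y   = c(1, 1, 0, 0, 1, 0, 0, 0),
  x   = c(2, 0, 1, 0, 1, 2, 0, 0),
  grp = c(1, 1, 1, 1, 2, 2, 2, 2)
)
# Efron approximation used here for safety; the exact method may work too.
fit <- clogit(y ~ x + strata(grp), method = "efron", data = toy)

# Approximate delta-beta: change in each coefficient when observation i
# is dropped (computed from score residuals, not by refitting).
dfb <- residuals(fit, type = "dfbeta")

# Score residuals, a building block for leverage-type measures.
sco <- residuals(fit, type = "score")

# Deviance residuals; their squares give a per-observation analogue of
# delta chi-square (an approximation, not Dohoo et al.'s exact formula).
dev <- residuals(fit, type = "deviance")
```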
dput(mini)
structure(list(survival1 = c(0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), IC = structure(c(1L, 1L,
1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("none", "IC"
), class = "factor"), test_intensity_cat = structure(c(3L, 1L,
2L, 2L, 1L, 2L, 2L, 4L, 4L, 4L, 1L, 4L, 4L, 3L, 4L, 2L, 3L, 2L,
2L, 4L, 4L, 2L, 3L, 3L, 1L, 4L, 3L, 2L, 4L, 4L, 4L, 1L, 3L, 4L,
1L, 2L, 2L, 2L, 1L, 4L, 2L, 3L, 4L, 1L), .Label = c("<= 1.2 test/yr",
">= 2 tests/yr", "1.2 < test/yr <= 1.5", "1.5 < test/yr <= 2"
), class = "factor"), herd_size_cat = structure(c(2L, 2L, 4L,
4L, 3L, 4L, 4L, 3L, 4L, 3L, 3L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 2L,
1L, 3L, 3L, 4L, 4L, 1L, 4L, 4L, 4L, 2L, 3L, 1L, 4L, 4L, 4L, 2L,
4L, 3L, 2L, 4L, 2L, 3L, 1L, 1L, 1L), .Label = c("small", "medium",
"large", "very large"), class = "factor"), group = structure(c(627L,
628L, 626L, 627L, 626L, 626L, 626L, 628L, 628L, 627L, 627L, 231L,
231L, 231L, 231L, 170L, 170L, 170L, 581L, 582L, 581L, 582L, 134L,
134L, 134L, 134L, 483L, 484L, 483L, 484L, 483L, 484L, 483L, 484L,
76L, 76L, 76L, 76L, 628L, 581L, 170L, 582L, 581L, 582L), .Label =
c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101", "102", "103", "104", "105", "106", "107", "108", "109",
"110", "111", "112", "113", "114", "115", "116", "117", "118",
"119", "120", "121", "122", "123", "124", "125", "126", "127",
"128", "129", "130", "131", "132", "133", "134", "135", "136",
"137", "138", "139", "140", "141", "142", "143", "144", "145",
"146", "147", "148", "149", "150", "151", "152", "153", "154",
"155", "156", "157", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171", "172",
"173", "174", "175", "176", "177", "178", "179", "180", "181",
"182", "183", "184", "185", "186", "187", "188", "189", "190",
"191", "192", "193", "194", "195", "196", "197", "198", "199",
"200", "201", "202", "203", "204", "205", "206", "207", "208",
"209", "210", "211", "212", "213", "214", "215", "216", "217",
"218", "219", "220", "221", "222", "223", "224", "225", "226",
"227", "228", "229", "230", "231", "232", "233", "234", "235",
"236", "237", "238", "239", "240", "241", "242", "243", "244",
"245", "246", "247", "248", "249", "250", "251", "252", "253",
"254", "255", "256", "257", "258", "259", "260", "261", "262",
"263", "264", "265", "266", "267", "268", "269", "270", "271",
"272", "273", "274", "275", "276", "277", "278", "280", "281",
"282", "283", "284", "285", "286", "287", "288", "289", "290",
"291", "292", "293", "294", "295", "296", "297", "298", "299",
"300", "301", "302", "303", "304", "305", "306", "307", "308",
"309", "310", "311", "312", "313", "314", "315", "316", "317",
"318", "319", "320", "321", "322", "323", "324", "325", "326",
"327", "328", "329", "330", "331", "332", "333", "334", "335",
"336", "337", "338", "339", "340", "341", "342", "343", "344",
"345", "346", "347", "348", "349", "350", "351", "352", "353",
"354", "355", "356", "357", "358", "359", "360", "361", "362",
"363", "364", "365", "366", "367", "368", "369", "370", "371",
"372", "373", "374", "375", "376", "377", "378", "379", "380",
"381", "382", "383", "384", "385", "386", "387", "388", "389",
"390", "391", "392", "393", "394", "395", "396", "397", "398",
"399", "400", "401", "402", "403", "404", "405", "406", "407",
"408", "409", "410", "411", "412", "413", "414", "415", "416",
"417", "418", "419", "420", "421", "422", "423", "424", "425",
"426", "427", "428", "429", "430", "431", "432", "433", "434",
"435", "436", "437", "438", "439", "440", "441", "442", "443",
"444", "445", "446", "447", "448", "449", "450", "451", "452",
"453", "454", "455", "456", "457", "458", "459", "460", "461",
"462", "463", "464", "465", "466", "467", "468", "469", "470",
"471", "472", "473", "474", "475", "476", "477", "478", "479",
"480", "481", "482", "483", "484", "485", "486", "487", "488",
"489", "490", "491", "492", "493", "494", "495", "496", "497",
"498", "499", "500", "501", "502", "503", "504", "505", "506",
"507", "508", "509", "510", "511", "512", "513", "514", "515",
"516", "517", "518", "519", "520", "521", "522", "523", "524",
"525", "526", "527", "528", "529", "530", "531", "532", "533",
"534", "535", "536", "537", "538", "539", "540", "541", "542",
"543", "544", "545", "546", "547", "548", "549", "550", "551",
"552", "553", "554", "555", "556", "557", "558", "559", "560",
"561", "562", "563", "564", "565", "566", "567", "568", "569",
"570", "571", "572", "573", "574", "575", "576", "577", "578",
"579", "580", "581", "582", "583", "584", "585", "586", "587",
"588", "589", "590", "591", "592", "593", "594", "595", "596",
"597", "598", "599", "600", "601", "602", "603", "604", "605",
"606", "607", "608", "609", "610", "611", "612", "613", "614",
"615", "616", "617", "618", "619", "620", "621", "622", "623",
"624", "625", "626", "627", "628", "629", "630", "631", "632",
"633", "635", "636", "637", "638", "639", "640", "641", "642",
"643", "644", "645", "646", "647", "648", "649", "650", "651",
"652", "653", "654", "655", "656", "657", "658", "659", "660",
"661", "662", "663", "664", "665", "666", "667", "668", "669",
"670", "671", "672", "673", "674", "675", "676", "677", "678",
"679", "680", "681", "682", "683", "684", "685", "686", "687",
"688", "689", "690", "691", "692", "693", "694", "695", "696",
"697", "698", "699", "700", "701", "702", "703", "704", "705",
"706", "707", "708", "709", "710", "711", "712", "713", "714",
"715", "716", "717", "718", "719", "720", "721", "722", "723",
"724", "725", "726", "727", "728", "729", "730", "731", "732",
"733", "734", "745", "746", "747", "748", "749", "750", "751",
"752", "753", "754", "755", "756", "757", "758", "759", "760",
"761", "762", "763", "764", "765", "766", "767", "986", "987"
), class = "factor")), row.names = c(NA, -44L), class = "data.frame")
Solution 1:
I work in a different field but encountered similar issues when making predictions from clogit models. There appear to be different schools of thought on whether making predictions from these models is even valid (see the comments in this thread: How to get fitted values from clogit model). Nevertheless, we attempted Terry Therneau's workaround presented in that link. Although I can't reproduce your data, below is a hypothetical, generic example of what we did. Essentially, we averaged predictions across strata. I'd be interested in commentary from experts, of which I am not one.
library(tidyverse)
library(ggplot2)
library(survival)
dat <- data.frame(
  event = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1),
  strata = factor(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)),
  cont_var1 = c(100, 90, 10, 20, 60, 95, 85, 15, 25, 65, 90, 80,
                20, 30, 70, 100, 90, 10, 20, 60, 90, 80, 20, 30, 70),
  cat_var = factor(c("a","a","a","a","a","a","a","a","a","a","b","b","b","b",
                     "b","b","b","b","b","b","a","a","a","a","a")))
m1 <- clogit(event ~ cont_var1 + cat_var + strata(strata), data = dat, iter.max = 100)
# new dataframe; the strata column holds the raw strata values --
# the model formula applies strata() itself
newdat <- expand_grid(cont_var1 = seq(20, 100, 1),
                      cat_var = factor(c("a"),
                                       levels = c("a", "b")),
                      strata = unique(dat$strata))
# predict
pred_dat <- newdat %>%
  bind_cols(predict(m1,
                    newdata = newdat,
                    type = "risk",
                    se.fit = TRUE))
# plot (note the first step converts the relative risk to a probability)
pred_dat %>%
  mutate(prob = fit / (1 + fit),
         low = (fit - 1.96 * se.fit) / (1 + (fit - 1.96 * se.fit)),
         up = (fit + 1.96 * se.fit) / (1 + (fit + 1.96 * se.fit)),
         low = ifelse(low < 0, 0, low),
         up = ifelse(up > 1, 1, up)) %>%
  group_by(cont_var1) %>%
  summarise(prob = mean(prob), low = mean(low), up = mean(up)) %>%
  ggplot() +
  geom_line(aes(x = cont_var1, y = prob)) +
  geom_ribbon(aes(x = cont_var1, y = prob, ymin = low, ymax = up), alpha = 0.1) +
  theme_classic(base_size = 16) +
  labs(y = "Probability (level 'a')", x = "Continuous Variable")
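On the original puzzle of predicted values between 0 and 3: as I understand the survival documentation, predict(..., type = "expected") returns each observation's expected number of events under the model, not a probability, so values above 1 are legitimate and rounding them does not yield a 0/1 classification. A minimal sketch on invented toy data (not the poster's dataset):

```r
library(survival)

# Invented 1:many matched toy data; stratum 1 has two cases, stratum 2 one.
toy <- data.frame(
  y   = c(1, 1, 0, 0, 1, 0, 0, 0),
  x   = c(2, 0, 1, 0, 1, 2, 0, 0),
  grp = c(1, 1, 1, 1, 2, 2, 2, 2)
)

fit <- clogit(y ~ x + strata(grp), method = "exact", data = toy)

# Expected event counts per observation: non-negative but not bounded
# by 1, so rounding them is not a binary prediction.
exp_events <- predict(fit, type = "expected")
exp_events
```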
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | DharmanBot |