24 Ordinal logistic regression

24.1 Importing data

In this presentation, we analyse a dataset using ordinal logistic regression. We begin by reading the data and selecting our desired subset.

dataF <- 
    dget("./Data/anemia_data") %>% 
    select(sid, anemia_cat, community, fever, sex,
           famsize, moccup2, foccup2, hosp_visit)

We then view a summary of the data

dataF %>% glimpse()
Rows: 476
Columns: 9
$ sid        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, …
$ anemia_cat <fct> Mild, Moderate, Normal, Severe, Mild, M…
$ community  <fct> Kasei, Kasei, Kasei, Kasei, Kasei, Kase…
$ fever      <fct> Yes, Yes, Yes, Yes, Yes, No, No, Yes, Y…
$ sex        <fct> Female, Female, Female, Male, Male, Fem…
$ famsize    <dbl> 4, 4, 2, 3, 5, 4, 9, 4, 4, 10, 3, 4, 3,…
$ moccup2    <fct> Farmer, Farmer, Other, Other, Farmer, F…
$ foccup2    <fct> Farmer, Farmer, Other, Farmer, Farmer, …
$ hosp_visit <fct> No, No, No, Yes, Yes, No, No, No, No, Y…

Note that the anemia_cat variable is an ordered factor variable. For completeness the single missing observation for the variable hosp_adm will be recoded to No.

dataF <- dataF %>% 
    mutate(hosp_visit = forcats::fct_explicit_na(hosp_visit, na_level = "No"))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `hosp_visit =
  forcats::fct_explicit_na(hosp_visit, na_level = "No")`.
Caused by warning:
! `fct_explicit_na()` was deprecated in forcats 1.0.0.
ℹ Please use `fct_na_value_to_level()` instead.
summary(dataF)
      sid           anemia_cat       community   fever    
 Min.   :  1.0   Normal  : 92   Asuogya   : 61   No :160  
 1st Qu.:119.8   Mild    :143   Sunkwae   : 62   Yes:316  
 Median :240.5   Moderate:226   Dromankoma:229            
 Mean   :240.2   Severe  : 15   Kasei     :124            
 3rd Qu.:360.2                                            
 Max.   :501.0                                            
     sex         famsize         moccup2      foccup2   
 Male  :252   Min.   : 0.000   Farmer:314   Farmer:355  
 Female:224   1st Qu.: 4.000   Other :162   Other :121  
              Median : 5.000                            
              Mean   : 5.151                            
              3rd Qu.: 6.000                            
              Max.   :11.000                            
 hosp_visit
 No :320   
 Yes:156

24.2 Model specification

Now we begin the ordinal regression by fixing the first model, the Null model.

Model_0 <- ordinal::clm(anemia_cat ~ 1, data = dataF, link = "logit")
summary(Model_0)
formula: anemia_cat ~ 1
data:    dataF

 link  threshold nobs logLik  AIC     niter max.grad
 logit flexible  476  -543.39 1092.77 7(0)  2.15e-13
 cond.H 
 1.4e+01

Threshold coefficients:
                Estimate Std. Error z value
Normal|Mild     -1.42885    0.11608 -12.310
Mild|Moderate   -0.02521    0.09168  -0.275
Moderate|Severe  3.42535    0.26237  13.056

Subsequently, we introduce the fever variable as independent and express the results as OR with 95%CI

Model_1 <- ordinal::clm(anemia_cat ~ fever, data = dataF, link = "logit")

broom::tidy(Model_1, conf.int = TRUE, exponentiate = TRUE)%>% 
    flextable::as_flextable() %>% 
    flextable::colformat_double(
        j = c("estimate", "std.error", "statistic", "p.value", 
              "conf.low", "conf.high"), 
        digits = 3)

term	estimate	std.error	statistic	p.value	conf.low	conf.high	coef.type
character	numeric	numeric	numeric	numeric	numeric	numeric	character
Normal\|Mild	0.306	0.163	-7.268	0.000			intercept
Mild\|Moderate	1.257	0.152	1.505	0.132			intercept
Moderate\|Severe	40.111	0.293	12.601	0.000			intercept
feverYes	1.461	0.181	2.090	0.037	1.024	2.086	location
n: 4

Results indicate a significant association between fever and the degree of anaemia (OR=1.46, 95%CI: 1.02 to 2.09). Performing an ANOVA test to see if there exists a difference between the 2 models.

anova(Model_0, Model_1)

Table 6.3:
no.par	AIC	logLik	LR.stat	df	Pr(>Chisq)
3	1.09e+03	-543
4	1.09e+03	-541	4.37	1	0.0367

The results indicate adding fever to the Null model significantly improves the null model.

Next, we add the community variable

Model_2 <- 
     ordinal::clm(anemia_cat ~ fever + community, data = dataF, link = "logit")

broom::tidy(Model_2, conf.int = TRUE, exponentiate = TRUE)%>% 
    flextable::as_flextable() %>% 
    flextable::colformat_double(
        j = c("estimate", "std.error", "statistic", "p.value", 
              "conf.low", "conf.high"), 
        digits = 3)

term	estimate	std.error	statistic	p.value	conf.low	conf.high	coef.type
character	numeric	numeric	numeric	numeric	numeric	numeric	character
Normal\|Mild	0.191	0.286	-5.789	0.000			intercept
Mild\|Moderate	0.808	0.274	-0.775	0.438			intercept
Moderate\|Severe	27.058	0.367	8.985	0.000			intercept
feverYes	1.373	0.183	1.728	0.084	0.958	1.966	location
communitySunkwae	1.179	0.343	0.479	0.632	0.602	2.314	location
communityDromankoma	0.626	0.268	-1.747	0.081	0.368	1.054	location
communityKasei	0.463	0.289	-2.659	0.008	0.261	0.813	location
n: 7

24.3 Checking proportional odds assumption for the model

Here we check the proportional odd assumption for our second model

ordinal::nominal_test(Model_2)

Table 6.4:
Df	logLik	AIC	LRT	Pr(>Chi)
	-535	1.08e+03
2	-534	1.09e+03	0.379	0.827
6	-525	1.08e+03	18.7	0.00468

The significant p-value for the community variable indicates a breach of the proportional odd assumption

24.4 Prediction

In this section, we will use the model created above to predict an observation in a specific anaemia severity group. First, we begin by forming the prediction data we call newData.

NewData <- expand.grid(community = levels(dataF$community),
                       fever = levels(dataF$fever))
NewData

Table 6.5:
community	fever
Asuogya	No
Sunkwae	No
Dromankoma	No
Kasei	No
Asuogya	Yes
Sunkwae	Yes
Dromankoma	Yes
Kasei	Yes

We now predict the probability that the specific predictor combination falls within the specific outcome category (anaemia category)

(preds <- predict(Model_2, newdata = NewData, type = "prob"))
$fit
     Normal      Mild  Moderate     Severe
1 0.1601782 0.2868320 0.5173493 0.03564053
2 0.1392894 0.2675468 0.5514246 0.04173921
3 0.2335081 0.3300308 0.4138463 0.02261481
4 0.2915487 0.3440404 0.3475709 0.01684010
5 0.1220012 0.2486393 0.5810802 0.04827929
6 0.1054658 0.2277287 0.6103913 0.05641416
7 0.1816335 0.3030775 0.4845071 0.03078186
8 0.2306604 0.3289444 0.4174245 0.02297070

For better visualisation, we bind the original data with the predictions

bind_cols(NewData, preds$fit) %>% 
    kableExtra::kbl(caption = "Probabilities", booktabs = T, digits = 3) %>%
    kableExtra::kable_classic(full_width = F, html_font = "Cambria") %>% 
    kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))

Table 6.7: Probabilities
community	fever	Normal	Mild	Moderate	Severe
Asuogya	No	0.160	0.287	0.517	0.036
Sunkwae	No	0.139	0.268	0.551	0.042
Dromankoma	No	0.234	0.330	0.414	0.023
Kasei	No	0.292	0.344	0.348	0.017
Asuogya	Yes	0.122	0.249	0.581	0.048
Sunkwae	Yes	0.105	0.228	0.610	0.056
Dromankoma	Yes	0.182	0.303	0.485	0.031
Kasei	Yes	0.231	0.329	0.417	0.023

24.5 Visualising the model

Below we visualize the model by using the MASS and effects packages. We begin by fitting the model again with polr function.

pol_model.1 <- MASS::polr(anemia_cat ~ community, data = dataF)
pol_model.2 <- MASS::polr(anemia_cat ~ fever*community, data = dataF)

And then we visualise the probability of having various forms of anaemia giving one belonging to the various groups.

M1 <- effects::Effect(focal.predictors = "community", mod=pol_model.1)

Re-fitting to get Hessian
M2 <- effects::Effect(focal.predictors = c("community", "fever"), mod=pol_model.2)

Re-fitting to get Hessian
plot(M1)

plot(M2)

23 ANCOVA & MANOVA

25 Survival analysis