21 Logistic Regression
Up until now, we have dealt with linear regression, which requires a continuous dependent variable. However, in research, especially medical research, many outcome variables are binary, such as disease present or absent, death or survival, and cured or not cured. Modelling binary outcome data usually requires logistic regression, which is done in R using the glm() function with the family specified as binomial.
In this section, we return to the ANCdata used previously.
df_anc <- readstata13::read.dta13(".\\Data\\ANCdata.dta")
We then summarize the data as below.
df_anc %>% summarytools::dfSummary(graph.col = FALSE)
Data Frame Summary
df_anc
Dimensions: 755 x 3
Duplicates: 747
--------------------------------------------------------------------------
No   Variable   Stats / Values   Freqs (% of Valid)   Valid      Missing
---- ---------- ---------------- -------------------- ---------- ---------
1    death      1. no            689 (91.3%)          755        0
     [factor]   2. yes            66 ( 8.7%)          (100.0%)   (0.0%)
2    anc        1. old           419 (55.5%)          755        0
     [factor]   2. new           336 (44.5%)          (100.0%)   (0.0%)
3    clinic     1. A             497 (65.8%)          755        0
     [factor]   2. B             258 (34.2%)          (100.0%)   (0.0%)
--------------------------------------------------------------------------
21.1 Logistic regression with a single binary predictor
Our aim is to determine the relationship between the type of antenatal care (anc) used for managing pregnant women and the outcome of the pregnancy (death). To answer this question, we run a logistic regression model in its simplest form, with the results shown below.
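The call that produces the results is not shown in the text, so the following is only a minimal sketch of how such a model could be fitted. It assumes the outcome is modelled as death ~ anc with family = binomial. Because the .dta file may not be at hand, df_demo is a hypothetical stand-in whose cell counts (46 deaths out of 419 under the old anc, 20 out of 336 under the new) were back-calculated from the reported estimates and the marginal totals in the summary; they were not read from the data file. The column names in the results table resemble the output of broom::tidy(), but that is an inference, so base R is used here.

```r
# Hypothetical stand-in for df_anc: counts are back-calculated, not read from ANCdata.dta
df_demo <- data.frame(
  anc = factor(rep(c("old", "new"), times = c(419, 336)),
               levels = c("old", "new")),
  death = factor(c(rep(c("no", "yes"), times = c(373, 46)),   # old anc
                   rep(c("no", "yes"), times = c(316, 20))),  # new anc
                 levels = c("no", "yes"))
)

# With a factor outcome, glm() models the probability of the second level ("yes")
# On the real data the equivalent call would be:
#   glm(death ~ anc, family = binomial, data = df_anc)
fit_demo <- glm(death ~ anc, family = binomial, data = df_demo)

# Coefficient table: estimate, standard error, z statistic and p-value
coef_tab <- coef(summary(fit_demo))
round(coef_tab, 3)
```

If the assumed counts are right, the estimates and standard errors should come out close to those reported for (Intercept) and ancnew.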
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -2.09 | 0.156 | -13.4 | 6.63e-41 |
| ancnew | -0.667 | 0.279 | -2.39 | 0.0166 |
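The estimates in this table are on the log-odds scale, which is the scale a binomial glm() uses by default. As a hedged illustration of how they might be read, the sketch below exponentiates the ancnew coefficient to get an odds ratio and converts the fitted log-odds to probabilities. The variable names are hypothetical, and the inputs are the rounded values from the table, so the results are approximate.

```r
# Estimates copied from the table (log-odds scale, rounded as reported)
b0 <- -2.09    # (Intercept): log-odds of death under the old anc
b1 <- -0.667   # ancnew: change in log-odds, new versus old anc

odds_ratio <- exp(b1)     # odds of death under new anc relative to old
p_old <- plogis(b0)       # estimated probability of death, old anc
p_new <- plogis(b0 + b1)  # estimated probability of death, new anc

round(c(odds_ratio = odds_ratio, p_old = p_old, p_new = p_new), 3)
```

An odds ratio of about 0.51 suggests that the odds of death under the new anc are roughly half those under the old anc.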
The object returned by glm() has two classes, glm and lm. It inherits from lm because glm() can also be used for linear modelling, as we did with the lm() function.
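The class of a fitted model can be checked directly. The sketch below is illustrative only: it uses the built-in mtcars data as a stand-in so that it runs without the ANC file, and fit_cars is a hypothetical name.

```r
# Illustrative only: mtcars ships with R and stands in for any data set
fit_cars <- glm(am ~ wt, family = binomial, data = mtcars)

class(fit_cars)           # class vector of the fitted object
inherits(fit_cars, "lm")  # TRUE if the object also inherits from lm
```

Because the object inherits from lm, functions written for lm objects can be applied to it when no glm-specific method exists.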