21 Logistic Regression

Up until now, we have dealt with linear regression, which requires a continuous dependent variable. However, in research, and especially in medical research, many outcome variables are binary, such as disease present or absent, death or survival, and cured or not cured. Binary outcomes are usually modelled with logistic regression, which in R is fitted using the glm() function with the family argument set to binomial.
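
For reference, the model can be written out. This is a sketch in our own notation: the symbols p, x, \beta_0 and \beta_1 are introduced here for illustration and are not used elsewhere in the text. With p the probability of the outcome and x a single predictor, logistic regression assumes that the log-odds of the outcome are linear in x:

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

When x is binary (coded 0 or 1), \beta_0 is the log-odds in the reference group and e^{\beta_1} is the odds ratio comparing the two groups.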

In this section, we go back to the ANCdata used previously.

df_anc <- 
    readstata13::read.dta13(".\\Data\\ANCdata.dta")

We then summarize the data as below:

df_anc %>% summarytools::dfSummary(graph.col = FALSE)
Data Frame Summary  
df_anc  
Dimensions: 755 x 3  
Duplicates: 747  

--------------------------------------------------------------------------
No   Variable   Stats / Values   Freqs (% of Valid)   Valid      Missing  
---- ---------- ---------------- -------------------- ---------- ---------
1    death      1. no            689 (91.3%)          755        0        
     [factor]   2. yes            66 ( 8.7%)          (100.0%)   (0.0%)   

2    anc        1. old           419 (55.5%)          755        0        
     [factor]   2. new           336 (44.5%)          (100.0%)   (0.0%)   

3    clinic     1. A             497 (65.8%)          755        0        
     [factor]   2. B             258 (34.2%)          (100.0%)   (0.0%)   
--------------------------------------------------------------------------

21.1 Logistic regression with a single binary predictor

Our aim is to determine the relationship between the type of ANC (anc) used for managing pregnant women and the outcome of the pregnancy (death). To do this, we fit a logistic regression model in its simplest form, as below:

df_anc %>% 
    glm(death ~ anc, family=binomial, data=.) %>% 
    broom::tidy()
Table 6.2:
term          estimate   std.error   statistic   p.value
(Intercept)   -2.09      0.156       -13.4       6.63e-41
ancnew        -0.667     0.279       -2.39       0.0166
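
The estimates above are on the log-odds scale, with "old" as the reference level of anc. As a rough sketch of how they might be read, the snippet below plugs in the rounded values from the table rather than refitting the model. The names b0, b1 and se1 are ours, chosen for illustration, and the interval is a simple Wald approximation.

```r
# Rounded values copied from the table above (not refitted here)
b0  <- -2.09    # (Intercept): log-odds of death under the old ANC
b1  <- -0.667   # ancnew: change in log-odds, new vs old ANC
se1 <- 0.279    # standard error of ancnew

# Odds of death under the old ANC, and the odds ratio for new vs old
odds_old      <- exp(b0)   # about 0.12
or_new_vs_old <- exp(b1)   # about 0.51

# Predicted probabilities of death on each ANC type
p_old <- plogis(b0)        # about 0.11
p_new <- plogis(b0 + b1)   # about 0.06

# Approximate 95% Wald confidence interval for the odds ratio
or_ci <- exp(b1 + c(-1, 1) * qnorm(0.975) * se1)   # about 0.30 to 0.89

round(c(OR = or_new_vs_old, lower = or_ci[1], upper = or_ci[2]), 2)
```

On these figures, the odds of death with the new ANC are roughly half those with the old one. With the fitted model itself, the same odds ratio can be obtained by calling broom::tidy() with exponentiate = TRUE and conf.int = TRUE; those intervals may differ slightly from the Wald approximation used here.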

The object returned by glm() has class glm and also inherits from class lm. It inherits from lm because glm() generalizes lm(): with a gaussian family it fits the same linear model that we fitted earlier with the lm() function.
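
The class of a fitted model can be checked directly. The sketch below uses the built-in mtcars data as a stand-in (an assumption made so the example is self-contained; it is not the ANC data), and the object names fit_logit, fit_glm and fit_lm are ours.

```r
# Stand-in data: mtcars ships with R and is NOT the ANC data
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)

class(fit_logit)            # "glm" "lm"
inherits(fit_logit, "lm")   # TRUE

# With a gaussian family, glm() reproduces the lm() coefficients
fit_glm <- glm(mpg ~ wt, family = gaussian, data = mtcars)
fit_lm  <- lm(mpg ~ wt, data = mtcars)
all.equal(coef(fit_glm), coef(fit_lm))   # TRUE
```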