15 Risk and Odds

The analysis of the effects above depends mainly on the p-values and confidence intervals of difference in proportions. There is a more common and often better way of expressing these using Risk and Odds. In this chapter, we will use the ANCData.txt data to illustrate these. First, we read the data

df_anc <- 
    read.delim("./Data/ANCData.txt") %>% 
    mutate(
        death = factor(death, levels = c("no","yes"), labels = c("No", "Yes")),
        clinic = factor(clinic),
        anc = factor(anc, levels = c("old", "new"), labels = c("Old", "New")))

And summarize the data

df_anc %>% 
    summarytools::dfSummary(graph.col = FALSE)
Data Frame Summary  
df_anc  
Dimensions: 755 x 3  
Duplicates: 747  

--------------------------------------------------------------------------
No   Variable   Stats / Values   Freqs (% of Valid)   Valid      Missing  
---- ---------- ---------------- -------------------- ---------- ---------
1    death      1. No            689 (91.3%)          755        0        
     [factor]   2. Yes            66 ( 8.7%)          (100.0%)   (0.0%)   

2    anc        1. Old           419 (55.5%)          755        0        
     [factor]   2. New           336 (44.5%)          (100.0%)   (0.0%)   

3    clinic     1. A             497 (65.8%)          755        0        
     [factor]   2. B             258 (34.2%)          (100.0%)   (0.0%)   
--------------------------------------------------------------------------

15.1 Risk

Risk is defined as the probability of having an outcome. Therefore, if in a the population of 100, 35 develop diabetes mellitus after a specified period of follow-up, the risk of developing diabetes in the population is

\[\frac{35}{100} = 0.35\] Tabulation of the ANC method and the occurrence of death below, we can conclude that the risk of perinatal mortality when one uses the old method is 0.11 (11.0%) and that for the new method is 0.06 (5.9%).

df_anc %>% 
    tbl_cross(percent = "col",
        row = death, 
        col = anc, 
        label = list(death ~ "Death", anc = "ANC"),
        digits = c(0,2)) %>% 
    bold_labels()

	ANC		Total
	Old	New	Total
Death
No	373 (89.02%)	316 (94.05%)	689 (91.26%)
Yes	46 (10.98%)	20 (5.95%)	66 (8.74%)
Total	419 (100.00%)	336 (100.00%)	755 (100.00%)

This can be written as \[Re = 0:06 \text{ and } Rne = 0:11\]

Where \(Re\) is the risk in the exposed group (new anc method) and \(Rne\) is the risk in the non-exposed (old anc method).

15.2 Risk Ratio

A comparative way of expressing the risks in the two groups is by the use of the Risk Ratio or Relative Risk (RR). Where \[RR = \frac{Re}{Rne}\]

Note that by inference if the \(Re\) is the same as \(Rne\) then \(RR = 1\). The \(RR\) of perinatal mortality of the new compared to the old method is

\[RR = \frac{5.952381}{10.978520} = 0.5421843 \approx 0.54\]

The epiDisplay package has a function cs() which automatically calculates the RR and other relevant stats with their confidence intervals. This is applied to the ANCdata as below.

df_anc %$% epiDisplay::cs(outcome = death, exposure = anc, )

          Exposure
Outcome    Non-exposed Exposed Total
  Negative 373         316     689  
  Positive 46          20      66   
  Total    419         336     755  
                                    
           Rne         Re      Rt   
  Risk     0.11        0.06    0.09 
                                         Estimate Lower95ci
 Risk difference (Re - Rne)              -0.05    -0.09    
 Risk ratio                              0.54     0.32     
 Protective efficacy =(Rne-Re)/Rne*100   45.8     8.71     
   or percent of risk reduced                              
 Number needed to treat (NNT)            19.9     10.79    
   or -1/(risk difference)                                 
 Upper95ci
 -0.01    
 0.91     
 68.06    
          
 94.2

The output above first tabulates the two variables producing a contingency table with the marginal totals. It then shows our previously calculated parameters, Re and Rne. Rt (Risk total) is the risk if both the exposed and unexposed are put together, here.

\[Rt =\frac{20 + 46}{419 + 336} = \frac{66}{755} \approx 0.09\]

The next section of the output shows the risk difference (difference between the risks of the two groups), the risk ratio, the protective efficacy and the number needed to treat (NNT) together with their confidence intervals.

Interpreting the analysis so far, we conclude that the risk of perinatal death when using the new anc method is significantly less than using the old method. It significantly reduces the risk of death (Risk difference) by 0.05 (5%) and halves the chances of death (RR = 0.54, 95%CI: 0.32 to 0.91). About 20 (95%CI: 11 to 95) pregnant women need to be treated with the new anc method to prevent one perinatal death (NNT).

15.3 Odds

Another way of expressing the risk of an outcome is using the Odds. Statistically the odds is defined as

\[Odds = \frac{p}{1-p}\]

Where p is the probability of the outcome occurring. Using the ANCdata the probability of death in the exposed is.

\[pe = \frac{20}{336} = 0.05952381\]

The odds of death in the exposed can then be determined as

\[Oddse = \frac{0.05952381}{1-0.05952381} = 0.06329114\]

Similarly, the probability of death in the non-exposed (old anc type) is

\[pne = \frac{46}{419} = 0.1097852\]

And the odds of death in the non-exposed is

\[Oddsne = \frac{0.1097852}{1-0.1097852} = 0.1233244\]

15.4 Odds ratio

The comparative way of comparing the two odds is the Odds Ratio (OR). This is determined as

\[OR = \frac{Oddse}{Oddsne} = 0.5132086 \approx 0.51\]

Once again fortunately we do not have to go through this tedious procedure each time we need to calculate the OR. The cc() function in the epiDisplay package does this very well. Below we apply it to the analysis just done.

df_anc %$% epiDisplay::cc(outcome=death, exposure=anc, graph = FALSE)

       anc
death   Old New Total
  No    373 316   689
  Yes    46  20    66
  Total 419 336   755

OR =  0.51 
95% CI =  0.3, 0.89  
Chi-squared = 5.9, 1 d.f., P value = 0.015
Fisher's exact test (2-sided) P value = 0.019

The output shows a table of the variables in question, the OR with its 95% confidence interval and both p-values determine by the chi-squared test and the Fisher’s test. With a confidence interval of the odds ratio not containing the null value 1, and small p-values from both methods it can be concluded that the odds of death in mothers who used the new ANC method is about half (0.5) of those who used the old method and the probability of obtaining an OR this values if the null was true, is low (p-value = 0.019). Therefore, the use of the new anc method is associated with significantly better perinatal outcomes compared to the old.

Odds ratios are very important in regression analysis and will be dealt with in more detail in subsequent chapters.

14 Analysis of categorical data

16 Confounding and Interaction