11 Hypothesis Testing

11.1 Population and sample

11.1.1 Population

A population in the statistical sense is different from the general sense. The population is the collection of items, people, places, etc. that the investigator is interested in and wants to study. This population tends to be large, necessitating the investigator to pick just a representative sample to study a particular property. For example:

  1. To determine the proportion of Ghanaians who are male, we may pick a representative sample to do so, but the study population is all “Ghanaians”.
  2. To determine the proportion of defective items produced by a factory, we sample some of the items, but the actual population comprises all items produced and possibly those yet to be produced.

Therefore, the statistical definition of a population could include items, places or persons not even born or non-existent.

11.1.2 Sample

A sample is a part of the population selected by some method, usually because the whole population is too large or unavailable to be studied. There are many ways to select a sample from a population. The sample is thus usually much smaller than the population.

11.2 Descriptive versus Inferential Statistics

In almost every research project, the idea is to determine a specific parameter of the population. For instance, to determine the proportion of children under the age of 18 years on a specific flight (e.g. British Airways (BA) from London to Johannesburg), we start by randomly selecting a sample of BA flights between these two destinations. Next, we determine the ages of those on the selected flights and finally come up with the proportion of those aged less than 18 years. Bear in mind the population includes future flights, so the population parameter can rarely be obtained. However, the sample proportion (a statistic) determined above could be a good estimate of the population parameter.

Descriptive statistics involves statistical manipulations done on a specific sample, whereas inferential statistics involves using the sample statistic to estimate the population parameter. Using the example above, determining the proportion of under-18-year-old persons on our chosen flights falls under descriptive statistics, whereas estimating, from our sample statistic, the population proportion of under-18-year-old persons who fly BA along that route is inferential statistics.

11.3 Sample variation

The mid-upper arm circumference (MUAC) of children under five is a measure of how thin a child is and a quick way of determining his/her nutritional status. In a population of 2000 children, the mean and median mid-upper arm circumferences were to be determined independently by a group of 10 students. They decided to take a random sample of 10 children each to estimate these parameters. Each column of the data shows the measurements made by one student for his/her chosen sample. We first read the data.

df_students <- read.delim("./Data/students.txt", sep = " ")

And visualise it

df_students
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
14 14 17 16 10 17 15 17  7  14
14 14 15 16 18 11 13  9 13   9
16 15 17  9 13 13 15 15 14  17
14  8 14 15 14  8 18 13 13  17
15 12 17 16 22 17 16 17  8  13
14 18  9 16 14 13 14 15 17  17
17 17 17 20 19 18 18 15  9  15
14 16 16 15 16 17 14 21 15  13
12 17 14 14 10 13 14 11 13  10
11 18 13 19  7 12 16 18 14  14

Next, we determine the mean and median values obtained by each student, starting with the means:

# summarize() and across() below come from the dplyr package
library(dplyr)

df_students %>% 
    summarize(across(X1:X10, ~mean(.)))
  X1   X2   X3   X4   X5   X6   X7   X8   X9  X10
14.1 14.9 14.9 15.6 14.3 13.9 15.3 15.1 12.3 13.9
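
The medians can be obtained in the same way, a minimal sketch using the same dplyr verbs (output omitted):

df_students %>% 
    summarize(across(X1:X10, ~median(.)))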

It is obvious that, despite all the data coming from the same population, the means obtained were different each time. In effect, since choosing a sample is a random process, the descriptive statistics obtained each time a sample is chosen from a population will vary. The differences (variation) in the statistics obtained with every sample, the mean and median MUAC in this instance, are described as sample variation. If these statistics vary, how then can we determine the population parameter? This is what inferential statistics is all about: estimating the population parameter from the sample statistic.
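
To see sampling variation in action, one could simulate the whole process. The sketch below assumes a hypothetical normally distributed population of 2000 MUAC values; the mean and standard deviation are invented for illustration:

# Hypothetical population of 2000 MUAC values (distribution assumed)
set.seed(7)                                    # for reproducibility
population <- rnorm(2000, mean = 14.5, sd = 3)
# Ten students each draw a random sample of 10 and compute the mean
sample_means <- replicate(10, mean(sample(population, size = 10)))
round(sample_means, 1)   # ten different estimates of the same parameter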

11.4 Hypothesis testing

The collection of scientific data is usually preceded by an idea one wants to prove or disprove. As humans, we often have preconceived ideas or opinions of the expected results of a study. This subconscious thinking is brought to light by formally setting a hypothesis and testing it. This section deals with formalizing the steps involved in this process and how it affects our study design, data collection, analysis, and presentation of results.

11.4.1 Stating the hypothesis

Standing at his window one morning, Mr Osei wondered (again) whether the number of women using the services of the bank next door was the same as that of men. There is a vibrant market nearby where mainly women trade their wares. The proximity of the market to the bank may have given him this impression. Now he has decided to investigate this. Subconsciously he is wondering if:

The proportion of men using the services of the bank is the same as that of women.

Conversely, he could also be wondering if :

The proportion of men using banking services is different from that of women.

Bear in mind this second line of thinking includes men using the services more than women or vice versa. These competing ideas give rise to the following hypotheses:

Null hypothesis (H0): The proportion of men using the services of the bank next door is not different from that of women.

Or alternatively

Alternate Hypothesis (Ha): There is a difference in the proportion of men and women using banking services.

These two hypotheses describe Mr. Osei’s idea but in an opposing manner. The null and alternate hypotheses can then be tested through a well-designed study. For some statistical and technical reasons, the null hypothesis is often preferred in this regard.

11.4.2 Testing the Hypothesis

After the hypothesis has been stated, the next objective is to collect evidence (data) to either prove or disprove it. The customers (the study population) include future ones, so this assertion can never be completely determined even if all current customers were enumerated. Hence Mr Osei decides to pick a sample of the customers. From this sample, he needs to determine whether he has enough evidence to reject his null hypothesis. If he thinks he does, he can conclude that

“There is enough evidence to say the proportions of men and women using the bank’s services are not the same”.

In other words, the proportions differ. If, on the other hand, he does not come up with significant evidence against his null hypothesis, he can conclude that

“There is insufficient evidence to conclude that the proportion of men using the bank’s services differs from that of women”.

11.4.3 Type I error

We recollect from earlier in this chapter that, because there are many ways of choosing a sample from a population, the results tend to differ from sample to sample. This we called sample variation. Due to this variation, sample statistics usually differ from the population parameter.

In the example involving Mr Osei and the bank customers, he recorded the sexes of a systematically selected sample of 250 customers and came up with the following results: there were 112 (44.8%) males and 138 (55.2%) females in his sample. We note here that this is one sample and, by inference, another sample could give an entirely different result (sample variation).

So, the question is: do we think we have enough evidence to reject H0? Do we think this difference in proportions is significant evidence against the notion that the proportions are the same? We can never be entirely sure whether our rejection of H0 is right or wrong. However, the further the proportion of males in our sample is from 50%, the stronger the evidence against the null hypothesis (that the proportions are the same). In hypothesis testing, a type I error is said to have been made if the null hypothesis is rejected when in fact it is true. We can therefore say that a type I error is made if H0 is wrongly rejected.

In our example above, Mr Osei would be making a type I error if he concluded, based on the data obtained, that the proportions of men and women are not the same when in fact they are the same in the population.

11.4.4 The Significance Level

The significance level, usually denoted by alpha (\(\alpha\)), is the probability of making a type I error. It is usually set by the investigator and has a direct bearing on the sample size of a study.
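
The significance level can be illustrated by simulation. The sketch below assumes an exact binomial test on samples of the same size as Mr Osei’s: it generates data under a true H0 and counts how often H0 is wrongly rejected, a rate that stays close to (and, for an exact test, no more than) the chosen \(\alpha\) of 0.05:

# Simulate data under a true H0 (p = 0.5) and count type I errors
set.seed(7)
rejects <- replicate(5000, {
  x <- rbinom(1, size = 250, prob = 0.5)         # sample generated under H0
  binom.test(x, n = 250, p = 0.5)$p.value < 0.05
})
mean(rejects)   # estimated type I error rate (at most about alpha)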

11.4.5 Type II error

Conversely, a type II error is said to have been made if the researcher fails to reject the null hypothesis when in fact it is false. Applied to our situation, a type II error would be committed if Mr Osei, based on his results above, decided there was insufficient evidence to conclude that the proportions of the two sexes differ when in fact they do differ in the population of bank users.

Each of these two types of error can only be reduced at the expense of the other: for a fixed sample size, as the type I error rate falls, the type II error rate rises, and vice versa.

11.4.6 Power

The probability of committing a type II error is called beta (\(\beta\)). The probability of not committing a type II error is called the Power of the test. That is

\[P(\text{Type II error}) = \beta \qquad \text{and} \qquad \text{Power} = 1 - \beta\]

The power of a statistical test therefore measures its ability to reject the null hypothesis when it is false and hence to make the right decision.
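
Power, too, can be estimated by simulation. The sketch below assumes a hypothetical alternative in which the true proportion of males is 0.40 (an invented value) and counts how often an exact binomial test on a sample of 250 correctly rejects H0:

# Simulate data under the alternative and count correct rejections
set.seed(7)
rejects <- replicate(5000, {
  x <- rbinom(1, size = 250, prob = 0.40)        # sample generated under Ha
  binom.test(x, n = 250, p = 0.5)$p.value < 0.05
})
mean(rejects)   # estimated power = 1 - beta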

11.4.7 Critical value

Another way of stating the null hypothesis put forward by Mr Osei is

H0: The difference between the proportion of men and women using the services of the bank is zero.

When both proportions are 50%, the absolute difference (the difference disregarding whether it is negative or positive) in proportions will be zero. The possible values of this absolute difference thus range from 0% (when there are equal numbers of males and females in the sample) to 100% (when there are either no females or no males). This is shown diagrammatically in Figure 21, with the percentages of men and women on the x-axis and the absolute difference on the y-axis. The V-shaped graph is at its lowest (0%) when the percentages of men and women are equal. On the other hand, when either the males are 100% (far right) or the females are 100% (far left), the absolute difference rises to 100%. For the results obtained by Mr Osei, the absolute difference between the percentages of men and women is

\[55.2\% - 44.8\% = 10.4\%\]

Also, intuitively, if there is no difference between the proportions of the sexes (i.e. 50% each), then the difference in proportions in a randomly selected sample is more likely to be near 0% than 100%. The nearer it is to 0%, the less likely we are to reject the null hypothesis; the nearer it is to 100%, the more likely we are to reject it.

The next issue then arises: at what cut-off value would one decide there is enough evidence to reject H0? This hypothetical value, often set by the investigator, is called the critical value. If Mr Osei had set a critical value for the difference in proportions of 10% (the green horizontal line in the graph), he would have rejected the null hypothesis, since 10.4% falls above this line. If, on the other hand, he had set a critical value of 12.0%, he would have failed to reject H0, as his observed statistic of 10.4% would be less than the critical value, falling below the green line.

11.4.8 Critical region

What the critical value does is divide the range of the test statistic into two regions: one where values obtained will not lead to rejection of H0 (below the green line), and another where any value obtained would lead to rejection of H0 (above the green line). The critical region is the latter; the former is often called the acceptance region.

For Mr. Osei’s example, using a critical value of 10% we can divide the possible values of the test statistic into two regions: 0% to less than 10%, and 10% to 100%. The region from 10% to 100% is the critical region for his study. Any value obtained in this region would lead him to reject the null hypothesis.
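
Written as code, the decision rule is a single comparison; a minimal sketch using the figures from the text:

# Mr Osei's observed statistic and his chosen critical value
p_male   <- 112 / 250                      # 44.8%
p_female <- 138 / 250                      # 55.2%
abs_diff <- abs(p_female - p_male) * 100   # 10.4 percentage points
critical_value <- 10                       # investigator's choice
abs_diff >= critical_value                 # TRUE: 10.4% lies in the critical region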

11.4.9 P-value

The p-value is very well known in research. It is often misinterpreted and overemphasised, yet it has a pivotal place in research and statistical inference. The probability value (p-value) is the probability of obtaining a statistic at least as extreme as the one observed from the sample if the null hypothesis is true. It measures the strength of support for the null hypothesis. Thus, the nearer the p-value is to 1, the better the data at hand or test statistic supports the null value; the nearer it is to 0, the less the data supports the null hypothesis. In data analysis, the p-value is often compared to the significance level (\(\alpha\)). If it is less than the significance level, the result is said to be statistically significant. In the case of Mr Osei, he chose a significance level of 0.05, but the p-value (after analysis using software) was determined to be 0.1137. That is, the probability of obtaining proportions as extreme as those observed (112 (44.8%) males and 138 (55.2%) females) if the null hypothesis (50% vs 50%) were true is 0.1137.

We therefore conclude, based on a significance level of 0.05, that the result is not statistically significant.
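
This p-value can be reproduced in R. The sketch below assumes Mr Osei’s analysis was an exact binomial test of the 112 observed males out of 250 against a null proportion of 0.5 (prop.test() offers a normal-approximation alternative):

binom.test(x = 112, n = 250, p = 0.5)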

11.4.10 Steps in Hypothesis testing

After going through the terms above we are now ready to outline how hypothesis testing is done in statistics.

There are four basic steps:

  1. The first step in all hypothesis testing is to state the null hypothesis (H0).
  2. Next, we decide on the significance level (\(\alpha\)). Typically we use an \(\alpha\) of 0.05, but 0.1 and 0.01 are also sometimes used.
  3. Next, we compute the probability value (p-value), as explained above.
  4. Finally, we compare the p-value to the significance level. If the p-value is lower than \(\alpha\), we reject the null hypothesis; if not, we fail to reject it.

Generally, the lower the p-value, the more confident one is in rejecting H0. Note that failure to reject H0 does not mean H0 is true; it simply means we do not have enough evidence to reject it.
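
Applied to Mr Osei’s study, the four steps can be condensed into a few lines; a sketch using the p-value quoted earlier:

# Step 1: state H0 - the proportion of males using the bank is 0.5
alpha   <- 0.05      # Step 2: choose the significance level
p_value <- 0.1137    # Step 3: the p-value from the analysis above
p_value < alpha      # Step 4: FALSE, so we fail to reject H0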

11.5 Estimation

11.5.1 Point and interval estimates

Previously we came across statistics such as the mean and proportion used as estimates of the population parameter. These are called point estimates, as they give a single value as the estimate. Another way of estimating the population parameter is to provide an interval rather than a single value. This is referred to as an interval estimate.

Another way Mr. Osei could express his estimate of the proportion of men is as a range, for example between 38.5% and 51.2%. He would thus be indicating that, based on the data available to him, he thinks the population proportion of men lies between 38.5% and 51.2%.
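
This particular interval is, in fact, the 95% confidence interval (explained in the next section) returned by the exact binomial test; a minimal sketch assuming that analysis:

binom.test(x = 112, n = 250)$conf.int   # approximately 0.385 to 0.512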

11.5.2 Confidence Interval

Anyone who has read a research journal article is likely to have come across the phrase confidence interval. It is used to express the uncertainty, or precision, of the results obtained from a sampling method.

In a study to determine the average age of approximately 200,000 factory workers in Ghana, an investigator decided to choose a sample of 200 and determine the mean age. Of course, the sample mean is likely to differ from the population mean because of sample variation. Based on the data, the investigator can estimate the interval in which the population mean is likely to fall. This estimate is the confidence interval. We can imagine that, instead of just one study, the same study is done by a hundred persons, each choosing 200 workers randomly and then computing similar confidence intervals based on their samples. Because of sampling variation, each study is likely to come up with different values for the confidence interval. The intervals generated in all 100 studies are plotted in Figure 22. The green horizontal line is the population mean age.

The 95% confidence interval is the interval that is likely to contain the population mean 95% of the time. The intervals in Figure 22 are the 95% confidence intervals of the individual studies. It can be seen that 5 out of the 100 studies yielded a confidence interval not including the population mean. Similarly, a 99% confidence interval for repeated studies will contain the population mean in 99% of cases. It is obvious from the above that interpreting the confidence interval in terms of one study can be problematic. However, an intuitive way of understanding it is this: there is a 95% chance that the 95% confidence interval calculated contains the true population mean. In other words, there is always a 5% chance that the 95% confidence interval generated does not contain the population mean.
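
The repeated-study idea behind Figure 22 can be simulated. The sketch below invents a population of 200,000 ages (the distribution is assumed purely for illustration), draws 100 samples of 200, and checks how many of the resulting 95% confidence intervals contain the population mean:

set.seed(7)
ages <- rnorm(200000, mean = 35, sd = 8)   # hypothetical worker ages
# 100 studies, each sampling 200 workers and computing a 95% CI
cis <- replicate(100, t.test(sample(ages, size = 200))$conf.int)
# proportion of the 100 intervals that contain the population mean
mean(cis[1, ] <= mean(ages) & mean(ages) <= cis[2, ])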

What determines the width of the confidence interval?

  1. It is first determined by the confidence level. A 99% confidence interval is wider than a 95% confidence interval, which in turn is wider than a 90% confidence interval.
  2. Secondly, the smaller the sample size, the wider the confidence interval. This is just a reflection of the uncertainty in the estimate (see the sketch after this list).
  3. Finally, the variability in the data is reflected in the confidence interval. Populations whose elements are similar will yield a narrower confidence interval than one in which the observations are highly variable. For instance, the confidence interval for the mean weight of babies alone will be much narrower than that for the whole population. This is because the weights of babies are likely to be close to each other (less variability) than the weights of the whole population, which contains babies, children, adolescents and adults.
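
The sample-size effect in item 2 can be demonstrated with the same kind of hypothetical population of ages used above:

set.seed(7)
ages <- rnorm(200000, mean = 35, sd = 8)          # hypothetical population
diff(t.test(sample(ages, size = 50))$conf.int)    # wider interval, n = 50
diff(t.test(sample(ages, size = 500))$conf.int)   # narrower interval, n = 500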

A common mistake is to say the population parameter has, for instance, a 95% probability of lying within the 95% confidence interval. Interpreting the 95% confidence interval of 38.5% to 51.2% as meaning there is a 95% probability that the proportion of men using the bank lies within that range is wrong. This is because the population parameter is a fixed number: it either lies in the interval or it does not, so no probability attaches to it.