3 Statistical data types
We begin data analysis proper by reviewing the statistical data types. There are two broad types of data: qualitative and quantitative.
3.1 Qualitative data
In this data type, the observations fall into distinct categories. There is usually no numerical scale applicable to qualitative data. Qualitative data are further divided into:
3.1.1 Nominal
These are qualitative data that have no order. The colors of a flag, for instance, can be "red", "yellow" and "green". None of these can be said to come before or after another. Contrast this with ordinal data (3.1.3).
3.1.2 Binary
A special type of nominal data is binary data, which is very common in statistical analysis. These are observations that can take only two values. For instance, a question that records the presence of a disease will have only a "Yes" or "No" answer. Sex is usually recorded as "Male" or "Female".
3.1.3 Ordinal
An ordinal qualitative data type has an order to it. A commonly used one is socioeconomic status, often categorised as "Low", "Middle" and "High". Although we cannot say that the interval between "Low" and "Middle" is the same as that between "Middle" and "High", we know "Low" is lower than "Middle", which in turn is lower than "High". The Likert scale, a well-known scale in social science research, is also an example of an ordinal scale. Ordinal variables are often created from quantitative variables (see 3.2). For example, the ages of a group of men can be converted into age groups with any desired number of categories.
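The conversion of ages into age groups can be sketched in a few lines of Python. This is only an illustration: the cut-points (30 and 50), the group labels and the sample ages are all hypothetical choices, not values prescribed by the text.

```python
from bisect import bisect_right

# Hypothetical cut-points; any desired number of categories could be used.
CUT_POINTS = [30, 50]  # ages below 30, from 30 up to 49, and 50 or more
LABELS = ["Under 30", "30-49", "50 and over"]  # ordered from youngest to oldest


def age_group(age):
    """Return the ordinal age-group label for an age in completed years."""
    # bisect_right counts how many cut-points are <= age,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUT_POINTS, age)]


ages = [23, 30, 41, 50, 67]  # hypothetical ages of a group of men
groups = [age_group(a) for a in ages]
print(groups)
```

The resulting labels keep their order ("Under 30" comes before "30-49"), but the exact ages can no longer be recovered from them.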
3.2 Quantitative data
Quantitative or numerical data are observations recorded as numbers that are counted or measured. There are two types:
3.2.1 Discrete
Discrete quantitative data can take only specific values. The number of persons attending a theatre can only be a whole number. So can the number of votes obtained in an election. Although discrete quantitative variables are often analysed as continuous ones, they can occasionally pose problems when analysed as such. We will deal with some of these problems in subsequent chapters.
3.2.2 Continuous
Continuous quantitative variables, on the other hand, can be measured to any precision, so the figures they present can be as precise as the experimenter desires. For instance, the distance between two towns can be measured in kilometres to whatever precision the measuring instrument allows. In principle, the same distance could be reported as 12.0 km or as 12.0234278 km.
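As a small illustration, the snippet below reports the distance from the example above at three levels of precision. The choice of 1, 3 and 7 decimal places is arbitrary and made only for this sketch.

```python
distance_km = 12.0234278  # the example distance from the text, in kilometres

# Report the same measurement to 1, 3 and 7 decimal places.
reported = [f"{distance_km:.{places}f} km" for places in (1, 3, 7)]
print(reported)
```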
3.3 Other specific data types
There are other specific subtypes of data encountered in statistical analysis. Some of these are:
3.3.1 Ratios
Ratios are special continuous variables that are generated from two other variables. For example, the ratio of boys to girls in a sample is obtained by dividing the number of boys by the number of girls. The figures obtained resemble continuous variables but require special techniques in analysis.
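The boys-to-girls ratio can be computed as below. The counts are hypothetical, and the function name `ratio` is introduced only for this sketch. The two samples are chosen to show that very different sample sizes can give the same ratio, which is one reason it may help to keep the underlying counts alongside the ratio.

```python
def ratio(numerator_count, denominator_count):
    """Divide one count by another, e.g. number of boys by number of girls."""
    if denominator_count == 0:
        raise ValueError("ratio is undefined when the denominator count is 0")
    return numerator_count / denominator_count


small_sample = ratio(6, 4)      # hypothetical: 6 boys, 4 girls
large_sample = ratio(600, 400)  # hypothetical: 600 boys, 400 girls
print(small_sample, large_sample)
```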
3.3.2 Rates
Rates are population parameters often used in medicine and epidemiology. Examples include the population growth rate and the mortality rate. A rate is also a statistic or parameter generated from two others. The mortality rate, for instance, is generated from the number of deaths and the time interval. The neonatal mortality rate is generated from the number of neonatal deaths and the number of live births in the same period. In analysis, rates are often treated as indicated for ratios above.
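A minimal sketch of the neonatal mortality rate calculation follows. The counts are hypothetical, and the result is expressed per 1,000 live births, which is the usual convention but is an assumption here since the text does not state a multiplier.

```python
def neonatal_mortality_rate(neonatal_deaths, live_births, per=1000):
    """Neonatal deaths per `per` live births in the same period."""
    if live_births <= 0:
        raise ValueError("the number of live births must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return neonatal_deaths * per / live_births


# Hypothetical figures for one year in one district.
rate = neonatal_mortality_rate(neonatal_deaths=12, live_births=4000)
print(rate)  # neonatal deaths per 1,000 live births
```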
3.3.3 Percentage
Percentages are peculiar as they often have a definite minimum and maximum of 0% and 100% respectively. However, percentage changes can take any value. For instance, a change from 4 to 3 yields a negative change (-25%). To avoid the difficulties these percentages pose, it is often advisable to retain and work with the specific values used to determine the percentage rather than the derived percentage.
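The percentage change in the example above can be worked out as follows. The function name `percentage_change` is introduced only for this sketch. The reverse change, from 3 to 4, is added to show that percentage changes are not symmetric, which is part of what makes them awkward to analyse.

```python
def percentage_change(old, new):
    """Percentage change from `old` to `new`, relative to `old`."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


decrease = percentage_change(4, 3)  # the example from the text
increase = percentage_change(3, 4)  # the same two values in reverse
print(decrease)             # -25.0
print(round(increase, 2))   # 33.33
```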
3.3.4 Ranks
Ranks are often treated as continuous variables though they often are not. A well-known example is the position of a student in a class test. The position is only relative to the other students and differs from the actual mark scored. Ranks may give the impression of equal spacing between consecutive positions, but the actual difference in marks may be much smaller or bigger than the rank difference suggests.
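The gap between ranks and marks can be seen in a short sketch. The student names and marks are hypothetical, and ties are not handled in this simple version.

```python
# Hypothetical marks scored in a class test.
marks = {"Ama": 95, "Kofi": 94, "Esi": 60}

# Sort the students from the highest mark to the lowest, then number them 1, 2, 3.
ordered = sorted(marks, key=marks.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}
print(ranks)

# Consecutive ranks always differ by 1, but the marks behind them do not.
mark_gaps = [marks[a] - marks[b] for a, b in zip(ordered, ordered[1:])]
print(mark_gaps)
```

Here first and second place are separated by a single mark, while second and third are separated by 34 marks, although each pair differs by exactly one rank.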