4 R data types
All data objects in R are built from smaller units referred to as “atomic” data. The basic atomic types include integer, double, complex, logical and character; factors and date-time objects are classes that R builds on top of these, and they are treated alongside them here. Each is explained in the subsections that follow. The most commonly used function for determining the data type of an object is class(). As a first example, we apply it to the number 4.2.
As expected, it is a numeric variable. Next, we determine the class of “A”.
R classifies this as a character data type. Finally, what about FALSE?
This is a logical data type.
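The three checks described above can be reproduced with a short sketch like the following, which simply passes each value to class():

```r
# class() reports the data type (class) of each value
class(4.2)    # "numeric"
class("A")    # "character"
class(FALSE)  # "logical"
```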
4.1 Integer
The integer data type is made up of whole numbers that can be counted, similar to the discrete quantitative variable described above. By default, R does not store numbers as integers, but situations may arise where numbers have to be converted to integers to facilitate manipulation. To determine whether data is of class integer we use the function is.integer(). To convert data of another type to integer we use the function as.integer(). As an illustration, we determine whether the number 9 is an integer in R. We first assign the number 9 to X and then ask whether X is stored as an integer.
It is not! Next, we convert it to an integer, this time calling it Y, and then check whether Y is an integer.
It is an integer now.
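A minimal sketch of the steps just described, using the names X and Y from the text:

```r
X <- 9
is.integer(X)       # FALSE: a plain 9 is stored as a double by default

Y <- as.integer(X)  # convert X to an integer and store it as Y
is.integer(Y)       # TRUE
```

An integer can also be entered directly by appending L to the number, as in 9L.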
4.2 Double
This is a number that can take any value including decimals, similar to the continuous quantitative variable discussed above. Double is the default type of numeric variable used by R. To illustrate this point let’s look at the properties of the number 7.1 in R. First we assign the name “A” to 7.1.
A <- 7.1
We then check whether A is stored as a double.
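One way to carry out this check is sketched below. The use of is.double() and typeof() is an assumption about how the check is best done; note that class() reports such a value as "numeric" rather than "double".

```r
A <- 7.1
is.double(A)  # TRUE
typeof(A)     # "double"
class(A)      # "numeric"
```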
Numeric data stored as double are, in general, not stored exactly but as binary approximations of real numbers. To illustrate this, we add A to itself three times and test whether the answer equals 21.3.
It may appear strange that adding A (7.1) three times does not equal 21.3. This is because A is stored as an approximation and not as an exact value. This can be very important when manipulating data in R and in many other statistical software packages.
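The comparison described above can be reproduced as follows. The second line, using all.equal(), is not part of the original example; it is shown as one common way to compare doubles within a small tolerance.

```r
A <- 7.1
A + A + A == 21.3                   # FALSE: exact comparison fails

# all.equal() compares within a small numerical tolerance
isTRUE(all.equal(A + A + A, 21.3))  # TRUE
```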
4.3 Logical
A logical is an object that takes the value TRUE or FALSE. The example below shows the creation of “Z” from a statement asking whether 5 is less than 8. Z is therefore a logical object with the value TRUE. First, we assign the values 5 and 8 to X and Y respectively, and then create Z by asking whether X is less than Y.
Next, we determine the class of Z.
Logical objects have innate values in R such that FALSE is considered to have a value of 0 while TRUE has a value of 1. The output below demonstrates this.
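A short sketch of the steps above, followed by a demonstration of the numeric values behind TRUE and FALSE. The use of as.numeric() and of addition to reveal those values is an assumption about how best to show them.

```r
X <- 5
Y <- 8
Z <- X < Y         # is X less than Y?
Z                  # TRUE
class(Z)           # "logical"

as.numeric(TRUE)   # 1
as.numeric(FALSE)  # 0
TRUE + TRUE        # 2: logicals are treated as 1 and 0 in arithmetic
```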
Apart from the “<” operator used in the example above there are other logical operators in R. The following examples illustrate the use of some other logical operators. We begin by assigning the ages of a man and his wife as 45 and 23 years, respectively.
wife <- 23
husband <- 45
Next, we use these to answer some basic questions about the couple. First, we ask whether the wife’s age is less than or equal to 35.
Next, we ask whether the husband’s age is greater than or equal to 45 years.
The next example combines three logical operators (<, & and >). Here we ask whether the wife is less than 25 years old and, at the same time, the husband is older than 35 years.
Finally, we ask whether the wife is less than 25 years old or the husband is older than 50 years.
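The four questions above translate into R roughly as follows, using the ages assigned earlier:

```r
wife <- 23
husband <- 45

wife <= 35                 # TRUE
husband >= 45              # TRUE
wife < 25 & husband > 35   # TRUE: both conditions hold
wife < 25 | husband > 50   # TRUE: the first condition holds
```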
4.4 Character
A character is an object enclosed in quotes, conventionally double quotes. These are called strings and cannot be used in mathematical calculations. Examples include “red”, “Male”, and “1”. As seen, “1” (different from the number 1) is a character and cannot be used for calculations unless converted to another type such as integer or double. We illustrate this by creating two character objects below.
A <- "x"
B <- "2"
And then determine their class.
To illustrate that “2” is a character and so cannot be used in arithmetic, we attempt to add a number to B, which R rejects with an error.
However, we can convert B into a numeric variable C by using the function as.numeric().
We can now use the numeric variable C in calculations as below
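The sequence above can be sketched as follows. Wrapping the failing addition in tryCatch() is an assumption made here so that the script keeps running; typing B + 1 directly at the console would simply stop with an error.

```r
A <- "x"
B <- "2"
class(A)   # "character"
class(B)   # "character"

# Adding a number to a character fails; capture the error message instead of stopping
res <- tryCatch(B + 1, error = function(e) conditionMessage(e))
res        # the error message returned by R

C <- as.numeric(B)  # convert the character "2" to the number 2
class(C)   # "numeric"
C + 3      # 5
```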
4.5 Factor
A factor in R is a categorical variable such as sex (male and female). Factor variables have levels representing the different categories; sex will naturally have two levels, Male and Female. Factors can be derived from numeric and character objects using as.factor(). Below we form a character variable of length four called blood.grp using c(), one of the most used functions in R.
And then convert it to a factor variable
Next, we determine the categories (levels) of the factor variable using the function levels()
It is often necessary to convert a factor variable to an integer while retaining the order in which the categories appear. For instance, one may have a factor variable with levels “A”, “B”, “C”, and “D” but want to convert it to an integer variable in which A, B, C and D are represented by 1, 2, 3 and 4. This is achieved in R by unclassing. Unclassing a factor variable assigns numbers, starting from 1, to the levels of the factor in the order of the levels. The output below shows the unclassed numbers for the factor blood.grp2 and the levels they refer to. We do this using the unclass() function.
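The steps in this subsection so far can be sketched as below. The four blood group values are hypothetical, chosen only for illustration; the names blood.grp and blood.grp2 follow the text.

```r
# Hypothetical values for a character vector of length four
blood.grp <- c("A", "B", "AB", "O")
class(blood.grp)        # "character"

blood.grp2 <- as.factor(blood.grp)
class(blood.grp2)       # "factor"
levels(blood.grp2)      # "A" "AB" "B" "O" (levels are sorted alphabetically)

# unclass() exposes the integer codes together with the levels they refer to
unclass(blood.grp2)
as.integer(blood.grp2)  # 1 3 2 4
```

Note that the codes follow the order of the levels, not the order in which the values first appear.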
A very useful function in R to generate a factor variable is gl(). It generates a factor variable with a specified number of levels (n) and replications (k). A practical example is shown below. Factor fac.1 is created by forming a vector of three levels and two replications.
The factor can also be created with labels as shown
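A minimal sketch of both uses of gl(). The name fac.2 and the labels "Low", "Mid" and "High" are assumptions made for illustration.

```r
# Three levels, each replicated twice
fac.1 <- gl(n = 3, k = 2)
fac.1   # 1 1 2 2 3 3, with levels 1 2 3

# The same structure with descriptive (hypothetical) labels
fac.2 <- gl(n = 3, k = 2, labels = c("Low", "Mid", "High"))
fac.2   # Low Low Mid Mid High High
```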
4.6 Ordered Factor
If the order of a factor is important it must be declared as an ordered factor, also known as an ordinal categorical variable explained earlier in this chapter. Below we create a character variable.
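The character variable might be created as sketched here. The values are inferred from the ordered-factor output printed later in this subsection, so treat them as an assumption.

```r
# Character vector; values inferred from the output of ord.size
size <- c("Medium", "Large", "Small", "Medium")
class(size)   # "character"
```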
Next, we convert it to an ordered factor using the function ordered().
ord.size <- ordered(size, levels=c("Small", "Medium", "Large"))
ord.size
[1] Medium Large Small Medium
Levels: Small < Medium < Large
class(ord.size)
[1] "ordered" "factor"
An ordered factor variable can also be derived using the gl() function introduced above.
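As a sketch, setting the ordered argument of gl() to TRUE produces an ordered factor. The name ord.fac and the labels are assumptions made for illustration.

```r
# ordered = TRUE makes gl() return an ordered factor
ord.fac <- gl(n = 3, k = 2, labels = c("Small", "Medium", "Large"), ordered = TRUE)
ord.fac                   # Levels: Small < Medium < Large
class(ord.fac)            # "ordered" "factor"

# Ordered factors support comparisons between levels
ord.fac[1] < ord.fac[6]   # TRUE: Small is below Large
```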
4.7 Date and time
Date and time objects are the last to be discussed in this section. They can be created with the functions as.Date(), as.POSIXct(), as.POSIXlt(), strptime(), ISOdatetime() and ISOdate().
Creating most date and time data requires character data and a “format”. The format tells the function how the character data is recorded, e.g. dd/mm/yy, mm/dd/yyyy, yyyy/mm/dd hh:mm:ss etc. These formats are declared with the % symbol.
The first object we create below is of the class “Date” and is created using a character object “01/01/1970”. To do this we first create this character object
Next, we convert the character object into a Date object
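These two steps can be sketched as follows. The object names dt1 and dt2 are assumptions, as is reading the string as day/month/year.

```r
dt1 <- "01/01/1970"                       # character object
class(dt1)                                # "character"

# %d = day, %m = month, %Y = four-digit year; "/" matches the separators
dt2 <- as.Date(dt1, format = "%d/%m/%Y")
dt2                                       # "1970-01-01"
class(dt2)                                # "Date"
```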
Note how the format is specified. Each day, month or year code is preceded by the % symbol. The symbols that separate the values in the date are also specified accordingly. Next, we create a date-time object, which R refers to as POSIXct or POSIXlt. Although described as “Date and Time”, such an object can hold a date only or a date with a time, in specified formats. Below we create a POSIXct object in a date-only format.
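A date-only POSIXct object might be created as in this sketch. The object name dt.ct, the date used and the explicit tz = "GMT" are assumptions; without tz, R uses the local time zone.

```r
# A date with no time component, stored as POSIXct
dt.ct <- as.POSIXct("2003-04-23", tz = "GMT")
dt.ct          # "2003-04-23 GMT"
class(dt.ct)   # "POSIXct" "POSIXt"
```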
Next, we create a Date and Time variable that shows both the date and time
dt3 <- as.POSIXct("2003-04-23 15:34")
dt3
[1] "2003-04-23 15:34:00 GMT"
class(dt3)
[1] "POSIXct" "POSIXt"
The next example uses the function strptime() to create data in the “POSIXlt” date-time format.
dt4<-strptime("02/27/92 11:30:10", format="%m/%d/%y %H:%M:%S")
dt4
[1] "1992-02-27 11:30:10 GMT"
class(dt4)
[1] "POSIXlt" "POSIXt"
Next, we use ISOdatetime() to create a “POSIXct” date-time object. Note that the two functions ISOdatetime() and ISOdate() take individual numeric values and combine them rather than convert character variables.
dt5<-ISOdatetime(
year=2013, month=4, day=7, hour = 12, min = 33, sec = 10, tz = "GMT"
)
dt5
[1] "2013-04-07 12:33:10 GMT"
class(dt5)
[1] "POSIXct" "POSIXt"
And finally the function ISOdate(). Because no time is supplied, the time defaults to 12:00:00.
dt6 <- ISOdate(year = 2013, month = 4, day = 7, tz = "GMT")
dt6
[1] "2013-04-07 12:00:00 GMT"
class(dt6)
[1] "POSIXct" "POSIXt"
Mathematical manipulations can be done on date objects. Subtracting one date from another yields an object representing the period between the two. This object is of class difftime. As an illustration, we determine the time difference between dt5 and dt6.
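A sketch of this subtraction, recreating dt5 and dt6 as defined above so that the block runs on its own. The name gap is an assumption.

```r
dt5 <- ISOdatetime(year = 2013, month = 4, day = 7,
                   hour = 12, min = 33, sec = 10, tz = "GMT")
dt6 <- ISOdate(year = 2013, month = 4, day = 7, tz = "GMT")

gap <- dt5 - dt6
gap          # Time difference of 33.16667 mins
class(gap)   # "difftime"
```

R chooses the units automatically, here minutes, because the difference is more than a minute and less than an hour.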
It is worth noting here that there is a function in R, difftime(), specifically designed for finding time differences.
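As a brief sketch, difftime() lets the units be chosen explicitly through its units argument. The two date-times are recreated here so that the block runs on its own.

```r
dt5 <- ISOdatetime(year = 2013, month = 4, day = 7,
                   hour = 12, min = 33, sec = 10, tz = "GMT")
dt6 <- ISOdate(year = 2013, month = 4, day = 7, tz = "GMT")

difftime(dt5, dt6, units = "secs")    # Time difference of 1990 secs
difftime(dt5, dt6, units = "hours")   # the same difference expressed in hours
```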
The functions weekdays(), months() and quarters() extract the days, months and quarters, respectively, from date objects as “character” data.
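A minimal sketch using a single date; the object name d is an assumption. The day and month names returned depend on the locale of the R session.

```r
d <- as.Date("2013-04-07")

weekdays(d)          # "Sunday" in an English locale
months(d)            # "April" in an English locale
quarters(d)          # "Q2"
class(weekdays(d))   # "character"
```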
4.8 Missing values in R
In R, missing values are denoted by NA. NaN is also encountered and stands for “Not a Number”. It is often generated when, for instance, one divides 0 by 0. Most operations involving NA result in NA. Examples are shown below.
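A few illustrative cases are sketched here. The use of mean() with na.rm = TRUE is an addition, shown as one common way to ignore missing values in a summary.

```r
0 / 0                             # NaN
NA + 1                            # NA
NA > 1                            # NA

mean(c(1, NA, 3))                 # NA: the missing value propagates
mean(c(1, NA, 3), na.rm = TRUE)   # 2: missing values are dropped first
```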
The function is.na() produces a logical that indicates whether a value is missing.
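A short sketch with a hypothetical vector x. Note that is.na() also flags NaN, whereas is.nan() flags only NaN.

```r
x <- c(4, NA, 7, NaN)   # hypothetical data with one NA and one NaN

is.na(x)        # FALSE  TRUE FALSE  TRUE
is.nan(x)       # FALSE FALSE FALSE  TRUE
sum(is.na(x))   # 2: number of missing values
```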