6 Importing data into R

In this chapter, we discuss ways of getting data into R, either by entering it directly or by importing it from other software. In R, the data frame is the standard data structure for data analysis. With the advent of tidy data, the tibble, a modern variant of the data frame, has become the predominant structure in use. In this chapter, we will read in various file formats and present each as a tibble. The examples assume that the tidyverse packages have been loaded (for example with library(tidyverse)), since they use tibble(), read_delim(), read_csv() and the %>% pipe.

6.1 Using data in R packages

Many packages in R come with data that can be used for practice. To use a dataset from a package, that package must first be installed. For instance, to use the Oswego data that ships with the epiDisplay package, we first install the package by running the command below:

install.packages("epiDisplay")

Note that this will install the package and any other packages that the epiDisplay package depends on.
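
Before loading a dataset, it can help to see what a package provides. The sketch below lists the datasets in a package with data(package = ...). It uses the built-in datasets package as a stand-in (an assumption, made so the example runs without installing anything); replacing "datasets" with "epiDisplay" should list that package's datasets once it is installed. The name pkg_data is illustrative.

```r
# List the datasets shipped with a package.
# "datasets" (part of every R installation) is a stand-in for "epiDisplay".
pkg_data <- data(package = "datasets")$results

# `results` is a character matrix; keep the dataset name and its title
pkg_data <- as.data.frame(pkg_data[, c("Item", "Title")])
head(pkg_data)
```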

The next step is to make the data available in the R session, as below:

data("Oswego", package = "epiDisplay")

Now that the data is available in the working environment, we can view the first 6 rows:

Oswego %>% head()
Table 6.1:
age sex timesupper ill onsetdate onsettime bakedham spinach mashedpota cabbagesal jello rolls brownbread milk coffee water cakes vanilla chocolate fruitsalad
11 M NA FALSE NA NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
52 F 2e+03 TRUE 04/19 30 TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
65 M 1.83e+03 TRUE 04/19 30 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE
59 F 1.83e+03 TRUE 04/19 30 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
13 F NA FALSE NA NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
63 F 1.93e+03 TRUE 04/18 2.23e+03 TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE

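Finally, we can view the documentation of the data using the help() function. Because the epiDisplay package has not been attached with library(), the package is named explicitly:

help("Oswego", package = "epiDisplay")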

6.2 Direct entry into R

We first use the data.frame() function from base R to create a data frame.

data.frame(
    name = c("Ama", "Yakubu", "John"), 
    sex = c("Female", "Male", "Male"),
    age = c(12, 9, 4),
    school = c("JHS", "Primary", "Creche")
    )
Table 6.2:
name sex age school
Ama Female 12 JHS
Yakubu Male 9 Primary
John Male 4 Creche

Next, we enter the same data manually as a tibble, using the tibble() function from the tibble package.

tibble(
    name = c("Ama", "Yakubu", "John"), 
    sex = c("Female", "Male", "Male"),
    age = c(12, 9, 4),
    school = c("JHS", "Primary", "Creche")
    )
Table 6.3:
name sex age school
Ama Female 12 JHS
Yakubu Male 9 Primary
John Male 4 Creche
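
For small tables, row-wise entry can be easier to read and check than column-wise entry. The following is a minimal sketch, assuming the tibble package is installed; tribble() and as_tibble() are not used in the text above, and the object names children and children_df are illustrative.

```r
library(tibble)

# Row-wise entry: column names are given as formulas (~name),
# followed by one line per observation
children <- tribble(
  ~name,    ~sex,     ~age, ~school,
  "Ama",    "Female", 12,   "JHS",
  "Yakubu", "Male",   9,    "Primary",
  "John",   "Male",   4,    "Creche"
)
children

# An existing data frame can also be converted to a tibble
children_df <- data.frame(name = c("Ama", "Yakubu", "John"), age = c(12, 9, 4))
as_tibble(children_df)
```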

6.3 R data file

When working in R, data is frequently stored as an .Rdata file. Such a file holds one or more R objects together with their names, types and attributes, and load() restores them into the current environment. Below we read a previously saved .Rdata file.

load(file = "./Data/data1.Rdata")
ls()
[1] "data1_stata" "mytheme"     "Oswego"     

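We then view the first 4 rows of data1_stata, the data object contained in the loaded file. Note that ls() lists every object currently in the environment, not only those that came from the file; Oswego, for instance, was loaded in Section 6.1.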

data1_stata %>% head(n=4)
Table 6.4:
id sex weight height
125 Male 7 64
62 Female 11 96
112 Female 13 115
29 Female 20 106
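
Since ls() does not show which objects came from the file, it can be useful to capture the value returned by load(). The sketch below is self-contained: it saves a small data frame (built from the first two rows of the table above) to a temporary file and loads it back. The names demo_df, rdata_path, loaded_names and file_env are illustrative.

```r
# A small data frame built from the first two rows of the table above
demo_df <- data.frame(id = c(125, 62), sex = c("Male", "Female"))

# Save it to a temporary .Rdata file, then remove it from the session
rdata_path <- tempfile(fileext = ".Rdata")
save(demo_df, file = rdata_path)
rm(demo_df)

# load() invisibly returns the names of the objects it restored
loaded_names <- load(rdata_path)
loaded_names

# Loading into a separate environment avoids overwriting existing objects
file_env <- new.env()
load(rdata_path, envir = file_env)
ls(file_env)
```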

6.4 Text files

The first file format we read is the flat file or text file, which usually has the extension .txt. The values in these files can be separated by various delimiters, including tabs, commas and spaces. In this section, we first read a tab-delimited file as a prototype; files with other delimiters are read in the same way by changing the delim argument.

read_delim(
    file = "./Data/bpA.txt", 
    delim = "\t", 
    col_types = "ccdd"    # compact string, one letter per column: c = character, d = double
    )
Table 6.5:
id sex age sbp0
B01 M 73 145
B02 F 47 164
B03 M 59 153
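
Column types can also be spelled out by name, which is more verbose but easier to check against the file. The sketch below is a minimal, self-contained illustration, assuming the readr package is installed: it writes the three rows shown above to a temporary file (the real bpA.txt is not needed) and reads them back with an explicit cols() specification. The names tmp_path and bp are illustrative.

```r
library(readr)

# Write the three rows shown above to a temporary tab-delimited file
tmp_path <- tempfile(fileext = ".txt")
writeLines(
  c("id\tsex\tage\tsbp0",
    "B01\tM\t73\t145",
    "B02\tF\t47\t164",
    "B03\tM\t59\t153"),
  tmp_path
)

# Name each column and its type explicitly
bp <- read_delim(
  file = tmp_path,
  delim = "\t",
  col_types = cols(
    id   = col_character(),
    sex  = col_character(),
    age  = col_double(),
    sbp0 = col_double()
  )
)
bp
```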

Next, we read a comma-delimited text file:

read_delim(
    file = "./Data/blood.txt", 
    delim = ",", 
    col_types = "cdddddd"    # stno as character, the remaining six columns as double
    ) %>% 
    head(n=4)
Table 6.6:
stno age wgt hgt hb wbc hct
1001 120 18.4 128 7.7 13.7 22.6
1002 96 24   123 8   8.5 22.8
1003 168 38.5 143 8.2 26.8 24.1
1004 96 20.4 114 8.3 10.5 23.3

Comma-delimited files, which often have the extension .csv, can also be imported with the read_csv() function, which assumes a comma delimiter. Here it is applied to the same file:

read_csv(
    file = "./Data/blood.txt",
    col_types = "cdddddd") %>% 
    head(n=4)
Table 6.7:
stno age wgt hgt hb wbc hct
1001 120 18.4 128 7.7 13.7 22.6
1002 96 24   123 8   8.5 22.8
1003 168 38.5 143 8.2 26.8 24.1
1004 96 20.4 114 8.3 10.5 23.3
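
The other delimiters mentioned at the start of this section are handled in the same way. As a hedged sketch, assuming the readr package is installed, the code below writes the first two rows and three columns of the blood data to two temporary files, one delimited by semicolons and one by single spaces, and reads both with read_delim(). The file and object names are illustrative.

```r
library(readr)

# Two temporary files holding the same values with different delimiters
semi_path <- tempfile(fileext = ".txt")
writeLines(c("stno;age;wgt", "1001;120;18.4", "1002;96;24"), semi_path)

space_path <- tempfile(fileext = ".txt")
writeLines(c("stno age wgt", "1001 120 18.4", "1002 96 24"), space_path)

# Only the delim argument changes between the two calls
blood_semi  <- read_delim(semi_path,  delim = ";", col_types = "cdd")
blood_space <- read_delim(space_path, delim = " ", col_types = "cdd")

blood_semi
blood_space
```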

6.5 Microsoft Excel

Microsoft Excel is probably the most common format for transferring data. Excel workbooks come in two file formats, with extensions .xls and .xlsx. Below, we read an .xlsx file using the readxl package.

readxl::read_xlsx(path = "./Data/data1.xlsx") %>% 
    head(n=4)
Table 6.8:
id sex weight height
125 Male 7 64
62 Female 11 96
112 Female 13 115
29 Female 20 106
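
A workbook can hold several sheets, so it is often worth listing them before reading. The sketch below assumes the readxl package is installed and uses datasets.xlsx, an example workbook bundled with readxl, rather than the data1.xlsx file above. read_excel() detects whether a file is .xls or .xlsx from the file itself. The names xlsx_path and mtcars_xl are illustrative.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
excel_sheets(xlsx_path)

# Read the first 4 data rows of one sheet, chosen by name
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars", n_max = 4)
mtcars_xl
```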

6.6 SPSS files

Files from SPSS are usually saved with the extension .sav. Below we read an SPSS data file using the haven package:

haven::read_sav(file = "./Data/data1.sav") %>% 
    head(n=4)
Table 6.9:
id sex weight height
125 Male 7 64
62 Female 11 96
112 Female 13 115
29 Female 20 106
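
SPSS variables often carry value labels (for example 1 = Male, 2 = Female), which haven imports as labelled vectors rather than factors. The following sketch, assuming the haven and tibble packages are installed, builds a small labelled dataset, writes it to a temporary .sav file and converts the labelled column to a factor after reading. The numeric codes and the names spss_demo, sav_path and spss_in are assumptions made for illustration, not taken from data1.sav.

```r
library(haven)

# A small dataset with a labelled column (codes 1 and 2 are hypothetical)
spss_demo <- tibble::tibble(
  id  = c(125, 62, 112),
  sex = labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))
)

# Write it to a temporary SPSS file and read it back
sav_path <- tempfile(fileext = ".sav")
write_sav(spss_demo, sav_path)
spss_in <- read_sav(sav_path)

# The column comes back as a labelled vector
class(spss_in$sex)

# Convert the value labels into factor levels
spss_in$sex <- as_factor(spss_in$sex)
spss_in
```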

6.7 Stata files

Stata files, like SPSS data files, can be imported using the haven package. This is illustrated below:

haven::read_dta(file = "./Data/data1.dta") %>% 
    head(n=4)
Table 6.10:
id sex weight height
125 Male 7 64
62 Female 11 96
112 Female 13 115
29 Female 20 106
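
For large files, it may be preferable to read only some columns or rows. As a sketch, assuming the haven and tibble packages are installed, the code below writes the four rows shown above to a temporary .dta file and reads back a subset using the col_select and n_max arguments of read_dta(). The names dta_path and dta_subset are illustrative.

```r
library(haven)

# Write the four rows shown above to a temporary Stata file
dta_path <- tempfile(fileext = ".dta")
write_dta(
  tibble::tibble(
    id     = c(125, 62, 112, 29),
    sex    = c("Male", "Female", "Female", "Female"),
    weight = c(7, 11, 13, 20),
    height = c(64, 96, 115, 106)
  ),
  dta_path
)

# Read only two columns and the first two rows
dta_subset <- read_dta(dta_path, col_select = c(id, weight), n_max = 2)
dta_subset
```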

6.8 SAS files

The haven package can also read SAS data files into R. This is illustrated below:

haven::read_sas(data_file = "./Data/data1.sas7bdat") %>% 
    head(n=4)
Table 6.11:
ID SEX WEIGHT HEIGHT
125 Male 7 64
62 Female 11 96
112 Female 13 115
29 Female 20 106
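
The SAS import above returns upper-case column names, unlike the other formats. If consistent names are wanted, they can be converted after import. The sketch below assumes the dplyr and tibble packages are installed and, instead of reading the SAS file, rebuilds the first two rows of the table above as a tibble; the names sas_like and sas_clean are illustrative.

```r
library(dplyr)

# Stand-in for the imported SAS data (first two rows of the table above)
sas_like <- tibble::tibble(
  ID     = c(125, 62),
  SEX    = c("Male", "Female"),
  WEIGHT = c(7, 11),
  HEIGHT = c(64, 96)
)

# Apply tolower() to every column name
sas_clean <- sas_like %>% rename_with(tolower)
names(sas_clean)
```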

6.9 Conclusion

In this chapter, we have learned how to import data into R from various file formats and programs. In the next chapter, we will learn how to export data from R to other file formats.