6 Importing data into R

In this chapter, we discuss ways of getting data into R, either by entering it directly or by importing it from other software. In R, the data frame is the standard data structure for data analysis. With the advent of tidy data principles and the tidyverse, the tibble, a modern variant of the data frame, has become the predominant data structure in use. In this chapter, we will read in various file formats and present them as tibbles.

6.1 Using data in R packages

Many R packages come with datasets that can be used for practice. To use a dataset from a specific package, that package must first be installed. For instance, to use the Oswego dataset from the epiDisplay package, we first make sure the package is installed by running the command below:

install.packages("epiDisplay")

Note that this will install the package and any other packages that the epiDisplay package depends on.
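
Because install.packages() reinstalls the package every time it is run, it can be convenient to install only when the package is missing. The following is a minimal sketch, assuming a helper we call install_if_missing() (a hypothetical name, not part of any package):

```r
# Hypothetical helper: install a package only if it is not already installed.
install_if_missing <- function(pkg) {
    # requireNamespace() returns TRUE when the package can be loaded
    if (requireNamespace(pkg, quietly = TRUE)) {
        return(invisible(FALSE))  # already installed, nothing to do
    }
    install.packages(pkg)
    invisible(TRUE)               # an installation was attempted
}

# "stats" ships with R, so this call installs nothing
install_if_missing("stats")

# For the example in this chapter one would call:
# install_if_missing("epiDisplay")
```

The helper returns FALSE (invisibly) when nothing had to be installed, which makes it safe to leave at the top of a script that is run repeatedly.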

The next step is to make the data available in the R session:

data("Oswego", package = "epiDisplay")

Now that the data is available in the working environment, we can view its first 6 rows (the %>% pipe used here requires the tidyverse, or at least the magrittr package, to be loaded):

Oswego %>% head()

Table 6.1:
age	sex	timesupper	ill	onsetdate	onsettime	bakedham	spinach	mashedpota	cabbagesal	jello	rolls	brownbread	milk	coffee	water	cakes	vanilla	chocolate	fruitsalad
11	M		FALSE			FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE
52	F	2e+03	TRUE	04/19	30	TRUE	TRUE	TRUE	FALSE	FALSE	TRUE	FALSE	FALSE	TRUE	FALSE	FALSE	TRUE	FALSE	FALSE
65	M	1.83e+03	TRUE	04/19	30	TRUE	TRUE	TRUE	TRUE	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE	FALSE	TRUE	TRUE	FALSE
59	F	1.83e+03	TRUE	04/19	30	TRUE	TRUE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE	TRUE	TRUE	TRUE	FALSE
13	F		FALSE			FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE
63	F	1.93e+03	TRUE	04/18	2.23e+03	TRUE	TRUE	FALSE	TRUE	TRUE	FALSE	FALSE	FALSE	FALSE	TRUE	FALSE	TRUE	FALSE	FALSE

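Finally, we can read the documentation for the dataset using the help() function. Because the epiDisplay package has not been attached with library(), we name the package explicitly:

help("Oswego", package = "epiDisplay")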
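
To find out which datasets a package provides in the first place, data() can be called with only the package argument. The sketch below uses the datasets package, which ships with R, as a stand-in so that it runs without installing anything; the object name available is our own choice.

```r
# List the datasets bundled with a package.
# "datasets" ships with R; swap in "epiDisplay" once it is installed.
available <- data(package = "datasets")$results

# Each row describes one dataset: its name (Item) and a short title
head(available[, c("Item", "Title")])
```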

6.2 Direct entry into R

Small datasets can be typed directly into R. We first use the data.frame() function from base R to create a data frame.

data.frame(
    name = c("Ama", "Yakubu", "John"), 
    sex = c("Female", "Male", "Male"),
    age = c(12, 9, 4),
    school = c("JHS", "Primary", "Creche")
    )

Table 6.2:
name	sex	age	school
Ama	Female	12	JHS
Yakubu	Male	9	Primary
John	Male	4	Creche

The same data can be entered as a tibble by using the tibble() function from the tibble package, which is loaded with the tidyverse.

tibble(
    name = c("Ama", "Yakubu", "John"), 
    sex = c("Female", "Male", "Male"),
    age = c(12, 9, 4),
    school = c("JHS", "Primary", "Creche")
    )

Table 6.3:
name	sex	age	school
Ama	Female	12	JHS
Yakubu	Male	9	Primary
John	Male	4	Creche
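
When typing data in by hand, it is often easier to lay it out row by row, as it would appear in a table. The tibble package offers tribble() for this. The sketch below re-enters the same three records; the object name children is our own choice.

```r
library(tibble)

# tribble() takes column names as formulas (~name), then the values row by row
children <- tribble(
    ~name,    ~sex,     ~age, ~school,
    "Ama",    "Female", 12,   "JHS",
    "Yakubu", "Male",   9,    "Primary",
    "John",   "Male",   4,    "Creche"
)

children
```

The result is an ordinary tibble, identical in content to the one built column by column with tibble().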

6.3 R data file

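When working in R, data is frequently stored as an .Rdata file. Such a file can hold one or more R objects and preserves their names and structure, so load() restores them into the working environment exactly as they were saved. Below we load a previously saved .Rdata file and list the objects now in the environment: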

load(file = "./Data/data1.Rdata")
ls()
[1] "data1_stata" "mytheme"     "Oswego"

We then view the first 4 rows of data1_stata, the dataset restored from the loaded file:

data1_stata %>% head(n=4)

Table 6.4:
id	sex	weight	height
125	Male	7	64
62	Female	11	96
112	Female	13	115
29	Female	20	106
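
To make the round trip concrete, the sketch below saves a small data frame to a temporary .Rdata file and loads it back. The object name children, the file name and the sandbox environment are our own illustrative choices, not files used elsewhere in this chapter.

```r
# A small data frame to save
children <- data.frame(
    name = c("Ama", "Yakubu", "John"),
    age = c(12, 9, 4)
)

# Save it to a temporary .Rdata file, then remove it from the session
path <- file.path(tempdir(), "children.Rdata")
save(children, file = path)
rm(children)

# load() recreates the object under its original name and
# returns the names of everything it restored
restored <- load(file = path)
restored
head(children)

# Loading into a separate environment avoids overwriting
# objects of the same name in the working environment
sandbox <- new.env()
load(file = path, envir = sandbox)
ls(sandbox)
```

Because load() silently overwrites existing objects with the same name, the separate environment is a useful precaution when the contents of a file are unknown.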

6.4 Text files

The first file format we will read is the flat file, or text file. These usually have the extension .txt. The values in such files can be separated by various delimiters, including tabs, commas and spaces. In this section, we read a tab-delimited file as a prototype; files with other delimiters are read the same way by changing the delim argument.

# col_types is a compact string with one letter per column:
# "c" = character, "d" = double
read_delim(
    file = "./Data/bpA.txt", 
    delim = "\t", 
    col_types = "ccdd"
    )

Table 6.5:
id	sex	age	sbp0
B01	M	73	145
B02	F	47	164
B03	M	59	153
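
The column types passed to col_types can be written either as a compact string with one letter per column or as an explicit cols() specification. The sketch below shows both on a small temporary file that we create ourselves from the first two rows of the table above; the object names are our own.

```r
library(readr)

# Write a small tab-delimited file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(
    c("id\tsex\tage\tsbp0",
      "B01\tM\t73\t145",
      "B02\tF\t47\t164"),
    con = path
)

# Compact specification: one letter per column (c = character, d = double)
bp_compact <- read_delim(file = path, delim = "\t", col_types = "ccdd")

# Equivalent explicit specification, naming each column
bp_explicit <- read_delim(
    file = path,
    delim = "\t",
    col_types = cols(
        id = col_character(),
        sex = col_character(),
        age = col_double(),
        sbp0 = col_double()
    )
)

# Both calls should report the same column types
sapply(bp_compact, class)
sapply(bp_explicit, class)
```

The explicit form is longer but guards against columns changing position in the file, since each type is tied to a column name.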

Next, we read a comma-delimited text file:

# One character column (stno) followed by six double columns
read_delim(
    file = "./Data/blood.txt", 
    delim = ",", 
    col_types = "cdddddd"
    ) %>% 
    head(n=4)

Table 6.6:
stno	age	wgt	hgt	hb	wbc	hct
1001	120	18.4	128	7.7	13.7	22.6
1002	96	24	123	8	8.5	22.8
1003	168	38.5	143	8.2	26.8	24.1
1004	96	20.4	114	8.3	10.5	23.3

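Comma-delimited files, which usually have the extension .csv, can also be imported with read_csv(), a shortcut for read_delim() with a comma as the delimiter. Here we reuse the same file: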

# read_csv() assumes a comma delimiter, so delim is not needed
read_csv(
    file = "./Data/blood.txt",
    col_types = "cdddddd"
    ) %>% 
    head(n=4)

Table 6.7:
stno	age	wgt	hgt	hb	wbc	hct
1001	120	18.4	128	7.7	13.7	22.6
1002	96	24	123	8	8.5	22.8
1003	168	38.5	143	8.2	26.8	24.1
1004	96	20.4	114	8.3	10.5	23.3
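
Files that use other delimiters are handled by changing only the delim argument. As a minimal sketch, the example below writes a semicolon-delimited file to a temporary location and reads it back; the file and the object name bp_semicolon are our own illustrations.

```r
library(readr)

# Write a small semicolon-delimited file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(
    c("id;sex;age;sbp0",
      "B01;M;73;145",
      "B02;F;47;164"),
    con = path
)

# Only the delimiter differs from the tab-delimited case
bp_semicolon <- read_delim(file = path, delim = ";", col_types = "ccdd")

bp_semicolon
```

For semicolon-delimited files that also use a comma as the decimal mark, readr provides read_csv2().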

6.5 Microsoft Excel

Probably the most common format for transferring data is Microsoft Excel. Excel files come in two formats, the older .xls and the newer .xlsx. Below, reading in an .xlsx file is demonstrated using the readxl package.

readxl::read_xlsx(path = "./Data/data1.xlsx") %>% 
    head(n=4)

Table 6.8:
id	sex	weight	height
125	Male	7	64
62	Female	11	96
112	Female	13	115
29	Female	20	106
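
An Excel workbook may contain several sheets. The sketch below uses an example workbook that is bundled with the readxl package (not one of this chapter's data files) to list the sheets and read one of them by name. read_excel() picks the .xls or .xlsx reader based on the file itself; the object name mtcars_xl is our own.

```r
library(readxl)

# Path to an example workbook shipped with readxl
path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
excel_sheets(path)

# Read a specific sheet by name
mtcars_xl <- read_excel(path, sheet = "mtcars")

head(mtcars_xl, n = 4)
```

If the sheet argument is omitted, the first sheet in the workbook is read.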

6.6 SPSS files

Files from SPSS are usually saved with the extension .sav. Below we read an SPSS data file using the haven package:

haven::read_sav(file = "./Data/data1.sav") %>% 
    head(n=4)

Table 6.9:
id	sex	weight	height
125	Male	7	64
62	Female	11	96
112	Female	13	115
29	Female	20	106
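
SPSS often stores categorical variables as numeric codes with attached value labels, which haven imports as labelled columns. The sketch below writes a tiny .sav file to a temporary location and reads it back, then converts the labelled column into an R factor with as_factor(). The numeric coding (1 = Male, 2 = Female) and the object names are assumptions made for illustration, not taken from data1.sav.

```r
library(haven)
library(tibble)

# A small dataset with a labelled numeric column (coding is assumed)
children <- tibble(
    id = c(125, 62, 112),
    sex = labelled(c(1, 2, 2), labels = c(Male = 1, Female = 2))
)

# Write it out as an SPSS file and read it back in
path <- tempfile(fileext = ".sav")
write_sav(children, path)
imported <- read_sav(file = path)

# Convert the labelled codes into a factor that uses the value labels
imported$sex_factor <- as_factor(imported$sex)

imported
```

Converting to a factor early makes the variable behave as expected in tables and models, which otherwise see only the numeric codes.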

6.7 Stata files

Stata files, like SPSS data files, can be imported using the haven package. This is illustrated below:

haven::read_dta(file = "./Data/data1.dta") %>% 
    head(n=4)

Table 6.10:
id	sex	weight	height
125	Male	7	64
62	Female	11	96
112	Female	13	115
29	Female	20	106

6.8 SAS files

The haven package can also read SAS data files into R. This is illustrated below:

haven::read_sas(data_file = "./Data/data1.sas7bdat") %>% 
    head(n=4)

Table 6.11:
ID	SEX	WEIGHT	HEIGHT
125	Male	7	64
62	Female	11	96
112	Female	13	115
29	Female	20	106

6.9 Conclusion

In this chapter, we have learned how to import data into R from various file formats and programs. In the next section, we will learn how to export data from R to other file formats.
