Chapter 2: Data Types and Structures
Goals:
Learn about data types and structures in R.
• Basic data types: numeric, character, logical.
• Data structures: vectors, matrices, data frames, lists.
• Assessing and querying data types and structures.
• Importing data (RDS, Excel, CSV files).
Data Types
There are a few data types you should know:
character: In R, characters or strings are letters, words, sentences, numbers, symbols… They’re defined using quotation marks.
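numeric: Data can be stored as a number. R recognises numbers written without quotation marks.
logical: TRUE or FALSE. T and F are shorthand for these, but they can be overwritten, so the full words are safer. Logicals are an important data type that you’ll notice along the way.
Where a logical is expected, such as in a condition, R treats 1 as TRUE and 0 as FALSE.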
Some functions or operations require data to be of the correct type. For example, mathematical operations can’t be performed on a character.
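The code that produced the output below was not preserved. As a minimal sketch, typeof() reports how R stores a value. The variable names and values here are hypothetical, chosen only to illustrate each type.

```r
# Illustrative values; these names are assumptions, not the original code
a_word   <- "hello"
a_digit  <- "1"     # quoted, so R stores it as a character
a_number <- 1       # unquoted, so R stores it as a number
a_flag   <- TRUE

typeof(a_word)
typeof(a_digit)
typeof(a_number)
typeof(a_flag)

# Arithmetic on a character fails; catch the error so the script keeps going
math_attempt <- tryCatch(a_digit + 1, error = function(e) "error")
```

If the quoted digit is really meant to be a number, as.numeric(a_digit) converts it before any arithmetic.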
## [1] "character"
## [1] "character"
## [1] "double"
## [1] "logical"
A good example of using logicals is to check a condition and execute a script. I’ll use if here, which we cover in the next chapter.
Examples:
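A sketch of what such checks might look like, assuming a variable called run_script (the name appears in the output, but the exact code is a reconstruction):

```r
run_script <- TRUE
if (run_script) {
  print("run_script is TRUE")
}

run_script <- T    # T is shorthand for TRUE
if (run_script) {
  print("run_script is TRUE")
}

run_script <- 1    # a number; if() coerces it to TRUE
if (run_script) {
  print("run_script is 1")
}
```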
## [1] "run_script is TRUE"
## [1] "run_script is TRUE"
## [1] "run_script is 1"
Above, we set the variables deliberately to demonstrate how they can be used. However, they don’t need to be deliberately set to be useful.
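For instance, a logical can come from a test on existing data rather than being typed in. The sketch below assumes a variable a holding the number 3 and uses is.numeric() for the test; the original code was not preserved, so treat this as one plausible version.

```r
a <- 3

typeof(a)                 # how a is stored
is.numeric(a)             # the test itself returns a logical
typeof(is.numeric(a))

# Use the result of the test as the condition
if (is.numeric(a)) {
  msg <- paste0("The variable a (", a, ") is numeric.")
  print(msg)
}
```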
## [1] "double"
## [1] TRUE
## [1] "logical"
## [1] "The variable a (3) is numeric."
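The same test can also guard an analysis against the wrong type. In this sketch (again a reconstruction, with a deliberately set to a quoted value), the else branch reports the problem instead of continuing:

```r
a <- "3"   # quoted, so this is a character, not a number

if (is.numeric(a)) {
  msg <- paste0("The variable a (", a, ") is numeric.")
} else {
  msg <- paste0("The variable a (", a, ") is ", typeof(a),
                ". Not continuing the analysis.")
}
print(msg)
```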
## [1] "The variable a (3) is character. Not continuing the analysis."
Data Structures
Data types can be organised into structures. These are:
vectors: A vector is a collection of data of a single type, such as c(1990, 1990, 1991, 1989) or c("Rou", "Chris", "Rory", "Rob").
matrices: A matrix is data organised into rows and columns. Data in a matrix is only of one data type.
data frames: A data frame is similar to a matrix, but each column can hold a different data type.
lists: A list is flexible and can hold any mix of data types and data structures.
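A minimal sketch of building these structures. The year and names values come from the text above, and the ratings match the data frame printed later in this chapter. The matrix contents and the object names (m, df) are assumptions for illustration.

```r
# Vectors: one data type each
year      <- c(1990, 1990, 1991, 1989)
people    <- c("Rou", "Chris", "Rory", "Rob")
ratings   <- c(4, 4, 4, 5)

# Matrix: rows and columns, one data type throughout (values are illustrative)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns can differ in type
df <- data.frame(year = year, names = people, ratings = ratings,
                 stringsAsFactors = FALSE)

# dim() returns the number of rows, then the number of columns
dim(df)
```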
## [1] 4 3
We can see this data frame is 4 rows tall and 3 columns wide.
We can access information in a data frame using df[row, col], where row and col are positions or names.
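As a sketch, assuming the data frame is called df and holds the values shown in this chapter, a single cell can be picked out by its row and column:

```r
df <- data.frame(year = c(1990, 1990, 1991, 1989),
                 names = c("Rou", "Chris", "Rory", "Rob"),
                 ratings = c(4, 4, 4, 5),
                 stringsAsFactors = FALSE)

first_cell  <- df[1, 1]         # row 1, column 1
second_name <- df[2, "names"]   # row 2, column picked by name

first_cell
```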
## [1] 1990
With data frames, we can also use $ to query a specific column.
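A short sketch of the $ operator, assuming the same hypothetical df as elsewhere in this chapter:

```r
df <- data.frame(year = c(1990, 1990, 1991, 1989),
                 names = c("Rou", "Chris", "Rory", "Rob"),
                 ratings = c(4, 4, 4, 5),
                 stringsAsFactors = FALSE)

# $ followed by a column name returns that column as a vector
name_column <- df$names
name_column
```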
## [1] "Rou" "Chris" "Rory" "Rob"
More examples:
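Leaving the row position empty returns every row of a column. The calls below are one plausible way to get the two results that follow; the original code was not preserved.

```r
df <- data.frame(year = c(1990, 1990, 1991, 1989),
                 names = c("Rou", "Chris", "Rory", "Rob"),
                 ratings = c(4, 4, 4, 5),
                 stringsAsFactors = FALSE)

all_years <- df[, 1]         # all rows, first column
all_names <- df[, "names"]   # all rows, column picked by name

all_years
all_names
```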
## [1] 1990 1990 1991 1989
## [1] "Rou" "Chris" "Rory" "Rob"
We can do a lot with a data frame if we combine everything we know about data types and structures.
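For example, comparing a column to a value gives a logical vector, which can be stored as a new column or used to filter rows. This is a sketch; the column name is_rou matches the data frame printed later in this chapter, while the filtering step on ratings is an added illustration.

```r
df <- data.frame(year = c(1990, 1990, 1991, 1989),
                 names = c("Rou", "Chris", "Rory", "Rob"),
                 ratings = c(4, 4, 4, 5),
                 stringsAsFactors = FALSE)

# A comparison returns one TRUE or FALSE per row
df$names == "Rou"

# Store the result as a new logical column
df$is_rou <- df$names == "Rou"

# Use a logical vector in the row position to keep only matching rows
top_rated <- df[df$ratings > 4, ]
```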
## [1] TRUE FALSE FALSE FALSE
When working with enormous data structures, like single cell sequencing or proteomics data, knowing how to query your data frame or matrix will be very useful!
Finally, we can make a list out of all of these different data types and structures:
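A sketch of such a list. The element names are taken from the printed output in this chapter; the object names my_list and df are assumptions.

```r
df <- data.frame(year = c(1990, 1990, 1991, 1989),
                 names = c("Rou", "Chris", "Rory", "Rob"),
                 ratings = c(4, 4, 4, 5),
                 stringsAsFactors = FALSE)
df$is_rou <- df$names == "Rou"

# Each element has a name and can be a different type or structure
my_list <- list(number    = 1,
                string    = "string",
                logical   = TRUE,
                vector    = c("Rou", "Chris", "Rory", "Rob"),
                dataframe = df)
my_list
```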
## $number
## [1] 1
##
## $string
## [1] "string"
##
## $logical
## [1] TRUE
##
## $vector
## [1] "Rou" "Chris" "Rory" "Rob"
##
## $dataframe
##   year names ratings is_rou
## 1 1990   Rou       4   TRUE
## 2 1990 Chris       4  FALSE
## 3 1991  Rory       4  FALSE
## 4 1989   Rob       5  FALSE
Ways to query your list
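A sketch of three common ways to pull an element out of a list, assuming a list called my_list with the element names used in this chapter (the data frame element is left out here to keep the example short):

```r
my_list <- list(number  = 1,
                string  = "string",
                logical = TRUE,
                vector  = c("Rou", "Chris", "Rory", "Rob"))

by_dollar   <- my_list$number        # by name with $
by_name     <- my_list[["vector"]]   # by name with [[ ]]
by_position <- my_list[[2]]          # by position with [[ ]]

by_dollar
by_name
by_position
```

Single brackets, as in my_list[2], return a smaller list rather than the element itself.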
## [1] 1
## [1] "Rou" "Chris" "Rory" "Rob"
## [1] "string"
It’s also worth noting that lists can contain other lists.
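A small sketch of a nested list, with hypothetical names throughout:

```r
# The element "inner" is itself a list
outer <- list(label = "outer",
              inner = list(value = 1, flag = TRUE))

# Chain $ to reach into the nested list
inner_value <- outer$inner$value
inner_value
```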
Importing and saving files in R
Most file types can be read into R. If you’re not working with xlsx or csv, you might be working with tsv, txt, h5ad, or xml. Some are easier than others, and you’ll learn them as you go.
Another file type is RDS (R Data Serialization), which is the easiest and most efficient way to write and read data once you’ve got it in R.
RDS is a safer choice than CSV because writing and reading a CSV is not as standardised as you might think. For example, a column name might change, and row names don’t stick unless you specify them when reading the file back in.
On the other hand, saveRDS() and readRDS() require no extra information. Your data will be identical when read back in.
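The difference can be seen with a small round trip. This sketch uses a made-up data frame and temporary files (all names here are hypothetical) and relies on the default settings of write.csv() and read.csv():

```r
# A data frame with row names and a column name containing a space
df <- data.frame(`sample id` = c("a", "b"),
                 value = c(1.5, 2),
                 row.names = c("r1", "r2"),
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# CSV round trip with default arguments
write.csv(df, csv_file)
df_csv <- read.csv(csv_file)
names(df_csv)          # row names come back as an extra column, names are altered

# RDS round trip
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)
identical(df, df_rds)  # the object is restored as it was
```

Arguments such as row.names = 1 and check.names = FALSE in read.csv() can recover the original layout, but they have to be remembered each time.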
Most commonly, you’ll use either
data <- read.csv() # For CSV files
data <- readRDS() # For RDS files
To read in Excel files, you’ll need to install a package. Try xlsx for read.xlsx(), or readxl for read_excel(). Sometimes one of these won’t work (xlsx depends on Java, which is a common cause), so it’s worth trying the other.
Throughout these tutorials, you’ll use read.csv() to read in the example data we downloaded in Chapter 1. We’ll then finish each tutorial with saveRDS() and begin the next tutorial with readRDS(). This gives each analysis clear stop and start points, and it’s an organised approach you may want to use when analysing your own data.