Chapter 3: Basic Programming
Goals:
Understand programming logic for automating tasks.
• Variables and assignments.
• Conditional statements (if, else).
• Loops (for, while).
• Writing functions.
Assigning variables
We’ve already introduced variable assignment, where a value is stored in a named object. For example, a <- 3 assigns the value 3 to a. We’ve also assigned dataframes and lists to objects.
This allows us to store and reuse information efficiently, reducing redundancy in our code. You can store anything you might want to use later, but remember that every object you save takes up memory.
Keep an eye on your environment, as it can fill up quickly. You can use rm(object1, object2, objectn) to remove objects you no longer need.
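A minimal sketch of assigning, reusing and then removing objects. The object names (n_samples, tissues, doubled) are made up for illustration and are not part of the dataset used in this chapter:

```r
n_samples <- 120                             # store a single value
tissues <- c("Abdomen", "Labia", "Vagina")   # store a character vector
doubled <- n_samples * 2                     # reuse a stored value

ls()                    # list the objects currently in the environment
object.size(tissues)    # approximate memory used by one object
rm(n_samples, doubled)  # remove objects that are no longer needed
ls()                    # n_samples and doubled are no longer listed
```

Calling ls() before and after rm() is a quick way to confirm that the environment has actually been tidied up.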
Conditional statements
Conditional statements allow you to control the flow of your code by executing different actions based on specific conditions.
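As a minimal sketch of a plain if/else statement, the example below compares two single values. The variables cd8_count and cd4_count are hypothetical numbers chosen for illustration, not values taken from the dataset:

```r
cd8_count <- 12  # hypothetical CD8 count for one sample
cd4_count <- 7   # hypothetical CD4 count for the same sample

if (cd8_count > cd4_count) {
  predominant <- "CD8"   # runs only when the condition is TRUE
} else {
  predominant <- "CD4"   # runs only when the condition is FALSE
}
print(predominant)
```

An if statement like this tests one condition at a time, which is why a whole column is usually handled with the vectorized ifelse() instead.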
# Unlike an if statement, which evaluates a single condition, ifelse() is
# vectorized and applies the condition across an entire column.
data$predominant_subset <- ifelse(
  data$CD8 > data$CD4, # condition
  "CD8",               # if the condition is met, this is the outcome
  "CD4"                # ELSE this is the outcome
)
# get a quick summary of these results
table(data$predominant_subset, data$layer, data$tissue)
## , ,  = Abdo
##
##
##       Epithelium Underlying
##   CD4          0          0
##   CD8          1          1
##
## , ,  = abdomen
##
##
##       Epithelium Underlying
##   CD4          0          0
##   CD8          0          1
##
## , ,  = Abdomen
##
##
##       Epithelium Underlying
##   CD4          5          0
##   CD8         18         22
##
## , ,  = Labia
##
##
##       Epithelium Underlying
##   CD4          0         20
##   CD8         20          0
##
## , ,  = Vagina
##
##
##       Epithelium Underlying
##   CD4          0         16
##   CD8         16          0
You can see that there are several variations of the Abdomen tissue type (Abdo, abdomen and Abdomen), likely as a result of human input error when adding data to the CSV. We’ll find these below using a loop, then fix them with a function.
Loops
There are two main types of loops in R:
- For loops: used when the number of iterations is known in advance. E.g. for each index from 1 through 100, print the current index, written as:
for(index in 1:100){print(index)}
- While loops: used when the number of iterations depends on a condition. E.g. while the index is less than or equal to 100, print the index and then increase it by 1, written as:
index <- 1; while(index <= 100){print(index); index <- index + 1}
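A minimal sketch of both loop types side by side. The doubling example and its threshold of 1000 are hypothetical, chosen only to show a case where the number of while-loop iterations is not known up front:

```r
# For loop: the number of iterations (5) is fixed before the loop starts.
total <- 0
for (index in 1:5) {
  total <- total + index   # running sum of 1 + 2 + ... + 5
}
print(total)

# While loop: keep doubling until the (hypothetical) threshold is reached.
cells <- 1
n_doublings <- 0
while (cells < 1000) {
  cells <- cells * 2
  n_doublings <- n_doublings + 1   # count how many passes were needed
}
print(n_doublings)
```

Note that a while loop must change something that its condition depends on (here, cells); otherwise it never stops.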
Here’s an example of a loop checking for inconsistent tissue labels.
n_vagina <- 0
n_labia <- 0
n_abdo <- 0
for(index in 1:nrow(data)){ # loop from 1 through the number of rows in data (1, 2, 3, ..., 119, 120) and assign each value to 'index'
  if(data$tissue[index] == "Abdomen"){ # check whether the string at row 'index' of the 'tissue' column matches "Abdomen"
    n_abdo <- n_abdo + 1 # if so, add 1 to the counter
  } else if(data$tissue[index] == "Labia"){ # repeat for the other tissues
    n_labia <- n_labia + 1
  } else if(data$tissue[index] == "Vagina"){
    n_vagina <- n_vagina + 1
  } else {
    # print a message if none of the conditions above is met
    print(paste(data$tissue[index], "at index", index, "does not match Abdomen, Labia or Vagina."))
  }
}
## [1] "abdomen at index 24 does not match Abdomen, Labia or Vagina."
## [1] "Abdo at index 61 does not match Abdomen, Labia or Vagina."
## [1] "Abdo at index 62 does not match Abdomen, Labia or Vagina."
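print(paste0(n_vagina, " vagina samples. ", n_labia, " labia samples. ", n_abdo, " abdomen samples. (Total: ", n_abdo + n_labia + n_vagina, ")"))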
## [1] "32 vagina samples. 40 labia samples. 45 abdomen samples. (Total: 117)"
We know that there are 120 samples, but here it’s printing 117, because there are 3 instances where Abdomen is not matched (written as abdomen or Abdo). This will be a headache when we generate plots or statistics. We could change these manually, but I’ll write a function below which finds those errors and lets the user change the value.
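For comparison, a manual fix could look something like the sketch below. It uses a small made-up data frame (toy) rather than the real data, and assumes the only misspellings are "abdomen" and "Abdo":

```r
# Hypothetical toy data standing in for the real 'data' object.
toy <- data.frame(
  tissue = c("Abdomen", "abdomen", "Abdo", "Labia", "Vagina"),
  stringsAsFactors = FALSE
)

# List any labels that are not one of the three expected values.
expected <- c("Abdomen", "Labia", "Vagina")
unique(toy$tissue[!toy$tissue %in% expected])

# Overwrite the known misspellings in one vectorized step.
toy$tissue[toy$tissue %in% c("abdomen", "Abdo")] <- "Abdomen"
table(toy$tissue)
```

This works well when the misspellings are known in advance, but each new variant has to be added to the code by hand.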
Writing a function
Functions help modularize your code, reducing repetition and improving readability. Instead of rewriting similar blocks of code, you can define a function once and call it whenever needed. For example, in single-cell RNA analysis with Seurat, if you’re testing different variables, wrapping the workflow in one function will keep your code tidier.
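As a minimal sketch of the idea, the hypothetical helper below (percent_label, not part of any package) wraps one calculation so it can be reused with different inputs:

```r
# Hypothetical helper: what percentage of labels equal a given target value?
percent_label <- function(labels, target) {
  100 * mean(labels == target)   # mean of TRUE/FALSE gives the proportion of TRUE
}

tissues <- c("Abdomen", "Labia", "Labia", "Vagina")  # made-up example labels
labia_pct <- percent_label(tissues, "Labia")
vagina_pct <- percent_label(tissues, "Vagina")
print(c(labia_pct, vagina_pct))
```

The calculation is written once inside the function, and only the arguments change between calls.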
Here we’re just going to wrap the checker above in a function, with some extra user-input functionality.
sample_counter <- function(data){
  n_vagina <- 0
  n_labia <- 0
  n_abdo <- 0
  for(index in 1:nrow(data)){ # loop from 1 through the number of rows in data (1, 2, 3, ..., 119, 120) and assign each value to 'index'
    if(data$tissue[index] == "Abdomen"){ # check whether the string at row 'index' of the 'tissue' column matches "Abdomen"
      n_abdo <- n_abdo + 1 # if so, add 1 to the counter
    } else if(data$tissue[index] == "Labia"){
      n_labia <- n_labia + 1
    } else if(data$tissue[index] == "Vagina"){
      n_vagina <- n_vagina + 1
    } else {
      print(paste(data$tissue[index], "at index", index, "does not match Abdomen, Labia or Vagina."))
      checkval <- 0
      while(checkval == 0){ # keep asking until a valid choice is entered
        # readline() waits for typed input, so this needs an interactive R session
        input <- readline("Choose correct value: [1] Abdomen, [2] Labia, [3] Vagina. ")
        if(input %in% c("1", "2", "3")) { # readline() returns a character string
          if(input == "1"){
            n_abdo <- n_abdo + 1
            data$tissue[index] <- "Abdomen"
          } else if(input == "2") {
            n_labia <- n_labia + 1
            data$tissue[index] <- "Labia"
          } else if(input == "3") {
            n_vagina <- n_vagina + 1
            data$tissue[index] <- "Vagina"
          }
          checkval <- 1 # exit the while loop
        } else {
          cat("\nIncorrect input. Try again.\n")
        }
      }
    }
  }
  print(paste0(n_vagina, " vagina samples. ", n_labia, " labia samples. ", n_abdo, " abdomen samples. (Total: ", n_abdo + n_labia + n_vagina, ")"))
  return(data) # return the corrected copy of the data
}
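Because R functions work on a copy of their arguments, corrections made inside a function are only kept if the returned object is assigned back, e.g. data <- sample_counter(data) in an interactive session. The sketch below shows the same return-and-reassign pattern with a hypothetical non-interactive function (relabel) and made-up data (toy), so it can run without user input:

```r
# Hypothetical function: fixes one known misspelling and returns the result.
relabel <- function(df) {
  df$tissue[df$tissue == "Abdo"] <- "Abdomen"  # changes the local copy only
  return(df)
}

toy <- data.frame(tissue = c("Abdo", "Labia"), stringsAsFactors = FALSE)

unsaved <- relabel(toy)$tissue[1]  # the corrected value exists in the returned copy
before <- toy$tissue[1]            # the original object is unchanged
toy <- relabel(toy)                # reassign to keep the corrected labels
after <- toy$tissue[1]
print(c(before, after))
```

Forgetting the reassignment is a common reason a fix appears to work inside a function but the data looks unchanged afterwards.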
As your analysis progresses, you’ll find more places where for loops, while loops and functions make your code more efficient.