Chapter 3: Basic Programming

Goals:
Understand programming logic for automating tasks.
  • Variables and assignments.
  • Conditional statements (if, else).
  • Loops (for, while).
  • Writing functions.

Read time: 10 Minutes

Assigning variables

We’ve already introduced variable assignment, where a value is stored in a named object. For example, a <- 3 assigns the value 3 to a. We’ve also already assigned dataframes and lists to objects too.

This allows us to store and reuse information efficiently, reducing redundancy in our code. You can store anything you might want to use later, but remember that you will continuously take up more and more space with every object and variable you save.

Code
setwd("directory")
Code
data <- read.csv("~/Desktop/analysis-user-group/synthetic_data.csv")

Keep an eye on your environment. It will inflate quickly. You can use rm(object1, object2, objectn) to remove objects from your environment.

Conditional statements

Conditional statements allow you to control the flow of your code by executing different actions based on specific conditions.

Code
data$predominant_subset <- ifelse( # Unlike an if statement that evaluates a single condition, ifelse() is vectorized and applies the condition across an entire column.
  data$CD8 > data$CD4, #Condition
  "CD8", # if condition is met, this is the outcome
  "CD4" # ELSE this is the outcome. 
)

# get a quick summary of these results
table(data$predominant_subset, data$layer, data$tissue)
## , ,  = Abdo
## 
##      
##       Epithelium Underlying
##   CD4          0          0
##   CD8          1          1
## 
## , ,  = abdomen
## 
##      
##       Epithelium Underlying
##   CD4          0          0
##   CD8          0          1
## 
## , ,  = Abdomen
## 
##      
##       Epithelium Underlying
##   CD4          5          0
##   CD8         18         22
## 
## , ,  = Labia
## 
##      
##       Epithelium Underlying
##   CD4          0         20
##   CD8         20          0
## 
## , ,  = Vagina
## 
##      
##       Epithelium Underlying
##   CD4          0         16
##   CD8         16          0

You can see that there are several variations of the Abdomen tissue type, likely as a result of human input error when adding data to the csv. We’ll fix these below using loops.

Loops

There are two main types of loops in R:
- For loops: Used when the number of iterations is known. E.g. for indexes in 1 through 100, print which index you’re up to written as

for(index in 1:100){print(index)}
  • While loops: Used when iterations depend on a condition. E.g. while the index is less than or equal to 100, print the index written as:
Code
index=1
while(index <=100){
  print(index)
  index=index+1
}

Here’s an example of a loop checking for inconsistent tissue labels.

Code
n_vagina = 0
n_labia = 0
n_abdo = 0

for(index in 1:nrow(data)){ #loop from 1 through nrows in data (1,2,3...121,120) and assign this value to 'index'
  if(data$tissue[index] == "Abdomen"){ #check if the string at row[index] in the 'tissue' column matches abdomen. 
    n_abdo = n_abdo+1 #If so, add 1 to the variable. 
  } else if(data$tissue[index] == "Labia"){ # repeat
    n_labia=n_labia+1
  } else if(data$tissue[index] == "Vagina"){
    n_vagina=n_vagina+1
  } else {
    print(paste(data$tissue[index], "at index",index,"does not match Abdomen, Labia or Vagina.")) # print a statement if the condition is met.
  }
}
## [1] "abdomen at index 24 does not match Abdomen, Labia or Vagina."
## [1] "Abdo at index 61 does not match Abdomen, Labia or Vagina."
## [1] "Abdo at index 62 does not match Abdomen, Labia or Vagina."
Code
print(paste0(n_vagina, " vagina samples. ", n_labia, " labia samples. ", n_abdo, " abdomen samples. (Total: ",n_abdo+n_labia+n_vagina,")" ))
## [1] "32 vagina samples. 40 labia samples. 45 abdomen samples. (Total: 117)"

We know that there are 120 samples, but here it’s printing 117, because there are 3 instances where Abdomen are not found (written abdomen, or Abdo). This will be a headache when generate plots or statistics. We could change these manutally, but I’ll write a function below which finds those errors and lets the user change the value.

Writing a function

Functions help modularize your code, reducing repetition and improving readability. Instead of rewriting similar blocks of code, you can define a function and call it whenever needed. For example, with single cell RNA analysis with Seurat, if you’re testing different variables, having the workflow summarized in one function will make your workflow tidier.

Here we’re just going to write the checker above into a function with some extra user-input functionality.

Code
sample_counter <- function(data){
  n_vagina = 0
  n_labia = 0
  n_abdo = 0
  
  for(index in 1:nrow(data)){ #loop from 1 through nrows in data (1,2,3...121,120) and assign this value to 'index'
    if(data$tissue[index] == "Abdomen"){ #check if the string at row[index] in the 'tissue' column matches abdomen. 
      n_abdo = n_abdo+1 #If so, add 1 to the variable. 
    } else if(data$tissue[index] == "Labia"){
      n_labia=n_labia+1
    } else if(data$tissue[index] == "Vagina"){
      n_vagina=n_vagina+1
    } else {
      print(paste(data$tissue[index], "at index",index,"does not match Abdomen, Labia or Vagina."))
      checkval=0
      while(checkval==0){
        input = readline("Choose correct value: [1] Abdomen, [2] Labia, [3] Vagina. ")
        if(input %in% c(1:3)) {
           if(input==1){
             n_abdo = n_abdo+1
             data$tissue[index] = "Abdomen"
           } else if(input==2) {
              n_labia=n_labia+1
              data$tissue[index] = "Labia"
  
           } else if(input==3) {
              n_vagina=n_vagina+1
              data$tissue[index] = "Vagina"
           }
           checkval = 1 #exit the while loop
        } else {
          cat("\nIncorrect input. Try again.\n")
        }
      }
    }
  }
  print(paste0(n_vagina, " vagina samples. ", n_labia, " labia samples. ", n_abdo, " abdomen samples. (Total: ",n_abdo+n_labia+n_vagina,")" ))
  return(data)
}
Code
data <- sample_counter(data)
1
1
1
write.csv(data, "~/Desktop/analysis-user-group/synthetic_data.csv")

Throughout your analysis, you’ll find more relevant for loops, while loops and functions that make you more efficient at programming.

Next chapter →