Introduction
Leaving behind manual analysis tools
Goals:
Introduce programming concepts and the mindset behind programmatic workflows.
• Transition from manual tools (e.g. Excel & Prism) to programming-based workflows in R.
• Understand the concept of handling data using code.
• Learn to navigate directories and file paths programmatically.
• Emphasise reproducibility and structured workflows.
You will not be writing any code in this introduction. Instead, we will focus on reframing how you approach analytic programming. These ideas are tailored to researchers who manage a variety of tools — FlowJo, Excel, Prism, and more — but want to unify data wrangling, visualisation, and reporting under a single environment.
Transitioning from manual tools (e.g. Excel & Prism) to programming-based workflows in R.
Many resources teach the basics of R programming, but here we view things through the lens of a wet-lab researcher dealing with biomedical data. We will go beyond basic statistical methods or standard single-cell workflows and include topics like building a lab notebook in R and collaborating with PIs or peers.
If, however, all you need is a quick, “click-and-run” script to load already processed data, you might prefer resources such as this and that.
Note: If you still do not get it, don’t worry. It took me years to feel comfortable. I was fortunate enough to have a supervisor who gave me time to learn this. Keep up with it, find more resources and stick with a good community of users :)
Consistency
Consistency is key. You won’t get significantly more efficient without practice. You may find that when you routinely use a particular analysis, you can rewrite the entire workflow from memory. But after a break, it’s normal to forget key steps. Programming efficiency builds incrementally—there’s no shortcut.
Active Learning
Learning to code is like learning a foreign language: practicing by “speaking” (i.e. actually writing and running code) is far more effective than just reading. Do not limit yourself to reading code in a book—actively write and test it. One advantage of programming is that mistakes stay private to you and your computer. Embrace that safe space to experiment. Fortunately, we don’t have to embarrass ourselves when we try the wrong combination of words in programming (the only people who will know your silliness is you
Avoid Tutorial Overload
Be cautious about endless tutorials that don’t match your research needs. If example data is far removed from your work (like sports or finance data), you might struggle to see how it helps with, say, flow cytometry. Seek practice with examples relevant to your actual data and objectives.
Handling data - be specific!
Unlike with Excel or Prism, you won’t be double-clicking cells. If you want to change the value in the 3rd column and 5th row, you’ll need an explicit instruction like:
my_data[5,3] <- new_value
… and so you’ll need to learn quickly to be specific.
Eventually, you’ll learn to automate repetitive tasks, such as finding all rows containing “CD45RA” or “CD4.” Before long, you’ll understand that formatting data consistently—e.g. avoiding spaces in column names—saves you from constant reformatting or code breakage. An accidental space in something like Sample Number becomes Sample.Number, and your existing scripts may fail.
Before using R, I spent countless hours copy/pasting to get data formatted for Prism. If you’ve had that frustration, learning consistent data formatting for R can prevent a lot of manual effort.
I hope that as you navigate through these lessons, you try to amend your own spreadsheets so that your data becomes an R-friendly format.
Directories
Forget double-clicking in File explorer & Finder!
When you open RStudio, you’re essentially using a text editor (with syntax highlighting) that executes code. If R is pointed to the wrong folder, you may accidentally open or overwrite the wrong files.
Could you imagine thinking that you’re editing row 5 and column 3 of your raw data, and not realising that you’re editing the tissue log!
Directories are the most common errors! For both beginner and experienced users!
Set your directory at the beginning of each script/session!
For these sessions, I’ve created a folder on my desktop (mac) called analysis_user_group. We want to tell R where to look for files/folders and where to save our data/plots.
I can check my working directory using
[1] "/Users/thomasoneil/Desktop/analysis_user_group"
You can verify which files are present by using list.files()
Note on “.” and “..”
In a directory path, . means “this directory” and .. means “the parent directory.” For instance, list.files(“..”) will list files in the folder above your current one.
Reproducibility and automation
Tracking every minor change in a spreadsheet—removing rows, shifting columns, or redoing analyses in FlowJo—can be tedious. By contrast, each line of code in R is an explicit instruction, and you can save or version your script to see exactly what was done. Any time you need to adjust or repeat the analysis, you can simply run the script again.
Even better, you can automate many analyses. Once you have a working script, just change the folder path (and data, if needed), and you can generate a fresh report for new experiments. Combine multiple R scripts into a Markdown book to share with collaborators or keep for your records.
This alone makes programming worthwhile: reproducibility is seamless, and the clarity you gain over your data transformations is invaluable.