Help me help you!

An introduction to R debugging

David Munoz Tord - Senior R Sorcerer

We Data

May 5, 2026

Introduction


Debugging is a Skill, Not Magic


  • When code breaks: It’s not a dead end, it’s a crime scene with clues
  • Red text is helpful: R is telling you exactly where things went wrong
  • Systematic approach: We have three tools to solve any mystery



1. Direct Intervention: browser()



Freezing Time


  • Silent bugs are the hardest: When code runs but gives wrong results
  • browser() freezes time: You can inspect the environment mid-execution


  • Step through code: Execute line by line and watch variables change
  • Print at will: Check what any variable contains at that exact moment

Browser Commands Reference


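Command               What it does
n                     Execute the next line (steps over function calls)
f                     Finish the current loop or function
s                     Step into the function called on the next line
c                     Continue execution (leave the browser and keep running)
where                 Print the call stack (where you are)
ls()                  List variables in the current environment
print(variable_name)  Check a variable’s value
Q                     Quit the debugger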

Live Demo: Browsing the Code


calculate_final_price <- function(base_price, discount) {
  browser()  # freeze!
  discount_amount <- base_price * (discount / 100)
  final <- base_price + discount_amount  # BUG
  return(final)
}

calculate_final_price(200, 15)
# Now you can type commands:
# > base_price
# > discount  
# > n  (to step to next line)
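Stepping with n until discount_amount exists shows that it is 30, and the next line adds it to the price instead of subtracting it. Below is a minimal corrected sketch. It assumes the intended behavior is to subtract the discount, and browser() is removed once the bug is found.

```r
# Corrected version: the discount is subtracted, and browser() is gone
calculate_final_price <- function(base_price, discount) {
  discount_amount <- base_price * (discount / 100)  # 15% of 200 is 30
  final <- base_price - discount_amount             # subtract, don't add
  final
}

calculate_final_price(200, 15)
# [1] 170
```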

Set Breakpoints

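In RStudio, click to the left of a line number to set a breakpoint; it takes effect once the file is saved and sourced. It acts like browser() but without modifying your code.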
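Outside RStudio, base R's trace() can inject browser() into an existing function without editing its source. The sketch below reuses calculate_fee() from the next slide. The value at = 3 is our choice for illustration: it refers to the third element of as.list(body(calculate_fee)), which is the line amount + fee (element 1 is the opening brace).

```r
calculate_fee <- function(amount) {
  fee <- amount * 0.05
  amount + fee
}

# Inspect the numbered steps of the body:
# [[1]] is `{`, [[2]] is the fee line, [[3]] is `amount + fee`
as.list(body(calculate_fee))

# "Set a breakpoint": pause just before step 3 on every call
trace(calculate_fee, tracer = browser, at = 3)
# calculate_fee(200) would now open the browser with `fee` already computed

# Remove the breakpoint; the original function is restored
untrace(calculate_fee)
calculate_fee(200)
# [1] 210
```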


Browsing in a nested function


calculate_fee <- function(amount) {
  fee <- amount * 0.05
  amount + fee
}

process_transactions <- function(transactions) {
  results <- numeric(length(transactions))
  for (i in seq_along(transactions)) {
    browser()  # pauses on every iteration; type s on the calculate_fee() line to step into it
    current_val <- transactions[i]
    if (current_val > 100) {
      results[i] <- calculate_fee(current_val)
    } else {
      results[i] <- current_val
    }
  }
  results
}

sales <- c(50, 150)
process_transactions(sales)

2. Finding the Clues with traceback()


The Paper Trail of Errors


  • Error messages aren’t gibberish: R is showing you a paper trail
  • traceback() shows the chain: Which function called which, leading to the crash
  • Read from bottom up: The bottom (1) is the call you typed; the top (highest number) is where the error actually happened


Live Demo: The Traceback


clean_data <- function(data) {
  na.omit(data)
}

process_numbers <- function(x) {
  clean_data(x) + "10"
}

run_analysis <- function(val) {
  process_numbers(val)
}

run_analysis(c(1, 2, 3))
# > Error in clean_data(x) + "10" : non-numeric argument to binary operator
# > traceback()
# > 2: process_numbers(val) at #2
# > 1: run_analysis(c(1, 2, 3))
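traceback() only reports uncaught errors, so it has nothing to show once an error is handled, for example by tryCatch() in a long-running script. The sketch below captures the same call chain yourself with base R's withCallingHandlers() and sys.calls(). The captured list also contains some internal frames from tryCatch() and from the handler itself.

```r
clean_data <- function(data) {
  na.omit(data)
}
process_numbers <- function(x) {
  clean_data(x) + "10"
}
run_analysis <- function(val) {
  process_numbers(val)
}

captured_calls <- NULL

result <- tryCatch(
  withCallingHandlers(
    run_analysis(c(1, 2, 3)),
    # Runs at the moment of the error, while the failing frames still exist
    error = function(e) captured_calls <<- sys.calls()
  ),
  # The error then continues to tryCatch(), which handles it
  error = function(e) conditionMessage(e)
)

result
# Print the captured chain, outermost call first
for (cl in as.list(captured_calls)) print(cl)
```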

A note on RStudio “On Error”

Debug > On Error offers three modes (roughly: none, traceback, debug):

  • Message Only: Just shows the error message, no stack info
  • Error Inspector: Gives you a button to show the traceback, but you have to click it
  • Break in Code: Enters debug mode immediately on error, allowing you to inspect variables


Pro tip: Use “Break in Code” to get hands-on experience with debugging.

On VSCode and Positron

  • VSCode supports R debugging through the R Debugger extension

  • Positron offers similar support

  • A bit tricky to set up …

  • You get a full-fledged debugging experience with breakpoints, variable inspection, and more.


Catching Errors Early: debug() and debugonce()


debug() - Persistent Debugging

# Set debug mode for a function
debug(run_analysis)

# Every call now enters debug mode
run_analysis(c(1, 2, 3))

# Later turn it off
undebug(run_analysis)

Use when:

  • You want to inspect a function’s internals

  • You need to trace variables through multiple calls
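Because debug() stays on until undebug(), a forgotten flag keeps pausing every call. Base R's isdebugged() reports whether a function is still flagged. Here is a small sketch using clean_data() from the traceback demo.

```r
clean_data <- function(data) {
  na.omit(data)
}

debug(clean_data)
isdebugged(clean_data)
# [1] TRUE

undebug(clean_data)
isdebugged(clean_data)
# [1] FALSE
```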

debugonce() - One-Time Debugging

# Debug the next call only
debugonce(clean_data)

# First call enters debug mode
clean_data(c(1, 2, 3))

# Second call runs normally
clean_data(c(1, 2, 3))

Use when:

  • You only want to inspect one specific call

  • You are testing different arguments and don’t want every call to pause

Advanced: Conditional Debugging

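# Hypothetical helper and toy data, so the example runs as-is
process_data <- function(x) if (x > 0) log(x) else NA
values <- 1:1000
values[500] <- -1  # one problematic row hidden among 1000

# Example: Debug only when a condition is true
for (i in seq_along(values)) {
  result <- process_data(values[i])
  
  # Enter debug mode only for problematic rows
  if (is.na(result)) {
    browser()  # Freezes only when result is NA (here, i == 500)
  }
}

# Or with debugonce() for specific function calls:
if (any(values <= 0)) {
  debugonce(process_data)  # the next process_data() call will pause
}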

Benefits:

  • Avoid stepping through 1000 iterations manually

  • Target only the rows/cases that fail!

  • Combine browser() with conditions for precision
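browser() also takes an expr argument: it only opens the debugger when expr is TRUE, which folds the if () test into the call itself. The sketch below uses hypothetical process_data() and process_all() helpers; process_data() returns NA for non-positive input.

```r
# Hypothetical helper: NA for non-positive input, log otherwise
process_data <- function(x) {
  if (x > 0) log(x) else NA
}

process_all <- function(values) {
  results <- numeric(length(values))
  for (i in seq_along(values)) {
    results[i] <- process_data(values[i])
    # Opens the debugger only when the current result is NA
    browser(expr = is.na(results[i]))
  }
  results
}

# Clean input runs straight through without pausing
process_all(c(1, 10, 100))
# process_all(c(1, -1, 100)) would pause at i == 2
```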

Case Study: The Black Box



The Setup Scene


my_model <- lm(salary ~ education + experience + city, data = survey_data)
# > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
# >   contrasts can be applied only to factors with 2 or more levels

Step 1: The traceback()


Running traceback() shows us the internal guts of the lm() function:

traceback()
# > 5: stop("contrasts can be applied only to factors with 2 or more levels")
# > 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
# > 3: model.matrix.default(mt, mf, contrasts)
# > 2: model.matrix(mt, mf, contrasts)
# > 1: lm(salary ~ education + experience + city, data = survey_data)

What we know:

  • lm() called model.matrix(), which failed while setting contrasts (frame 4).

What we don’t know:

  • Which column in our data caused model.matrix to choke.

Step 2: options(error = recover)


When you can’t put browser() in the code (for example, inside a package function), you change R’s global error handler instead.

This is similar to the “Break in Code” mode in RStudio.


options(error = recover)

my_model <- lm(salary ~ education + experience + city, data = survey_data)
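options(error = recover) stays active for the rest of the session, so every later error drops into the menu. options() returns the previous values, which allows a small save-and-restore pattern. This is a sketch; the failing call itself is left as a comment.

```r
# Install recover() as the error handler, remembering the old setting
old_opts <- options(error = utils::recover)

# ... run the failing call here, e.g. the lm() call above ...

# Put back whatever handler was active before (often NULL)
options(old_opts)
```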

Step 3: Entering the Matrix


R immediately pauses and gives you a menu of the call stack.


# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
#   contrasts can be applied only to factors with 2 or more levels

# Enter a frame number, or 0 to exit   
# 1: lm(salary ~ education + experience + city, data = survey_data)
# 2: model.matrix(mt, mf, contrasts)
# 3: model.matrix.default(mt, mf, contrasts)
# 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])

# Selection: 3

Step 4: The Interrogation


You are now inside the evaluation frame of model.matrix.default() (from the stats package). You can inspect the variables the package author was using.


# Browse[1]> str(data)
# 'data.frame':   5 obs. of  4 variables:
#  $ salary    : num  65000 72000 58000 81000 90000
#  $ education : Factor w/ 3 levels "Bachelors","Masters",..: 1 2 1 3 2
#   ..- attr(*, "contrasts")= chr "contr.treatment"
#  $ experience: num  3 5 1 8 10
#  $ city      : Factor w/ 1 level "New York": 1 1 1 1 1

# Browse[1]> Q
# > options(error = NULL)
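The str() output shows the culprit: city is a factor with a single level ("New York"), so no contrast can be built for it. The sketch below is a pre-flight check that would have caught this before lm(). Here survey_data is a toy stand-in with invented values, and "Other" is a placeholder for the elided third education level.

```r
# Toy stand-in for survey_data (values invented for illustration;
# "Other" is a placeholder for the elided third education level)
survey_data <- data.frame(
  salary     = c(65000, 72000, 58000, 81000, 90000),
  education  = factor(c("Bachelors", "Masters", "Bachelors", "Other", "Masters")),
  experience = c(3, 5, 1, 8, 10),
  city       = factor(rep("New York", 5))
)

# Pre-flight check: count the levels of every factor column
factor_levels <- sapply(Filter(is.factor, survey_data), nlevels)
factor_levels

# Columns that lm() cannot build contrasts for
single_level <- names(factor_levels)[factor_levels < 2]
single_level

# Refit without the constant column(s)
predictors  <- setdiff(names(survey_data), c("salary", single_level))
fit_formula <- reformulate(predictors, response = "salary")
my_model    <- lm(fit_formula, data = survey_data)
coef(my_model)
```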

3: Help Me Help You with reprex()


Creating a Reproducible Example (reprex)

The Philosophy: A reprex is a minimal, self-contained code example of your problem.

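library(reprex)

# Copy the three lines below to your clipboard...
my_numbers <- c(10, 20, 30)
mean(my_numbers, na.rm = TRUE)
my_numbers * "two"

# ...then run reprex(): it renders the clipboard code and copies the result back
reprex()
# output ready for GitHub or StackOverflow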

reprex


Be Zen!

When creating a reprex, you’re forced to:

  1. Remove the noise - What’s truly essential?
  2. Simplify your data - Use 5 rows instead of 5 million
  3. Recreate the error - Now it’s visible and clear
  4. Share confidently - It’s minimal and reproducible

dput()

Easy to create a reprex with your actual data structure using dput()

dput() is precise because it captures the exact structure of your data, including factors, levels, and classes.

library(palmerpenguins)  # the output below comes from the palmerpenguins tibble
penguins |>
  head(2) |>
  dput()
# structure(list(species = structure(c(1L, 1L), levels = c("Adelie", 
# "Chinstrap", "Gentoo"), class = "factor"), island = structure(c(3L, 
# 3L), levels = c("Biscoe", "Dream", "Torgersen"), class = "factor"), 
#     bill_length_mm = c(39.1, 39.5), bill_depth_mm = c(18.7, 17.4
#     ), flipper_length_mm = c(181L, 186L), body_mass_g = c(3750L, 
#     3800L), sex = structure(2:1, levels = c("female", "male"), class = "factor"), 
#     year = c(2007L, 2007L)), row.names = c(NA, -2L), class = c("tbl_df", 
# "tbl", "data.frame"))
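dput() writes R code that rebuilds the object. A quick sanity check before pasting it into a reprex is a round trip through base R's dget(). The sketch below uses the built-in iris data, swapped in so it runs without extra packages.

```r
# A small slice of a built-in data set, factor column included
small <- head(iris, 3)

# Write the dput() representation to a temporary file, then read it back
path <- tempfile(fileext = ".R")
dput(small, file = path)
rebuilt <- dget(path)

# The rebuilt object keeps values, factor levels and class
all.equal(small, rebuilt)
levels(rebuilt$Species)
```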

The New Kid in Town: datapasta

Why Datapasta is Amazing

  • Copy-paste magic: Grab data from anywhere, paste as R code
  • Instant reproducibility: No more manual data entry
  • Multiple formats: Works with spreadsheets, tables, and text


datapasta


Pasting from the Web

X                  Location           Min  Max
Partly cloudy.     Brisbane           19   29
Partly cloudy.     Brisbane Airport   18   27
Possible shower.   Beaudesert         15   30
Partly cloudy.     Redcliffe          19   27

Copy this table from the web and paste with datapasta

# After copying the table, run:
datapasta::tribble_paste()
# tibble::tribble( ~X,          ~Location, ~Min, ~Max,
#     "Partly cloudy.",         "Brisbane",  19L,  29L,
#     "Partly cloudy.", "Brisbane Airport",  18L,  27L,
#   "Possible shower.",       "Beaudesert",  15L,  30L,
#     "Partly cloudy.",        "Redcliffe",  19L,  27L
#   )

The AI/LLM Era: Why This Matters More Than Ever

Why Reprex

  • LLMs need actual data, not descriptions
  • They analyze structure and values to find bugs
  • Problems become solvable much faster

The difference:

❌ Vague: “My data won’t merge. I have two datasets.”

✅ Precise: [Actual data structure (dput() or datapasta) + reproducible code]

For GitHub & Community:

  • Maintainers can run your exact code immediately
  • Issues get resolved faster

Case Study


  • The Problem: You want a correlation matrix, but cor() fails immediately with an error.
  • The Challenge: The dataset has nearly 54,000 rows. You can’t paste that into a forum.

The Setup


library(ggplot2)

# We load the massive diamonds dataset
dim(diamonds)
# [1] 53940    10

cor(diamonds)
# > Error in cor(diamonds) : 'x' must be numeric

Step 1: Recreating the data


Instead of inventing fake data, we ask R to extract the exact structural DNA of just the first 3 rows.

data(diamonds)
diamonds |>
  head(3) |>
  dput()
# structure(list(carat = c(0.23, 0.21, 0.23), cut = structure(c(5L, 
# 4L, 2L), class = c("ordered", "factor"), levels = c("Fair", "Good", 
# "Very Good", "Premium", "Ideal")), color = structure(c(2L, 2L, 
# 2L), class = c("ordered", "factor"), levels = c("D", "E", "F", 
# "G", "H", "I", "J")), clarity = structure(c(2L, 3L, 5L), class = c("ordered", 
# "factor"), levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", 
# "VVS1", "IF")), depth = c(61.5, 59.8, 56.9), table = c(55, 61, 
# 65), price = c(326L, 326L, 327L), x = c(3.95, 3.89, 4.05), y = c(3.98, 
# 3.84, 4.07), z = c(2.43, 2.31, 2.31)), row.names = c(NA, -3L), class = c("tbl_df", 
# "tbl", "data.frame"))

Step 2: Bundle it with reprex()


Now, we copy the output from dput() and combine it with our broken code. We highlight it, copy it to our clipboard, and type reprex::reprex() in the console.

# Data generated by dput() (pasted verbatim, re-indented)
tiny_diamonds <- structure(list(
  carat = c(0.23, 0.21, 0.23),
  cut = structure(c(5L, 4L, 2L), class = c("ordered", "factor"),
    levels = c("Fair", "Good", "Very Good", "Premium", "Ideal")),
  color = structure(c(2L, 2L, 2L), class = c("ordered", "factor"),
    levels = c("D", "E", "F", "G", "H", "I", "J")),
  clarity = structure(c(2L, 3L, 5L), class = c("ordered", "factor"),
    levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")),
  depth = c(61.5, 59.8, 56.9), table = c(55, 61, 65),
  price = c(326L, 326L, 327L), x = c(3.95, 3.89, 4.05),
  y = c(3.98, 3.84, 4.07), z = c(2.43, 2.31, 2.31)),
  row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

cor(tiny_diamonds)

# Copy all of the code above, run reprex::reprex(), then add the output of sessionInfo()

Note: Never forget to include your session info in your reprex, so that people know which versions of R and packages you are using. You can get it with sessionInfo(), or let reprex append it with reprex::reprex(session_info = TRUE).
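With the reprex in hand, the cause is visible in the data itself: cut, color and clarity are ordered factors, and cor() only accepts numeric input. The sketch below shows one likely fix, selecting numeric columns with base R's Filter(). Here toy is a hypothetical three-row stand-in so the block runs on its own.

```r
# Hypothetical stand-in for tiny_diamonds: two numeric columns, one ordered factor
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23),
  cut   = factor(c("Ideal", "Premium", "Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  price = c(326, 326, 327)
)

# cor() needs numeric input, so keep only the numeric columns
numeric_cols <- Filter(is.numeric, toy)
names(numeric_cols)
# [1] "carat" "price"

cor(numeric_cols)
```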

Note on Anonymization

We can use FakeDataR to generate synthetic data that mimics the structure of our original dataset.

library(FakeDataR)

# generate fake data maintaining exact column structures and factors
safe_diamonds <- generate_fake_data(
  data = diamonds,
  n = 3,
  seed = 123,
  category_mode = "generic",
  numeric_mode = "range",
  column_mode = "keep",
  sensitive_detect = TRUE,
  normalize = TRUE
)

dput(safe_diamonds)

The Modern Debugging Workflow


  1. Encounter problem
  2. Use your debugging tools
  3. Create reprex (with data)
  4. Share to LLM or GitHub


You’re not just helping others, you’re helping yourself.

When I had to create an issue on the ggplot2 repository

Your Complete Debugging Arsenal


The Complete Dev’s Toolkit

  1. debug() / debugonce() – Step into functions from the start
  2. browser() – Freeze time mid-execution and inspect
  3. traceback() – Find where the chain of calls broke
  4. reprex() – Create minimal, shareable examples (with data!)

Remember:


  • Don’t panic at red text: it’s a clue!

  • Every tool serves a different purpose in your arsenal

  • Combine tools for maximum debugging power

  • In the AI era, a good reprex is your superpower

Resources


Q&A


Thank You for Your Attention!


We Data Acknowledgements:

  • Fabrice Hategekimana
  • Vestin Hategekimana