Help me help you!

An introduction to R debugging

David Munoz Tord - Senior R Sorcerer

david.munoztord@mailbox.org

We Data

May 5, 2026

Introduction

Debugging is a Skill, Not Magic

When code breaks: It’s not an attack against you, it’s a crime scene with clues
Red text is helpful: R is telling you exactly where things went wrong
Systematic approach: We have three tools to solve any mystery

WebR Shell

WebR Shell: https://webr.sh/

1. Direct Intervention: `browser()`

Freezing Time

Silent bugs are the hardest: When code runs but gives wrong results
browser() freezes time: You can inspect the environment mid-execution

Step through code: Execute line by line and watch variables change
Print at will: Check what any variable contains at that exact moment

Browser Commands Reference

Command	What it does
`n`	Execute next line
`f`	Finish execution of the current loop or function
`s`	Step into function
`c`	Continue execution
`where`	Print current location
`ls()`	List variables in environment
`print(variable_name)`	Check a variable’s value
`Q`	Quit debugger

Live Demo: Browsing the Code

calculate_final_price <- function(base_price, discount) {
  browser()  # freeze!
  discount_amount <- base_price * (discount / 100)
  final <- base_price + discount_amount  # BUG
  return(final)
}

calculate_final_price(200, 15)
# Now you can type commands:
# > base_price
# > discount  
# > n  (to step to next line)
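
Inspecting `discount_amount` and `final` at the prompt would show the price going up instead of down. One possible fix, sketched under the assumption that `discount` is a percentage between 0 and 100 (the `_fixed` name is only for illustration):

```r
# Sketch of a corrected version; the name calculate_final_price_fixed is hypothetical
calculate_final_price_fixed <- function(base_price, discount) {
  discount_amount <- base_price * (discount / 100)
  # Subtract the discount instead of adding it
  base_price - discount_amount
}

calculate_final_price_fixed(200, 15)
# [1] 170
```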

Set Breakpoints

Essentially acts as browser() but without modifying your code.

Browsing in a nested function

calculate_fee <- function(amount) {
  fee <- amount * 0.05
  amount + fee
}

process_transactions <- function(transactions) {
  results <- numeric(length(transactions))
  for (i in seq_along(transactions)) {
    browser()
    current_val <- transactions[i]
    if (current_val > 100) {
      results[i] <- calculate_fee(current_val)
    } else {
      results[i] <- current_val
    }
  }
  results
}

sales <- c(50, 150)
process_transactions(sales)
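
With `browser()` at the top of the loop, the prompt appears on every iteration. A hedged variant pauses only on the iteration that reaches `calculate_fee()`, so that `s` can step into the nested function. The `interactive()` guard and the `_paused` name are assumptions added so the sketch also runs in a script:

```r
calculate_fee <- function(amount) {
  fee <- amount * 0.05
  amount + fee
}

# Hypothetical variant of process_transactions() that pauses only before the nested call
process_transactions_paused <- function(transactions) {
  results <- numeric(length(transactions))
  for (i in seq_along(transactions)) {
    current_val <- transactions[i]
    if (current_val > 100) {
      # Pause here in an interactive session, then use `s` on the
      # next line to step into calculate_fee()
      if (interactive()) browser()
      results[i] <- calculate_fee(current_val)
    } else {
      results[i] <- current_val
    }
  }
  results
}

process_transactions_paused(c(50, 150))
# [1]  50.0 157.5
```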

2. Finding the Clues with `traceback()`

The Paper Trail of Errors

Error messages aren’t gibberish: R is showing you a paper trail
traceback() shows the chain: Which function called which, leading to the crash
Read from the bottom up: Entry 1 at the bottom is the call you typed; the highest-numbered entry at the top is closest to where the error actually happened

Live Demo: The Traceback

clean_data <- function(data) {
  na.omit(data)
}

process_numbers <- function(x) {
  clean_data(x) + "10"
}

run_analysis <- function(val) {
  process_numbers(val)
}

run_analysis(c(1, 2, 3))
# > Error in clean_data(x) + "10" : non-numeric argument to binary operator
# > traceback()
# > 2: process_numbers(val) at #2
# > 1: run_analysis(c(1, 2, 3))
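
`traceback()` is meant to be called at the console right after an uncaught error. In a script, a similar clue (the failing call) can be captured with `tryCatch()`. This is a sketch that reuses the demo functions; `err` is just an illustrative name:

```r
clean_data <- function(data) na.omit(data)
process_numbers <- function(x) clean_data(x) + "10"  # bug: "10" is a string
run_analysis <- function(val) process_numbers(val)

# Capture the error object instead of letting it stop the script
err <- tryCatch(run_analysis(c(1, 2, 3)), error = function(e) e)

conditionMessage(err)  # the text R printed in red
conditionCall(err)     # the call that failed: clean_data(x) + "10"
```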

A note on RStudio “On Error”

On Error: 3 modes: Message Only, Error Inspector, Break in Code

Message Only: Just shows the error message, no stack info
Error Inspector: Gives you a button to see the traceback, but you have to click it
Break in Code: Enters debug mode immediately on error, allowing you to inspect variables

Pro tip: Use “Break in Code” to get hands-on experience with debugging.

On VSCode and Positron

The VSCode R extension supports debugging
Positron offers a similar experience
Setup can be a bit tricky
You get a full-fledged debugging experience with breakpoints, variable inspection, and more

Catching Errors Early: `debug()` and `debugonce()`

`debug()` - Persistent Debugging

# Set debug mode for a function
debug(run_analysis)

# enter debug mode
run_analysis(c(1, 2, 3))

# Later turn it off
undebug(run_analysis)

Use when:

You want to inspect a function’s internals
Need to trace variables through multiple calls

`debugonce()` - One-Time Debugging

# Debug the next call only
debugonce(clean_data)

# First call enters debug mode
clean_data(c(1, 2, 3))

# Second call runs normally
clean_data(c(1, 2, 3))

Use when:

You only want to inspect one specific call
Testing different arguments without full debug
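
`debug()` works by setting a flag on the function, and base R's `isdebugged()` reports it (`debugonce()` clears its flag by itself after the first call). A small sketch for checking that a flag is not left on by accident; `flag_on` and `flag_off` are illustrative names:

```r
clean_data <- function(data) na.omit(data)

debug(clean_data)
flag_on <- isdebugged(clean_data)    # TRUE: every call would now open the debugger

undebug(clean_data)
flag_off <- isdebugged(clean_data)   # FALSE: the function runs normally again

c(flag_on, flag_off)
# [1]  TRUE FALSE
```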

Advanced: Conditional Debugging

# Example: Debug only when a condition is true
for (i in 1:1000) {
  result <- process_data(data[i])
  
  # Enter debug mode only for problematic rows
  if (is.na(result)) {
    browser()  # Freezes only when result is NA
  }
}

# Or with debugonce() for specific function calls:
if (my_condition) {
  debugonce(problematic_function)
}

Benefits:

Avoid stepping through 1000 iterations manually
Target only the rows/cases that fail!
Combine browser() with conditions for precision
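
The snippet above leaves `process_data()` and `data` undefined. A runnable sketch of the same idea follows, with a hypothetical `process_data()` and made-up input; the `interactive()` guard is an addition so the loop also completes in a script while still recording which rows failed:

```r
# Hypothetical processing step: converts text to numbers, NA on failure
process_data <- function(x) suppressWarnings(as.numeric(x))

data <- c("1.5", "2", "oops", "4")  # made-up input with one bad value
bad_rows <- integer(0)

for (i in seq_along(data)) {
  result <- process_data(data[i])

  if (is.na(result)) {
    bad_rows <- c(bad_rows, i)       # remember the failing index
    if (interactive()) browser()     # pause only on the problematic row
  }
}

bad_rows
# [1] 3
```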

Case Study: The Black Box

The Setup Scene

my_model <- lm(salary ~ education + experience + city, data = survey_data)
# > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
# >   contrasts can be applied only to factors with 2 or more levels

Step 1: The `traceback()`

Running traceback() shows us the internal guts of the lm() function:

traceback()
# > 5: stop("contrasts can be applied only to factors with 2 or more levels")
# > 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
# > 3: model.matrix.default(mt, mf, contrasts)
# > 2: model.matrix(mt, mf, contrasts)
# > 1: lm(salary ~ education + experience + city, data = survey_data)

What we know:

lm() called model.matrix(), and the crash happened in frame 4, inside `contrasts<-`.

What we don’t know:

Which column in our data caused model.matrix to choke.

Step 2: `options(error = recover)`

When you can’t put browser() in the code, you change R’s global rules.

This is similar to choosing “Break in Code” in RStudio’s On Error menu.

options(error = recover)

my_model <- lm(salary ~ education + experience + city, data = survey_data)

Step 3: Entering the Matrix

R immediately pauses and gives you a menu of the call stack.

# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
#   contrasts can be applied only to factors with 2 or more levels

# Enter a frame number, or 0 to exit   
# 1: lm(salary ~ education + experience + city, data = survey_data)
# 2: model.matrix(mt, mf, contrasts)
# 3: model.matrix.default(mt, mf, contrasts)
# 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])

# Selection: 3

Step 4: The Interrogation

You are now inside the frame of the function you selected (here model.matrix.default(), from the stats package). You can check the variables the package author was using.

# Browse[1]> str(data)
# 'data.frame':   5 obs. of  4 variables:
#  $ salary    : num  65000 72000 58000 81000 90000
#  $ education : Factor w/ 3 levels "Bachelors","Masters",..: 1 2 1 3 2
#   ..- attr(*, "contrasts")= chr "contr.treatment"
#  $ experience: num  3 5 1 8 10
#  $ city      : Factor w/ 1 level "New York": 1 1 1 1 1

# Browse[1]> Q
# > options(error = NULL)
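
Once the culprit is known, the same check can be run directly on the data before fitting. The sketch below uses a made-up `toy_survey` data frame (not the original `survey_data`, whose full contents are not shown) and flags factor columns with fewer than two levels:

```r
# Made-up data with the same kind of problem: `city` has a single level
toy_survey <- data.frame(
  salary     = c(65000, 72000, 58000, 81000, 90000),
  education  = factor(c("A", "B", "A", "C", "B")),
  experience = c(3, 5, 1, 8, 10),
  city       = factor(rep("New York", 5))
)

# The model fails in the same way as in the case study
fit <- tryCatch(
  lm(salary ~ education + experience + city, data = toy_survey),
  error = function(e) e
)
inherits(fit, "error")

# Flag factor columns that have fewer than two levels actually in use
single_level <- vapply(
  toy_survey,
  function(col) is.factor(col) && nlevels(droplevels(col)) < 2,
  logical(1)
)
names(which(single_level))
# [1] "city"
```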

3. Help Me Help You with `reprex()`

Creating a Reproducible Example (`reprex`)

The Philosophy: A reprex is a minimal, self-contained code example of your problem.

library(reprex)

my_numbers <- c(10, 20, 30)
mean(my_numbers, na.rm = TRUE)
my_numbers * "two"

reprex()
# output ready for GitHub or StackOverflow
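
Besides reading from the clipboard, `reprex()` also accepts the code as a braced expression, which avoids the copy step. A minimal sketch; the guard around the call is an addition so the snippet is harmless in scripts or where reprex is not installed:

```r
# Only attempt this in an interactive session with reprex installed
reprex_available <- requireNamespace("reprex", quietly = TRUE)

if (reprex_available && interactive()) {
  reprex::reprex({
    my_numbers <- c(10, 20, 30)
    my_numbers * "two"
  }, venue = "gh", session_info = TRUE)  # GitHub-flavoured output plus session info
}
```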

Be .. Zen !

When creating a reprex, you’re forced to:

Remove the noise - What’s truly essential?
Simplify your data - Use 5 rows instead of 5 million
Recreate the error - Now it’s visible and clear
Share confidently - It’s minimal and reproducible

dput()

dput() makes it easy to build a reprex from your actual data structure.

Its output is accurate because it captures the exact structure of your data, including factors, levels, and classes.

data(penguins, package = "palmerpenguins")  # the palmerpenguins version has the columns shown below
penguins |>
  head(2) |>
  dput()
# structure(list(species = structure(c(1L, 1L), levels = c("Adelie", 
# "Chinstrap", "Gentoo"), class = "factor"), island = structure(c(3L, 
# 3L), levels = c("Biscoe", "Dream", "Torgersen"), class = "factor"), 
#     bill_length_mm = c(39.1, 39.5), bill_depth_mm = c(18.7, 17.4
#     ), flipper_length_mm = c(181L, 186L), body_mass_g = c(3750L, 
#     3800L), sex = structure(2:1, levels = c("female", "male"), class = "factor"), 
#     year = c(2007L, 2007L)), row.names = c(NA, -2L), class = c("tbl_df", 
# "tbl", "data.frame"))
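
To see why `dput()` output can be trusted in a reprex, the text it prints can be parsed back into an object and compared with the original. A small sketch on a made-up data frame (`df`, `txt`, and `rebuilt` are illustrative names):

```r
# Made-up data with an unused factor level, which dput() must preserve
df <- data.frame(
  id    = 1:2,
  group = factor(c("a", "b"), levels = c("a", "b", "c"))
)

# Capture what dput() prints, then evaluate that text as R code
txt <- paste(capture.output(dput(df)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

identical(df, rebuilt)
# [1] TRUE
levels(rebuilt$group)
# [1] "a" "b" "c"
```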

The New Kid in Town: `datapasta`

Why Datapasta is Amazing

Copy-paste magic: Grab data from anywhere, paste as R code
Instant reproducibility: No more manual data entry
Multiple formats: Works with spreadsheets, tables, and text

Pasting from the Web

X	Location	Min	Max
Partly cloudy.	Brisbane	19	29
Partly cloudy.	Brisbane Airport	18	27
Possible shower.	Beaudesert	15	30
Partly cloudy.	Redcliffe	19	27

Copy this table from the web and paste with datapasta

# After copying the table, run:
datapasta::tribble_paste()
# tibble::tribble( ~X,          ~Location, ~Min, ~Max,
#     "Partly cloudy.",         "Brisbane",  19L,  29L,
#     "Partly cloudy.", "Brisbane Airport",  18L,  27L,
#   "Possible shower.",       "Beaudesert",  15L,  30L,
#     "Partly cloudy.",        "Redcliffe",  19L,  27L
#   )
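
If datapasta is not installed, a similar result can be approximated in base R by reading the copied, tab-separated text with `read.delim(text = ...)` and then calling `dput()`. This is a swapped-in technique, not what datapasta does internally, and `raw_text` stands in for the clipboard contents:

```r
# Stand-in for the copied table: one string, tab-separated columns
raw_text <- paste(
  "X\tLocation\tMin\tMax",
  "Partly cloudy.\tBrisbane\t19\t29",
  "Partly cloudy.\tBrisbane Airport\t18\t27",
  "Possible shower.\tBeaudesert\t15\t30",
  "Partly cloudy.\tRedcliffe\t19\t27",
  sep = "\n"
)

# Parse the text into a data frame, then print shareable R code for it
weather <- read.delim(text = raw_text, stringsAsFactors = FALSE)
dput(weather)
```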

The AI/LLM Era: Why This Matters More Than Ever

Why Reprex

LLMs need actual data, not descriptions
They analyze structure and values to find bugs
Problems become solvable much faster

The difference:

❌ Vague: “My data won’t merge. I have two datasets.”

✅ Precise: [Actual data structure from data + reproducible code]

For GitHub & Community:

Maintainers can run your exact code immediately
Issues get resolved faster
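
As an illustration of the precise version, here is a made-up pair of tiny data frames (hypothetical names and values) where a merge silently drops a row. Sharing `dput()` output like this lets a reader or an LLM spot the stray space in the key:

```r
# Hypothetical data: the second customer id has a trailing space
orders <- data.frame(id = c("a1", "a2", "a3"), amount = c(10, 20, 30))
customers <- data.frame(id = c("a1", "a2 ", "a3"), name = c("Ann", "Bob", "Cy"))

merged <- merge(orders, customers, by = "id")
nrow(merged)
# [1] 2

# The exact values make the problem visible
dput(customers$id)
# c("a1", "a2 ", "a3")
```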

Case Study

The Problem: You want a correlation matrix, but the cor() function crashes instantly.
The Challenge: The dataset is 54,000 rows long. You can’t paste that into a forum.

The Setup

library(ggplot2)

# We load the massive diamonds dataset
dim(diamonds)
# [1] 53940    10

cor(diamonds)
# > Error in cor(diamonds) : 'x' must be numeric

Step 1: Recreating the data

Instead of inventing fake data, we ask R to extract the exact structural DNA of just the first 3 rows.

data(diamonds)
diamonds |>
  head(3) |>
  dput()
# structure(list(carat = c(0.23, 0.21, 0.23), cut = structure(c(5L, 
# 4L, 2L), class = c("ordered", "factor"), levels = c("Fair", "Good", 
# "Very Good", "Premium", "Ideal")), color = structure(c(2L, 2L, 
# 2L), class = c("ordered", "factor"), levels = c("D", "E", "F", 
# "G", "H", "I", "J")), clarity = structure(c(2L, 3L, 5L), class = c("ordered", 
# "factor"), levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", 
# "VVS1", "IF")), depth = c(61.5, 59.8, 56.9), table = c(55, 61, 
# 65), price = c(326L, 326L, 327L), x = c(3.95, 3.89, 4.05), y = c(3.98, 
# 3.84, 4.07), z = c(2.43, 2.31, 2.31)), row.names = c(NA, -3L), class = c("tbl_df", 
# "tbl", "data.frame"))

Step 2: Bundle it with `reprex()`

Now, we copy the output from dput() and combine it with our broken code. We highlight it, copy it to our clipboard, and type reprex::reprex() in the console.

# Data pasted verbatim from the dput() output above
tiny_diamonds <- structure(list(carat = c(0.23, 0.21, 0.23), cut = structure(c(5L, 
4L, 2L), class = c("ordered", "factor"), levels = c("Fair", "Good", 
"Very Good", "Premium", "Ideal")), color = structure(c(2L, 2L, 
2L), class = c("ordered", "factor"), levels = c("D", "E", "F", 
"G", "H", "I", "J")), clarity = structure(c(2L, 3L, 5L), class = c("ordered", 
"factor"), levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", 
"VVS1", "IF")), depth = c(61.5, 59.8, 56.9), table = c(55, 61, 
65), price = c(326L, 326L, 327L), x = c(3.95, 3.89, 4.05), y = c(3.98, 
3.84, 4.07), z = c(2.43, 2.31, 2.31)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

cor(tiny_diamonds)

# First copy the code above, then run reprex::reprex(), and finally add the output of sessionInfo()

Note: Never forget to include your session info in your reprex, so that people know which versions of R and packages you are using. You can do this with sessionInfo(), or by calling reprex::reprex(session_info = TRUE).
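
With a reprex this small, the cause is easy to see: `cor()` only accepts numeric columns, and the data contains factors. One possible fix, sketched on a cut-down copy with values taken from the rows above (the `tiny` name and the column selection are illustrative choices):

```r
# Cut-down copy of the first three diamonds rows: three numeric columns, one factor
tiny <- data.frame(
  carat = c(0.23, 0.21, 0.23),
  cut   = factor(c("Ideal", "Premium", "Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  depth = c(61.5, 59.8, 56.9),
  price = c(326L, 326L, 327L)
)

# Reproduce the error without stopping the script
err <- tryCatch(cor(tiny), error = function(e) e)
conditionMessage(err)

# Keep only the numeric columns before computing correlations
numeric_cols <- vapply(tiny, is.numeric, logical(1))
num_cor <- cor(tiny[, numeric_cols])
dim(num_cor)
# [1] 3 3
```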

Note on Anonymization

We can use FakeDataR to generate synthetic data that mimics the structure of our original dataset.

library(FakeDataR)

# generate fake data maintaining exact column structures and factors
safe_diamonds <- generate_fake_data(
  data = diamonds,
  n = 3,
  seed = 123,
  category_mode = "generic",
  numeric_mode = "range",
  column_mode = "keep",
  sensitive_detect = TRUE,
  normalize = TRUE
)

dput(safe_diamonds)

The Modern Debugging Workflow

Encounter problem
Use your debugging tools
Create reprex (with data)
Share to LLM or GitHub

You’re not just helping others, you’re helping yourself.

When I had to create an issue on the ggplot2 repository

Your Complete Debugging Arsenal

The Complete Dev’s Toolkit

debug() / debugonce() – Step into functions from the start
browser() – Freeze time mid-execution and inspect
traceback() – Find where the chain of calls broke
reprex() – Create minimal, shareable examples (with data!)

Remember:

Don’t panic at red text: it’s a clue!
Every tool serves a different purpose in your arsenal
Combine tools for maximum debugging power
In the AI era, a good reprex is your superpower

Resources

Q&A

Thank You for Your Attention!

`We Data` Acknowledgements:

Fabrice Hategekimana
Vestin Hategekimana

Contact Information

david.munoztord@mailbox.org
Slides: munoztd0/help_me_help_you_R_Debug
GitHub: munoztd0
