The LASER Toolkit & Reproducible Research

LASER Orientation Module: Conceptual Overview

Overview

Reproducible Research

  • Definition

  • Benefits

  • Pain Points

  • Best Practices

  • Discussion

The LASER Toolkit

  • GitHub

  • Posit Cloud

  • RStudio

  • Quarto

  • R & Python

What is Reproducible Research?

In computational sciences like learning analytics (LA):

  • Ideally… researchers can replicate your findings by following the procedures used to gather the data and run the computer code.

  • Realistically… the data and code used to make a finding are available and sufficient to recreate the finding.

Reproducible Research with R and RStudio (Gandrud 2021)

The Benefits of Reproducible Research

For Science:

  • Standard to judge scientific claims

  • Enhances replicability

  • Avoiding effort duplication

  • Cumulative knowledge development

For Yourself:

  • Better work habits

  • Better teamwork

  • Changes are easier

  • Higher impact research

Reproducible Research?

Discussion

Think about the following questions and then discuss at your table:

  • What have your experiences been with reproducible research?

  • What tools have you used to ensure the reproducibility of your work?

  • What questions do you have about reproducible research?

Best Practices for Reproducible Research

  1. Document everything!

  2. Everything is a (text) file.

  3. All files should be “human readable.”

  4. Explicitly tie your files together.

  5. Have a plan to organize, store, and make your files available.

To learn more: Gandrud (2021), Reproducible Research with R and RStudio (3rd Edition). CRC Press.
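As a minimal sketch of how practices 1 through 4 might look in a small project, the Python example below stores data as a plain-text CSV file, reads it through an explicit path relative to the project root, and writes a short README that documents how the files relate. The folder names (`data`, `output`), the file names, and the `build_project` helper are hypothetical choices for illustration, not part of the LASER materials.

```python
import csv
import tempfile
from pathlib import Path


def build_project(root: Path) -> dict:
    """Create a tiny, human-readable project under `root` (hypothetical layout)."""
    data_dir = root / "data"
    out_dir = root / "output"
    data_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Practices 2 and 3: keep the data in a plain-text, human-readable file.
    data_file = data_dir / "scores.csv"
    with data_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["student", "score"])
        writer.writerows([["a", 80], ["b", 90]])

    # Practice 4: the analysis step names its input file explicitly,
    # relative to the project root, instead of relying on manual steps.
    with data_file.open(newline="", encoding="utf-8") as f:
        scores = [int(row["score"]) for row in csv.DictReader(f)]
    mean_score = sum(scores) / len(scores)

    result_file = out_dir / "mean_score.txt"
    result_file.write_text(f"mean_score={mean_score}\n", encoding="utf-8")

    # Practice 1: document which file produces which result.
    readme = root / "README.txt"
    readme.write_text(
        "Input:  " + data_file.relative_to(root).as_posix() + "\n"
        "Output: " + result_file.relative_to(root).as_posix() + "\n",
        encoding="utf-8",
    )
    return {"data": data_file, "result": result_file,
            "readme": readme, "mean": mean_score}


if __name__ == "__main__":
    # Build the example in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        info = build_project(Path(tmp))
        print(info["readme"].read_text(encoding="utf-8"))
```

Practice 5 (organizing, storing, and sharing files) is then a matter of where a folder like this lives, for example in a version-controlled repository.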

The LASER Toolkit

GitHub, Posit, RStudio, Quarto, and R & Python

Tool Types

Reproducible research involves two broad sets of tools:

  • A Reproducible Research Environment that includes the statistical tools you need to run your analyses, automatically tracks the provenance of data, analyses, and results, and packages them for redistribution.

  • A Reproducible Research Publisher that prepares dynamic documents for presenting results and is easily linked to the reproducible research environment.
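To make the idea of tracking provenance more concrete, here is a rough Python sketch that records a SHA-256 checksum for each project file, along with the Python version, in a plain-text JSON manifest. The function names (`sha256_of`, `write_manifest`) and the manifest layout are assumptions made for illustration; full research environments automate far more than this.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: list, manifest_path: Path) -> dict:
    """Write a human-readable record of file checksums and the Python version."""
    manifest = {
        "python_version": platform.python_version(),
        "files": {p.name: sha256_of(p) for p in files},
    }
    manifest_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"  # hypothetical data file
        data.write_text("student,score\na,80\n", encoding="utf-8")
        print(json.dumps(write_manifest([data], Path(tmp) / "manifest.json"), indent=2))
```

If a data file later changes, its checksum no longer matches the manifest, which signals that earlier results may need to be regenerated.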

GitHub

The LASER Institute GitHub site houses repositories for all curriculum materials and GitHub Pages is used to publish our site to the web.


GitHub is a web-based platform used for version control, collaboration, and sharing of a project’s code, documents, and other related files.

            go.ncsu.edu/laser-github

Posit Cloud

Posit Cloud lets you access Posit’s powerful set of data science tools like RStudio IDE and Jupyter Notebooks right in your browser.


Interactive components for each module (e.g., Case Studies and Code-Alongs) are accessed and completed in our LASER Learners workspace.

            go.ncsu.edu/laser-learners

RStudio

RStudio is an integrated development environment (IDE) for R and Python and includes:

  • a Console for running R code directly,

  • a syntax-highlighting editor that supports direct code execution in the Source pane,

  • tools for plotting, history, debugging, and management of research projects in the Environment and Files panes.

Quarto

Quarto is used with Python and R to create reproducible, production-quality documents, presentations, and websites.

R & Python

The LASER Institute uses R and Python, two of the most popular programming languages for data science, statistical analysis, and machine learning.


Both are freely available and have large, active communities and a vast number of libraries and frameworks for learning analytics.

Essential Readings

Chapters 1 and 2 of Reproducible Research with R and RStudio by Gandrud (2021) lay the groundwork for understanding and implementing reproducible research:

  • Chapter 1: Introducing Reproducible Research

  • Chapter 2: Getting Started with Reproducible Research

Questions for reflection and discussion are also included in our Essential Readings document. Responses can be posted to our laser-orientation channel on Slack.

Acknowledgements

This work was supported by National Science Foundation grants DRL-2025090 and DRL-2321128 (ECR:BCSER). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References