Reproducible Documents

What Does ‘Reproducible’ Mean?

Re-run the analysis with the same data and get the same result
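
In R, results that involve random numbers only repeat if the random number generator is seeded. A minimal sketch of the idea, where the seed value and the variable names are arbitrary choices for illustration:

```r
# Fix the random number generator so that the 'analysis' is repeatable
set.seed(42)
first_run <- mean(rnorm(100))

# Re-running the same code with the same seed gives the same result
set.seed(42)
second_run <- mean(rnorm(100))

# Compare the two runs
identical(first_run, second_run)

# Record the R and package versions alongside the results
sessionInfo()
```

Recording `sessionInfo()` does not make the result reproducible by itself, but it tells a later reader which software versions produced it.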

Why Would We Want Our Documents to Be Reproducible?

  • Leave a complete paper trail of our work to posterity
  • Enable our future selves to reproduce every single analysis step without relying on our memory

How Can Reproducibility Be Accomplished?

  • Different files for different tasks (read_data.R, EDA.R, data_wrangling.R, etc.)?
  • Several scripts with numbered filenames (01_read_data.R, 02_EDA.R, etc.)?
  • One script with sections for each task?
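
If the numbered-scripts variant is chosen, the run order can be written down in code instead of being remembered. A rough sketch, assuming all scripts sit in one directory and follow the `01_`, `02_`, ... naming scheme; `run_pipeline()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: run all numbered scripts in a directory in filename order
run_pipeline <- function(dir = ".") {
  # Match names such as 01_read_data.R or 02_EDA.R
  scripts <- sort(list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    # local = FALSE evaluates in the global environment,
    # so later scripts can use objects created by earlier ones
    source(script, local = FALSE)
  }
  # Return the names of the scripts that were run, in order
  invisible(basename(scripts))
}
```

Calling `run_pipeline()` from a fresh R session is a cheap check that the scripts do not depend on objects left over from interactive work.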

Reproducible Research

When preparing and writing your scripts, keep in mind:

  • A future researcher (including yourself) needs to understand what you did without having to ask
  • Leave a complete ‘paper’ trail of the analysis steps
  • Document and justify every decision you make
  • Avoid terse code; prefer explicit, readable code over clever one-liners
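
The last two points can be illustrated with R's built-in `ToothGrowth` data. This is only a sketch: the variable names and the stated justification are hypothetical and stand in for whatever decision your own analysis requires.

```r
# Terse: correct, but the reader has to decode what is computed and why
terse_result <- tapply(ToothGrowth$len[ToothGrowth$dose == 2],
                       ToothGrowth$supp[ToothGrowth$dose == 2], mean)

# Explicit: each step is named and the decision is written down
# Decision (hypothetical example): restrict the comparison to the highest
# dose, because that is the dose of interest in this illustration
high_dose <- subset(ToothGrowth, dose == 2)

# Mean tooth length per supplement type (OJ, VC) at the highest dose
mean_length_by_supplement <- aggregate(len ~ supp, data = high_dose, FUN = mean)
mean_length_by_supplement
```

Both versions compute the same means; the second one can be read without asking its author.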

Writing up

  • Open your favourite word processor
  • Write sections Introduction, Methods, Results, Discussion
  • Results: copy/paste results from R output into document
    • tables
    • figures
    • inline number values
  • Discussion …

Wouldn’t it be nice to have R put together our dissertation?

Literate Programming

Don Knuth

1981: WEB

Markdown

  • Markdown is a mark up[sic!] language
    • limited set of mark up instructions
    • easy to learn
    • even ‘source’ code is easy to read
    • ASCII – no specific software needed to read and write

Markdown Example Page

Code

---
title: "Markdown Example Page"
format: html
---


This is an example page to document some of the easy formatting instructions used in Markdown. 

# Headers

## More Headers

### Even More Headers

# Line-breaks

A line-break 
is introduced by two(!) line-breaks in the source text:

Or by two trailing spaces:  
But beware, the spaces may be missed when reading the source text, which can be confusing to the reader!

# Lists

## Unordered Lists

* An Item 
* Another item
* Yet another one

## Ordered Lists {#ordered-lists}

1. First item
2. Second item

# Fonts etc.

The *different* **fonts** of a typeface ~~were already introduced~~ are shown here, but there's more you can do to mark up your writing:

* superscript^2^
* subscript~i~
* Preformatted (typewriter) text:  
  `This is a typewriter face`
* LaTeX equations
  * Inline: $\sum_{n=1}^{k} x_n^2$
  * Display:
  $$\sum_{n=1}^{k} x_n^2$$

# Tables

First Header  | Second Header   
------------- | -------------   
Content Cell  | Content Cell    
Content Cell  | Content Cell    

# Images

![This is us.](https://www.immm.hmtm-hannover.de/fileadmin/_migrated/pics/immm_foto_schiffgraben.jpg)

# Links

In-line references, e.g. [Ordered Lists](#ordered-lists), as well as links to the WWW are easy to write (and read!): Wikipedia page on [literate programming](https://en.wikipedia.org/wiki/Literate_programming).

knitr

knitr facilitates the integration of R code and R output into a document:

knitr and Markdown Example Page

Code

---
title: "`knitr` and Markdown Example Page"
format: html
---

A simple calculation in R:
```{r}
1 + 1
```
[1] 2

Inline use of R variables:
```{r}
Mean <- 4.56
StandardDeviation <- 2.1
```
The average value ($\pm$ SD) amounted to `{r} Mean` (`{r} StandardDeviation`) units.

The package `knitr` contains some nifty functions to, e.g., output data.frames as tables:
```{r}
library(knitr)
kable(ToothGrowth)
```
len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5
11.2 VC 0.5
11.2 VC 0.5
5.2 VC 0.5
7.0 VC 0.5
16.5 VC 1.0
16.5 VC 1.0
15.2 VC 1.0
17.3 VC 1.0
22.5 VC 1.0
17.3 VC 1.0
13.6 VC 1.0
14.5 VC 1.0
18.8 VC 1.0
15.5 VC 1.0
23.6 VC 2.0
18.5 VC 2.0
33.9 VC 2.0
25.5 VC 2.0
26.4 VC 2.0
32.5 VC 2.0
26.7 VC 2.0
21.5 VC 2.0
23.3 VC 2.0
29.5 VC 2.0
15.2 OJ 0.5
21.5 OJ 0.5
17.6 OJ 0.5
9.7 OJ 0.5
14.5 OJ 0.5
10.0 OJ 0.5
8.2 OJ 0.5
9.4 OJ 0.5
16.5 OJ 0.5
9.7 OJ 0.5
19.7 OJ 1.0
23.3 OJ 1.0
23.6 OJ 1.0
26.4 OJ 1.0
20.0 OJ 1.0
25.2 OJ 1.0
25.8 OJ 1.0
21.2 OJ 1.0
14.5 OJ 1.0
27.3 OJ 1.0
25.5 OJ 2.0
26.4 OJ 2.0
22.4 OJ 2.0
24.5 OJ 2.0
24.8 OJ 2.0
30.9 OJ 2.0
26.4 OJ 2.0
27.3 OJ 2.0
29.4 OJ 2.0
23.0 OJ 2.0
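
A report rarely needs all 60 rows. As a possible variation, the data can be summarised first and the table labelled via `kable()` arguments. A sketch assuming `knitr` is installed; the column labels and the caption text are illustrative choices:

```r
library(knitr)

# Mean tooth length for each supplement/dose combination (6 rows instead of 60)
mean_length <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# digits, col.names and caption are regular kable() arguments;
# the labels used here are only examples
summary_table <- kable(
  mean_length,
  digits = 1,
  col.names = c("Supplement", "Dose", "Mean length"),
  caption = "Mean tooth length by supplement and dose"
)
summary_table
```

Because the table is computed from the data at render time, it updates automatically whenever the data or the summary step change.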

Quarto

Quarto

  • is a publishing system
  • renders documents as
    • PDF
    • HTML
    • ODT
    • DOCX
  • adds bells and whistles to Markdown
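
Rendering to several formats can also be scripted from R. A minimal sketch, assuming the `quarto` R package and the Quarto command-line tool are installed; `render_formats()` and the file name `dissertation.qmd` are hypothetical, and PDF output additionally needs a LaTeX installation:

```r
# Hypothetical helper: render one Quarto file to several output formats.
# Assumes the 'quarto' R package and the Quarto CLI are installed.
render_formats <- function(input, formats = c("html", "docx")) {
  for (fmt in formats) {
    # quarto_render() calls the Quarto CLI for the given output format
    quarto::quarto_render(input, output_format = fmt)
  }
  # Return the formats that were requested
  invisible(formats)
}

# Usage (not run here; 'dissertation.qmd' is a placeholder file name):
# render_formats("dissertation.qmd", formats = c("html", "pdf", "odt", "docx"))
```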

For instance

  • Visual editor (for the less technically inclined)
  • Commenting
  • Include (almost) any file
  • Switch between output formats as needed
  • Spell-checking
  • Automatic replacement of straight quotes with typographic quotes, and of -- and --- with en and em dashes

Using Quarto et al. for Your Dissertation?

Pros:

  • Inter-weave prose and code
  • Dynamically update tables, figures and in-line numbers
  • All in one small file
  • ‘Source code’ most likely readable for a long time
  • It’s Reproducible Research!
  • Enables you to work collaboratively

Cons:

  • Supervisors: But it’s not Word!
  • You need to change your workflow
  • Requires learning Markdown, knitr and Quarto
  • Collaborators
    • either work on a shared Quarto file (git!)
    • or write comments in a DOCX file (you then have to incorporate their suggestions into the Quarto file yourself)