If you don’t speak LOTR, just ignore that.
R Markdown extends Markdown, a lightweight markup ‘language’ used in lots of blogging engines and wikis. Markdown is a simple formatting syntax for authoring HTML documents, designed to be easy for humans to write and for machines to read. The same source can also be converted to PDF and MS Word documents.
You can learn the basics of R Markdown in a very short time. Here are several resources we recommend:
- RStudio has an R Markdown page with extensive documentation, including some very useful cheatsheets
- Hadley Wickham, the author of `ggplot2` and other R packages, has an online tutorial
- Psychologists Mike Frank and Chris Hartgerink have produced a nice tutorial that they gave at the 2017 Society for Improving Psychological Science (SIPS) meeting.
The anatomy of an R Markdown file
Let’s create a simple R Markdown file so we can see how this works. From the ‘File’ menu select ‘New File…’ and then the R Markdown file type.
Notice that the ‘New R Markdown’ window lets us choose among documents, presentations, interactive web applications built with Shiny, or a file based on a template. We’ll just use the defaults and create a new Untitled document whose default output is an HTML file.
Let’s expand the Source panel so we can see the full file.
The template shows us the core components of an R Markdown file:
Body text
The body text starts with a heading marked by double hash marks, `##`. R Markdown follows Markdown’s convention of using hash marks to specify heading levels, as in an outline: one hash mark means the first (top) level, two hash marks mean the second level, and so on.
Note that we can include clickable web links by surrounding URLs with angle brackets (`<>`), make text boldface by surrounding it with double asterisks (`**boldface**`), or put it in italics with single asterisks (`*italics*`).
R Markdown allows other kinds of content to be inserted in body text:
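- Named links: `[link text](URL)`
- Images: `![caption](path/to/image.png)`
- Equations: \(e = mc^2\), written as LaTeX between dollar signs, `$e = mc^2$`
- Even video or audio recordings, using HTML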
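To see these pieces together, one option is to write a tiny R Markdown file from the R console and then knit it. This is a sketch: the file name `syntax-demo.Rmd` is hypothetical, the link target is simply the R Markdown website, and images are left out because they need an image file on disk.

```r
# Each element of `demo` becomes one line of the hypothetical file syntax-demo.Rmd
demo <- c(
  "---",
  "title: \"Syntax demo\"",
  "output: html_document",
  "---",
  "",
  "## A second-level heading",
  "",
  "A clickable bare link: <https://rmarkdown.rstudio.com>",
  "",
  "A named link: [R Markdown website](https://rmarkdown.rstudio.com)",
  "",
  "Some **boldface** text and some *italic* text.",
  "",
  "An inline equation: $e = mc^2$"
)
writeLines(demo, "syntax-demo.Rmd")
```

Opening `syntax-demo.Rmd` in RStudio and clicking Knit (or running `rmarkdown::render("syntax-demo.Rmd")`) should show each element rendered in the HTML output.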
Code chunks
Code chunks are separated from the body text by triple back-ticks ‘```’.
Let’s look at the second code chunk:
The text in braces, `{r cars}`, tells R Markdown that this chunk contains code written in R and gives the chunk the name `cars`. The name is optional (but if you use one, it must be unique within the file); names can help in debugging a long R Markdown file. In this case, the chunk runs the `summary()` command on the `cars` dataset.
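Running a chunk is equivalent to running its body in the console. As a small sketch, the block below adds a `str()` call (not part of the template) to show what the built-in `cars` data frame contains before summarizing it.

```r
# `cars` ships with base R (package "datasets"): speed (mph) and
# stopping distance (dist, ft) for 50 cars, recorded in the 1920s.
str(cars)

# The entire body of the template's `cars` chunk:
summary(cars)
```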
When you create your own R Markdown documents, you will put your R code inside chunks. You can create a new chunk by clicking on a blank line in the R Markdown document and typing CTRL+ALT+I (Cmd+Option+I on a Mac).
The virtue of putting code in chunks is that you can run them piece by piece from within the document. For example, clicking on the small right arrow icon runs the current chunk.
Scrolling down to the next code chunk, we see that it plots data from the `pressure` dataset with `plot(pressure)`.
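The template’s `pressure` chunk contains only `plot(pressure)`. Because a chunk can hold any R code, the same plot can be dressed up; the axis labels and title below are our own additions (a sketch), with units taken from the dataset’s help page, `?pressure`.

```r
# `pressure` (package "datasets"): vapor pressure of mercury (mm Hg)
# at 19 temperatures (degrees Celsius).
plot(pressure,
     xlab = "Temperature (degrees C)",
     ylab = "Vapor pressure (mm Hg)",
     main = "Vapor pressure of mercury")
```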
Returning to the first chunk, called `setup`, we see that chunks themselves can have options that specify, for example, whether their code is displayed in the document (`echo=FALSE` hides it), whether they are evaluated at all (`eval=TRUE`), and so forth. This lets you customize how the document executes and displays each chunk. See the RStudio documentation for the full list of chunk options and what each one does.
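Chunk options can also be set once for the whole document. In the RStudio template, the `setup` chunk does this with `knitr::opts_chunk$set()`; the values below are illustrative choices for a report, not the template’s own (which sets only `echo = TRUE`).

```r
# Document-wide defaults for all chunks that follow (illustrative values)
knitr::opts_chunk$set(
  echo    = FALSE,  # hide the R code; show only its results
  eval    = TRUE,   # run the code
  message = FALSE,  # hide messages, e.g. from loading packages
  warning = FALSE   # hide warnings
)

# Check the current default for one option
knitr::opts_chunk$get("echo")
```

An option written in a single chunk’s header, such as `{r pressure, echo=TRUE}`, overrides these defaults for that chunk only.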
Rendering output
Edit the body text if you like, then render the document using the Knit button. If we have not saved the file, we may be prompted for a file name; you can use `test.Rmd` for now. This will generate a `test.html` file (per the `output: html_document` line in the header). Let’s open that file. We can see that the document combines the body text, links, R code chunks, and R code outputs, including plots, in a very readable way.
One of the virtues of R Markdown, of course, is that we can produce different output formats from the same file, either by changing the `output` field in the document header or by issuing a command in the console:
```r
# PDF output needs a LaTeX installation, e.g. via tinytex::install_tinytex()
rmarkdown::render("test.Rmd",
                  output_format = "pdf_document")

# MS Word output
rmarkdown::render("test.Rmd",
                  output_format = "word_document")

# More than one output format at once
rmarkdown::render("test.Rmd",
                  output_format = c("html_document",
                                    "pdf_document",
                                    "word_document"))
```
R Markdown using 2019 R bootcamp data
We can use the R Markdown document `bootcamp-survey.Rmd` to analyze the survey data. Let’s open it up and see how it looks.
The default format is an `html_document`, and I’ve added some additional parameters in the header to produce a table of contents (`toc: TRUE`) that goes down to third-level headings (`toc_depth: 3`), with numbered sections (`number_sections: TRUE`), displayed as a ‘floating’, menu-like panel via `toc_float: TRUE`:
```yaml
output:
  html_document:
    toc: TRUE
    toc_depth: 3
    toc_float: TRUE
    number_sections: TRUE
```
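Each key under `html_document:` in the header corresponds to an argument of the `rmarkdown::html_document()` function, so roughly the same settings can be supplied from the console instead. A sketch, using a hypothetical throwaway file `options-demo.Rmd` so that it runs on its own:

```r
library(rmarkdown)

# Throwaway document (hypothetical name) with two levels of headings
writeLines(c("---", "title: \"Options demo\"", "---", "",
             "# Part one", "", "## Details", "", "Some text.", "",
             "# Part two", "", "More text."),
           "options-demo.Rmd")

# The YAML options above, given as function arguments instead
fmt <- html_document(toc = TRUE, toc_depth = 3, toc_float = TRUE,
                     number_sections = TRUE)

# render() returns the path of the generated file (options-demo.html)
out <- render("options-demo.Rmd", output_format = fmt, quiet = TRUE)
out
```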
I’ve also added parameters so I can easily produce outputs in different formats. Notice that I’ve added comments about what I did and why, so that the R Markdown file is like a combination lab notebook and data report. And by creating it in R Markdown, I can satisfy many audiences with different needs.
Your adviser likes PDFs? No problem. Your collaborator prefers MS Word? Got it covered. Need to give a quick brown bag talk you can give from any web browser? Easy. R Markdown can become one of your super-powers.
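A sketch of the “one source, many audiences” idea: a hypothetical `formats-demo.Rmd` lists two formats under `output:`, and `output_format = "all"` renders each of them in turn. A `pdf_document` entry could be added too if a LaTeX distribution is installed; it is left out here so the sketch runs without one. Both HTML and Word output rely on Pandoc, which ships with RStudio.

```r
library(rmarkdown)

# Hypothetical demo file that declares two output formats in its header
writeLines(c(
  "---",
  "title: \"Survey report\"",
  "output:",
  "  html_document:",
  "    toc: TRUE",
  "  word_document: default",
  "---",
  "",
  "## Results",
  "",
  "The mean of 1 to 10 is `r mean(1:10)`."
), "formats-demo.Rmd")

# "all" renders every format listed under `output:`; the return value
# holds the paths of the generated files
outputs <- render("formats-demo.Rmd", output_format = "all", quiet = TRUE)
basename(outputs)
```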
Why write reproducible papers/reports in R Markdown?
The previous example showed how we might create reproducible data analysis reports in R Markdown. It’s only a short step to writing full papers this way. But let’s talk about why we might want to do this.
The following section is copied verbatim from Mike Frank & Chris Hartgerink’s tutorial on GitHub.
There are three reasons to write reproducible papers. To be right, to be reproducible, and to be efficient. There are more, but these are convincing to us. In more depth:
To avoid errors. Using an automated method for scraping APA-formatted stats out of PDFs, Nuijten et al. (2015) found that over 10% of p-values in published papers were inconsistent with the reported details of the statistical test, and 1.6% were what they called “grossly” inconsistent, e.g., the difference between the p-value and the test statistic meant that one implied statistical significance and the other did not. Nearly half of all papers had errors in them.
To promote computational reproducibility. Computational reproducibility means that other people can take your data and get the same numbers that are in your paper. Even if you don’t have errors, it can still be very hard to recover the numbers from published papers because of ambiguities in analysis. Creating a document that literally specifies where all the numbers come from in terms of code that operates over the data removes all this ambiguity.
To create spiffy documents that can be revised easily. This is actually a really big neglected one for us. At least one of us used to tweak tables and figures by hand constantly, leading to a major incentive never to rerun analyses because it would mean re-pasting and re-illustratoring all the numbers and figures in a paper. That’s a bad thing! It means you have an incentive to be lazy and to avoid redoing your stuff. And you waste tons of time when you do. In contrast, with a reproducible document, you can just rerun with a tweak to the code. You can even specify what you want the figures and tables to look like before you’re done with all the data collection (e.g., for purposes of preregistration or a registered report).
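To make the first two points concrete (this example is ours, not part of the quoted tutorial), here is a sketch in which code produces every reported number. It uses R’s built-in `sleep` data purely as a stand-in analysis: the t-test is run once, its result is formatted into an APA-style string, and the reported p-value is recomputed from the t statistic and degrees of freedom, the kind of consistency check described above. In an R Markdown document the string would go into the text with inline code, e.g. `` `r report` ``.

```r
# Welch two-sample t-test on the built-in `sleep` data (extra sleep by drug group)
fit <- t.test(extra ~ group, data = sleep)

# Build the reported string from the fitted object instead of retyping numbers
report <- sprintf("t(%.2f) = %.2f, p = %.3f",
                  fit$parameter, fit$statistic, fit$p.value)
report

# Consistency check: recompute the two-sided p-value from t and df
p_recomputed <- 2 * pt(-abs(fit$statistic), df = fit$parameter)
all.equal(unname(p_recomputed), unname(fit$p.value))
```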