Purpose

This document describes how to use Python in an R Markdown document.

This tutorial relies upon and extends this site: https://rstudio.github.io/reticulate/articles/r_markdown.html

Prerequisites

Install Python

Follow the instructions at https://psu-psychology.github.io/psy-525-reproducible-research-2020/how_to/install-python.html to install Python on your local machine.

Confirm that your RStudio session can see Python 3

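The following check uses Sys.which(), which works on Mac OS, Linux, and Windows. On Windows, the executable may be named python rather than python3.

# Sys.which() returns "" when the executable is not on the PATH
path_2_python3 <- Sys.which("python3")
if (path_2_python3 == "") message("Python3 not found.")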
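
The same lookup can also be done from Python itself, which avoids depending on the shell. The sketch below is an illustration rather than part of the original tutorial; the helper name find_python is hypothetical, and it assumes the interpreter is called python3 or python on your PATH.

```python
import shutil
import sys


def find_python(candidates=("python3", "python")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # works on Mac OS, Linux, and Windows
        if path:
            return path
    return None


path_2_python = find_python()
if path_2_python is None:
    print("Python not found on PATH.")
else:
    print("Python found at:", path_2_python)

# Version of the interpreter that is running this code.
print("Running Python %d.%d" % sys.version_info[:2])
```

Trying both names covers the common case where Windows installs the interpreter as python only.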

You may find this site helpful if you run into trouble.

An alternative

If the R reticulate package does not find a Python distribution, it appears to offer to install Miniconda, a minimal installer for the conda package manager. It’s fine to use this if it suits your needs. I have not tested it yet.

Install the R reticulate package

Enter install.packages("reticulate") from your R console.

Install Python packages

Like R, bare bones Python is useful but limited. You’ll want to install packages to extend its functionality.

Mac OS

Open a terminal, and enter the following commands:

pip3 install pandas
pip3 install numpy
pip3 install matplotlib

These three packages cover data frames (pandas), numerical arrays (numpy), and plotting (matplotlib).

Note: pip3 is the Python 3 version of pip. Depending on your set-up, pip install may also work here, but pip3 install ensures that the packages are installed for Python 3 rather than for an older Python 2 installation.
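
One way to sidestep the pip versus pip3 ambiguity is to ask a specific interpreter to run its own copy of pip. The sketch below builds that command in Python. The helper name pip_command is hypothetical, and the example assumes pip is installed for the interpreter that runs the code.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


# Ask pip to report its version and the Python installation it belongs to.
result = subprocess.run(pip_command("--version"), capture_output=True, text=True)
if result.returncode == 0:
    print(result.stdout.strip())
else:
    print("pip is not available for", sys.executable)
```

The printed line names the Python installation that pip will install packages into, so you can confirm it is the one you intend to use.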

If you do not have admin privileges, you may need to install the packages with the --user flag, which limits the installation to your own account:

pip3 install pandas --user
pip3 install numpy --user
pip3 install matplotlib --user

Windows

Installation on Windows is essentially the same. Open the Command Prompt or PowerShell application, then enter these commands:

pip3 install pandas
pip3 install numpy
pip3 install matplotlib

or as needed

pip3 install pandas --user
pip3 install numpy --user
pip3 install matplotlib --user

Test set-up

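Load the reticulate R package and, if needed, specify the path to the Python version we want to use (for example, with use_python()). Then check that reticulate can talk to Python 3. (Note that this next chunk is an R chunk.)

library(reticulate)
# initialize = TRUE asks reticulate to start Python if it is not already running
if (reticulate::py_available(initialize = TRUE)) message("Python 3 found.")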

Check for a few packages

Let’s check to see if pandas is installed. Pandas is roughly the Python world’s equivalent of the tidyverse, although in saying so I’m sure I’m offending someone.

if (reticulate::py_module_available("pandas")) message("'pandas' found.")
## 'pandas' found.

We’ll also check to see if matplotlib is installed. This is a core plotting library.

if (reticulate::py_module_available("matplotlib")) message("'matplotlib' found.")

If these are not installed, then I suggest you install the packages outside of R, as described above.
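
A comparable check can be run from a Python chunk. The following is a minimal sketch using the standard library's importlib.util.find_spec(); the helper name missing_modules is hypothetical and not part of reticulate or the original tutorial.

```python
import importlib.util


def missing_modules(names):
    """Return the names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["pandas", "numpy", "matplotlib"]
missing = missing_modules(wanted)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

Because find_spec() only looks the module up, this check does not actually import the packages.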

Working example

We import the pandas package and read the csv/zoo.csv dataset into a pandas data frame called critters. (The chunks in this section are Python chunks.)

import pandas
critters = pandas.read_csv("csv/zoo.csv")

Then, we print the head of the dataset.

# default is 5
critters.head(n=3)
##      animal  uniq_id  water_need
## 0  elephant     1001         500
## 1  elephant     1002         600
## 2  elephant     1003         550

Note the syntax. Creating critters with read_csv() produced a pandas data frame. Pandas data frames have a number of methods (functions attached to the object); head() is one of them. So, to apply head() to critters, we put the method call at the end, after a dot: critters.head().
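
To make the method syntax concrete, the sketch below calls head() in two equivalent ways. It assumes pandas is installed, and it inlines the three rows printed above so that it runs without csv/zoo.csv. The names critters_sample, via_method, and via_function are illustrative only.

```python
import io

import pandas

# First three rows of the zoo data shown above, inlined so the example
# does not depend on csv/zoo.csv being present.
csv_text = """animal,uniq_id,water_need
elephant,1001,500
elephant,1002,600
elephant,1003,550
"""
critters_sample = pandas.read_csv(io.StringIO(csv_text))

# Method call: the data frame comes first, then the method name.
via_method = critters_sample.head(n=2)

# The same function reached through the class, with the data frame
# passed explicitly as the first argument.
via_function = pandas.DataFrame.head(critters_sample, n=2)

# Both calls return the same two rows.
print(via_method.equals(via_function))
```

The method form is the one you will see in nearly all pandas code; the second form is shown only to make clear that a method is a function that receives the object as its first argument.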

Here’s the full critters data set.

print(critters)
##       animal  uniq_id  water_need
## 0   elephant     1001         500
## 1   elephant     1002         600
## 2   elephant     1003         550
## 3      tiger     1004         300
## 4      tiger     1005         320
## 5      tiger     1006         330
## 6      tiger     1007         290
## 7      tiger     1008         310
## 8      zebra     1009         200
## 9      zebra     1010         220
## 10     zebra     1011         240
## 11     zebra     1012         230
## 12     zebra     1013         220
## 13     zebra     1014         100
## 14     zebra     1015          80
## 15      lion     1016         420
## 16      lion     1017         600
## 17      lion     1018         500
## 18      lion     1019         390
## 19  kangaroo     1020         410
## 20  kangaroo     1021         430
## 21  kangaroo     1022         410

Finally, we count the rows for each animal with crosstab().

pandas.crosstab(critters['animal'], columns = ['animal'])
## col_0     animal
## animal          
## elephant       3
## kangaroo       3
## lion           4
## tiger          5
## zebra          7
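
The same counts can be obtained in other ways. As a sketch, the block below rebuilds the animal column from the full data set printed above (so that it runs without csv/zoo.csv) and counts rows with value_counts(), an alternative to crosstab() that the original tutorial does not use. It assumes pandas is installed; the name critters_animals is illustrative.

```python
import pandas

# The animal column, rebuilt from the full data set printed above.
animals = (["elephant"] * 3 + ["tiger"] * 5 + ["zebra"] * 7
           + ["lion"] * 4 + ["kangaroo"] * 3)
critters_animals = pandas.DataFrame({"animal": animals})

# Count rows per animal, then sort alphabetically to match the
# ordering of the crosstab() output.
counts = critters_animals["animal"].value_counts().sort_index()
print(counts)
```

value_counts() returns a pandas Series indexed by animal, which can be more convenient than a cross-tabulation when only one variable is involved.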