This document describes how to use Python in an R Markdown document.
This tutorial relies upon and extends this site: https://rstudio.github.io/reticulate/articles/r_markdown.html
Follow the instructions at https://psu-psychology.github.io/psy-525-reproducible-research-2020/how_to/install-python.html to install Python on your local machine.
The following works on Mac OS. I need to add a more generic test that works on Windows, too.
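# intern = TRUE captures the command's output; without it, system() returns only the exit status.
path_2_python3 <- suppressWarnings(system("which python3", intern = TRUE))
if (length(path_2_python3) == 0) message("Python3 not found.")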
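For a check that does not depend on the macOS `which` command, one option is to ask Python itself once any interpreter can be started. The sketch below uses `shutil.which()` from the standard library, which searches the PATH on macOS, Linux, and Windows. The helper name `find_python` and the candidate names `python3` and `python` are illustrative assumptions, not part of the original tutorial.

```python
import shutil
import sys


def find_python(candidates=("python3", "python")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # portable PATH lookup
        if path is not None:
            return path
    return None


path = find_python()
if path is None:
    print("Python not found on PATH.")
else:
    print("Found interpreter on PATH:", path)

# The interpreter that is actually running this script may differ from the
# one found on PATH, so report it separately.
print("Running under:", sys.executable)
print("Major version:", sys.version_info.major)
```

If the major version printed is 2, the packages installed later in this tutorial will probably not be visible to that interpreter.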
You may find this site helpful if you run into trouble.
It appears that the R reticulate package will ask if you want to install the Miniconda package manager if it does not find a Python distribution. It’s fine to use this if it suits your needs. I have not tested it yet.
To install the reticulate package, enter install.packages("reticulate") from your R console.
Like R, bare-bones Python is useful but limited. You’ll want to install packages to extend its functionality.
Open a terminal, and enter the following commands:
pip3 install pandas
pip3 install numpy
pip3 install matplotlib
These three packages give you much-needed additional functionality: pandas for data frames, numpy for numerical arrays, and matplotlib for plotting.
Note: pip3 is the Python 3 version of pip. Depending on your specific set-up, it’s possible that you could use pip install here, but the pip3 install command ensures that you install the packages for Python 3.
If you do not have admin privileges, you may need to add the --user flag to these commands to limit the installation to your own account.
pip3 install pandas --user
pip3 install numpy --user
pip3 install matplotlib --user
Installation on Windows is essentially the same. Open the Command Prompt or PowerShell application, then enter these commands:
pip3 install pandas
pip3 install numpy
pip3 install matplotlib
or as needed
pip3 install pandas --user
pip3 install numpy --user
pip3 install matplotlib --user
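If it is unclear whether pip or pip3 belongs to the interpreter you intend to use, a common workaround is to run pip as a module of that interpreter. The sketch below only builds and prints such a command; the helper name `pip_install_command` is hypothetical, and the actual installation step is left commented out so nothing is installed by accident.

```python
import sys


def pip_install_command(packages, user=False):
    """Build a pip command tied to the interpreter running this script."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")  # install into the current user's account only
    command.extend(packages)
    return command


command = pip_install_command(["pandas", "numpy", "matplotlib"], user=True)
print(" ".join(command))

# To actually run the installation, uncomment the next two lines:
# import subprocess
# subprocess.check_call(command)
```

Because the command starts with `sys.executable`, the packages go to the same Python that runs the script, on macOS and Windows alike.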
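Load the reticulate R package. By default, reticulate uses the Python it discovers on your system; if you want a specific version, give its path to use_python() before Python is first used. Then check to make sure that reticulate can talk to Python 3. (Note that this next chunk is an R chunk.)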
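library(reticulate)
# initialize = TRUE asks reticulate to start Python if it is not yet running;
# with the default (FALSE), py_available() only reports an existing session.
if (reticulate::py_available(initialize = TRUE)) message("Python 3 found.")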
Let’s check to see if pandas is installed. Pandas is the Python world’s equivalent of tidyverse, although in saying so I’m sure I’m offending someone.
if (reticulate::py_module_available("pandas")) message("'pandas' found.")
## 'pandas' found.
We’ll also check to see if matplotlib is installed. This is a core plotting library.
if (reticulate::py_module_available("matplotlib")) message("'matplotlib' found.")
If these are not installed, then I suggest you install the packages outside of R, as described above.
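The same check can be made from the Python side, which is handy when testing an installation outside of R. This is a minimal sketch using `importlib.util.find_spec()` from the standard library; the helper name `missing_modules` is an illustrative assumption.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["pandas", "numpy", "matplotlib"]
missing = missing_modules(wanted)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

Unlike a plain `import`, `find_spec()` does not load the package, so the check stays fast even for large libraries.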
We import the pandas package and read the csv/zoo.csv dataset into a pandas data frame called critters.
import pandas
critters = pandas.read_csv("csv/zoo.csv")
Then, we print the head of the dataset.
# show the first 3 rows; head() returns 5 by default
critters.head(n=3)
## animal uniq_id water_need
## 0 elephant 1001 500
## 1 elephant 1002 600
## 2 elephant 1003 550
Note the syntax. When we created critters, pandas.read_csv() returned a pandas data frame. Pandas data frames have a number of methods (functions attached to the object); head() is one of them. So, to apply head() to critters, we put the method call at the end, after a dot: critters.head().
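To make the object-dot-method pattern concrete, here is a small sketch that can be run without the csv/zoo.csv file. It inlines the first six rows printed in this document and assumes the file is comma-separated; the names `csv_text` and `sample` are introduced only for this example.

```python
import io

import pandas

# First six rows of the zoo data, inlined so the example is self-contained.
# The comma-separated layout is an assumption about csv/zoo.csv.
csv_text = """animal,uniq_id,water_need
elephant,1001,500
elephant,1002,600
elephant,1003,550
tiger,1004,300
tiger,1005,320
tiger,1006,330
"""

sample = pandas.read_csv(io.StringIO(csv_text))

# Methods are called on the object with a dot, followed by parentheses.
first_two = sample.head(n=2)
last_two = sample.tail(n=2)

# Attributes also use the dot, but take no parentheses.
n_rows, n_cols = sample.shape

print(first_two)
print(last_two)
print(n_rows, "rows and", n_cols, "columns")
```

In R terms, `sample.head(n=2)` plays the role of `head(sample, n = 2)`: the data frame comes first and the function follows it.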
Here’s the full critters data set.
print(critters)
## animal uniq_id water_need
## 0 elephant 1001 500
## 1 elephant 1002 600
## 2 elephant 1003 550
## 3 tiger 1004 300
## 4 tiger 1005 320
## 5 tiger 1006 330
## 6 tiger 1007 290
## 7 tiger 1008 310
## 8 zebra 1009 200
## 9 zebra 1010 220
## 10 zebra 1011 240
## 11 zebra 1012 230
## 12 zebra 1013 220
## 13 zebra 1014 100
## 14 zebra 1015 80
## 15 lion 1016 420
## 16 lion 1017 600
## 17 lion 1018 500
## 18 lion 1019 390
## 19 kangaroo 1020 410
## 20 kangaroo 1021 430
## 21 kangaroo 1022 410
# count the number of rows for each animal
pandas.crosstab(critters['animal'], columns = ['animal'])
## col_0 animal
## animal
## elephant 3
## kangaroo 3
## lion 4
## tiger 5
## zebra 7
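The same per-animal counts can also be obtained with the `value_counts()` method of a pandas column, which some readers may find easier to read than `crosstab()`. The sketch below rebuilds the data from the printed output above so that it runs on its own; the comma-separated layout and the name `critters_inline` are assumptions made for this example.

```python
import io

import pandas

# The 22 rows printed above, inlined so the example runs without csv/zoo.csv.
csv_text = """animal,uniq_id,water_need
elephant,1001,500
elephant,1002,600
elephant,1003,550
tiger,1004,300
tiger,1005,320
tiger,1006,330
tiger,1007,290
tiger,1008,310
zebra,1009,200
zebra,1010,220
zebra,1011,240
zebra,1012,230
zebra,1013,220
zebra,1014,100
zebra,1015,80
lion,1016,420
lion,1017,600
lion,1018,500
lion,1019,390
kangaroo,1020,410
kangaroo,1021,430
kangaroo,1022,410
"""

critters_inline = pandas.read_csv(io.StringIO(csv_text))

# value_counts() tallies each distinct value in the column;
# sort_index() then orders the result alphabetically by animal.
counts = critters_inline["animal"].value_counts().sort_index()
print(counts)
```

The result is a pandas Series indexed by animal name, so a single count can be looked up with, for example, `counts["zebra"]`.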