## Setup

Load/install required packages the long way. We will revisit this example in
the exercises at the end to redo it with our new functional programming skills.
If you cannot get this document to work, you can view it interactively at this
[link](http://bit.ly/functionalprogramming).

```{r setup, warning = FALSE, message = FALSE, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(learnr, tidyverse, purrr, here)

```

```{r setup2, eval = FALSE}

if(!require("learnr")){install.packages("learnr"); library("learnr")}

if(!require("tidyverse")){install.packages("tidyverse"); library("tidyverse")}

if(!require("purrr")){install.packages("purrr"); library("purrr")}

if(!require("here")){install.packages("here"); library("here")}

#setwd(here::here('functional_programming'))

```

## Introduction to Functions

- Functions in R take some input argument (a 'formal') and execute
some code based off of the input ('body')

- Functions are useful for creating 'shortcuts' that you use often and
are not already implemented in R or an R package

- In order to use a function you create, you must first 'define' it, much
like you would by assigning a value to an object in R

- To assign a function to an object (name), use the `function(input){body}` call:

```{r functions1}

hello_world <- function() { print('Hello, world!') }

```

- Here, `hello_world` is the name of the function, and `print()` is
what the function does (everything inside the `{}`)
    
- Notice that this function does not have any input!

*First, reproduce the function above, and run it.*
*Next, try modifying the function's input and body to allow it to print your name instead of 'world'.*

```{r functions-ex-1, exercise=TRUE, exercise.lines = 10}


```

```{r functions-ex-1-solution}

hello_world <- function(name){
  message <- paste0('Hello, ',name,'!')
  return(message)
}

hello_world('Dan')

```

## Functions (cont.)

That's neat, but useless. Let's walk through another example with some real life
statistical significance.

```{r functions2}

set.seed(1999)

z_score <- function(score, values){
  grand_mean <- mean(values, na.rm = TRUE)
  sdt <- sd(values, na.rm = TRUE)
  z <- (score - grand_mean) / sdt
  return(z)
}

values <- runif(20, min = 0, max = 20)
score <- values[1]

z_score(score, values)

```

- The last line of the function (`return(z)`) is the function output

- In this example, we want what we stored in the object `z` to be given back

- For multiple returns, we need to store our final output in a list or dataframe

```{r func-exc, exercise=TRUE, exercise.lines = 20}

set.seed(1999)

z_score <- function(score, values){
  grand_mean <- mean(values, na.rm = TRUE)
  sdt <- sd(values, na.rm = TRUE)
  z <- (score - grand_mean) / sdt
  return(z)
}

values <- runif(20, min = 0, max = 20)
score <- values[1]

z_score(score, values)

```

- Again, this is nice, but as it stands now you would have to run `z_score()` on each
*individual* score to get the z score value

- We can unlock the full potential of R functions by combining it with
loops, conditional logic, and other functionals (e.g., `lapply()`)

## Loops

- Basic R loops are similar to other programming languages (e.g., python, MatLab, etc.)

- Tells R to evaluate something until a certain point is reached (a 'while' loop),
or until the end of a vector is reached (a 'for' loop)

- In other words, loops cycle through 'decision-trees' until a specified break-point is
reached (see flowchart below)

![Loop Flowchart](https://i2.wp.com/blog.datacamp.com/wp-content/uploads/2015/07/flowchart1.png?w=450?tap_a=5644-dce66f&tap_s=10907-287229)

## For Loops

- All loops iterate along some sequence

- Iterators are (generally) numbers, and are initialized by calling:
`for (i in 1:10) {loop something here}`

- `i` is the iterator, and tells the `for` statement to cycle through the
loop 10 times (`1:10`):

`r 1:10`

```{r for-loops-1}

for (i in 1:10){
  ## Loops cycle through what is inside the {} brackets
  print(i)
}

```

*Try to use what you know to loop through numbers 1 - 10 and square each value*

Write the R code required to print each number's squared value:

```{r for-loops-1-ex, exercise=TRUE, exercise.lines = 10}


```

```{r for-loops-1-ex-solution}

for(i in 1:10){
  print(i^2)
}

```

So when might this come in handy? Consider a list of values that you want to run
the same function on; for example, we might want to rename the subject column of
a dataframe to extract only the numeric values:

```{r for-loops-2}

list <- c('subject_1','subject_2','subject_3','subject_4')

# Instead of manually counting how many iterations we need
# we can instead look at the `length()` of the list we
# just created
length(list)

# To extract just the numeric values, we use the
# tidyverse function `readr::parse_number()`
readr::parse_number('subject_45')

# Initialize an empty vector to fill
list_sans_text <- vector('numeric')

# Let's try combining each of these into a for loop:
for (i in 1:length(list)){
  list_sans_text <- readr::parse_number(i)
}

print(list_sans_text)

```

Hmm...that didn't work. Why?

- Iterators are (generally) numbers, and therefore correspond to a *position* in
a vector or list, not the value itself

*How would we tell the loop to use the value instead of the position?*
*Try to fix the code below so that the output is a list of only subject numbers.*

```{r for-loops-2-ex, exercise=TRUE, exercise.lines = 10}

list <- c('subject_1','subject_2','subject_3','subject_4')

# Initialize an empty vector to fill
list_sans_text <- vector('numeric')

for (i in 1:length(list)){
  list_sans_text <- readr::parse_number(i)
}

print(list_sans_text)

```

```{r for-loops-2-ex-solution}

list <- c('subject_1','subject_2','subject_3','subject_4')

# Initialize an empty vector to fill (`double()` here is
# synonomous with `vector('numeric')`)
list_sans_text <- double()

for (i in 1:length(list)){
  list_sans_text[i] <- readr::parse_number(list[i])
}

print(list_sans_text)

```

- Remember, iterators in `loops` correspond to a position within! If you want
to use an actual value (character or numeric) you need to extract it first!

## While Loops

- While loops are generally the same as for loops, but instead of iterating along a 
sequence of values, they repeat until a condition is met

```{r while-loop-example, exercise = TRUE, exercise.lines = 10}

x <- 1

while( x <= 10){
  y <- x
  print(y^2)
  x <- x+1
}

```

- Can you change the code above so that it uses the previous value as the power?

```{r while-loop-example-solution}

x <- 1

while( x <= 10){
  y <- x
  p <- x-1
  print(y^p)
  x <- x+1
}

```

## A Hefty Loop Example

- The above are simple examples. Consider a more hefty example whereby you
want to run quickly z score variables across multiple columns:

```{r hefty-loop}

data(starwars)
starwars <- as.data.frame(starwars)
head(starwars)

x_values <- c('height','mass')

for (i in 1:length(x_values)){
  
  new <- paste0(x_values[i],'_centered')
  values <- as.numeric(starwars[,x_values[i]])
  mean <- mean(values, na.rm=TRUE)
  std <- sd(values, na.rm=TRUE)
  
  for(j in 1:nrow(starwars)) {
    # Here is our z-score function within the loop
    starwars[j,new] <- (starwars[j,x_values[i]] - mean) / std
    }
}

# Check to make sure we have values by printing first 6 rows
head(as_tibble(starwars))

```

Was that more complicated then it needed to be? Most certainly. That's why
most people leave loops behind in favor of the `apply` family. But before we
do, note two important concepts detailed in the example above:

1) You can nest loops within loops (theoretically infinitely)

2) The easy way of extracting an element from a dataframe or list (`df$column`) does
not play well within loops

## Conditional Logic

Conditional logic in programming evaluates a statement as either `TRUE` or `FALSE` and
performs code based on the statement. In this way, it is much like loops.

- The basic conditional logic functions are `if()` and `else()`, which are often used
together

- `if()` evaluates a statement:

```{r conditional-logic1}

x <- 1

# 'If x equals 1, print to the console 'This is TRUE''
if (x == 1) {'This is TRUE'}

```

- Notice that we use `==` instead of `=`

- When the statement is not `TRUE`, `else()` can be used as a follow up:

```{r conditional-logic2}

x <- 2

# 'If x equals 1, print 'This is TRUE' to the console,
# if x does not equal 1, print 'This is FALSE' to the console'
if (x == 1) {'This is TRUE'} else {'This is FALSE'}

```

- We can combine these two statements in a vectorized base function, `ifelse()`:

```{r conditional-logic3}

x <- 2

ifelse(x == 1, 'This is TRUE!', 'This is FALSE!')

```

- `ifelse()` has the benefit of being vectorized, which means that it operates on
a case-by-case level instead of the vector as a whole

```{r conditional-logic4}

set.seed(1999)
x = runif(10)

# This doesn't replace each value
y <- if (x < .5) {TRUE} else {FALSE}

# Instead, it evaluates the first element of the vector 'x'
x[1]; y

```

```{r conditional-logic4-1}

# ifelse() will work on each element

y2 <- ifelse(x < .5, TRUE, FALSE)

y2

```

*How might we use conditional logic with tidyverse's `mutate()` function to change `starwars$gender` to capital letters?*

```{r for-conditional-1-ex, exercise=TRUE, exercise.lines = 10}

data(starwars)


```

```{r for-conditional-1-ex-solution}

data(starwars)

starwars <- starwars %>% 
  mutate(., gender2 = ifelse(.$gender == 'male', 'Male',
                             ifelse('female','Female',.$gender)))

```

## Apply Family (with a Tidyverse flavor)

- The apply family--as the name suggests--applies a function across certain elements

- Think of this family as a simplified and optimized loops

- The apply family consists largely of:
  
  1) `lapply()` - the simplest apply; it takes a function and applies it to each
  element of a list
  
  2) `sapply()` and `vapply` - which returns simplified vectors instead of lists
  
  3) `purrr::map()` - which can handle multiple inputs
  

- Let's apply the `z_score()` function to multiple cases

```{r apply1}

data(starwars)
head(starwars)

```

```{r apply1-1}

myList <- c('height','mass')

# Returns list
myValues <- lapply(starwars[myList], scale)
head(myValues$height); head(myValues$mass)

# Returns vector
myValues2 <- sapply(starwars[,c('height','mass')], scale)
head(myValues2)

```

- `purrr::map()` is largely the same as `lapply()`, but returns consistent values, and has the
ability to be scaled to multiple input values using `purrr::map2()` and `purrr::pmap()`

```{r map1}

set.seed(1999)
df <- data_frame(a = runif(20), b = runif(20), c = runif(20), d = runif(20))

# The default for runif() should be a mean of 0.5
# map_dbl() returns 'double' (a numeric value with decimal points) not a list
df %>% map_dbl(., function(x) mean(x))


# In this example, it is functionally equivelent to:
df %>% sapply(., mean)

```

**See what happens if you use `map()` instead of `map_dbl()`**

```{r map1-ex, exercise=TRUE, exercise.lines = 10}

df <- data_frame(a = runif(20), b = runif(20), c = runif(20), d = runif(20))

df %>% map_dbl(., function(x) mean(x))

```

- `purrr::map()`'s utility becomes more apparent when we consider a fairly common issue in data science: split-apply-combine, or map-reduce

```{r map2}

data(starwars)
# Omiting incomplete cases so it plays nicely with `lm()`
starwars <- as.data.frame(starwars) %>% na.omit(.)

# Model the relationship between height and mass for each gender
models <- starwars %>% 
  split(., .$gender) %>% # split
  map(~lm(height~mass, data = .)) %>% # apply
  map(summary) # combine

models

```


- Note the use of `~` in the example above; this is short hand for an *annoynmous function*
which can be written out in long form as `function(x) lm(height~mass, data = x)`

## Intermediate Exercises

**1)** Write a function that will automatically and "tidily" give the mean, SD, and
median of `starwars$height` in a dataframe.

```{r intermediate-ex-1, exercise=TRUE, exercise.lines = 10}

data(starwars)
starwars <- as.data.frame(starwars)

summary_stats <- function(){
  
}

summary_stats(starwars$height)

```

```{r intermediate-ex-1-solution}

data(starwars)
starwars <- as.data.frame(starwars)

## Here's one possible approach...

summary_stats <- function(x){
  name <- as.character(x)
  mean <- mean(x, na.rm = TRUE)
  sd <- sqrt(var(x, na.rm = TRUE))
  median <- median(x, na.rm = TRUE)
  
  data.frame(mean=mean,sd=sd,median=median)
}

summary_stats(starwars$height)

```

**2)** Notice how we had to type the same thing four times to check and load four different packages
at the beginning of this tutorial. This violates a fundamental rule of programming (never
copy and paste the same thing). Try to check and load each required package using `lapply()`.
Remember, `lapply()` takes a list as input, and returns a list as output.
*Hint* for `lapply()` to work in this case, you must also pass the input
argument `character.only=TRUE`.

```{r intermediate-ex-2, exercise=TRUE, exercise.lines = 10}

paks <- c('learnr','tidyverse','purrr','here')


```

```{r intermediate-ex-2-solution}

paks <- c('learnr','tidyverse','purrr','here')

if (any(lapply(paks, require, character.only=TRUE) == FALSE)) {
  lapply(paks, install.packages(paks))
}

```

**3)** Use conditional logic to add a column to the `starwars`
dataset splitting it by median `starwars$height`.

```{r intermediate-ex-3, exercise=TRUE, exercise.lines = 10}

data(starwars)


```

```{r intermediate-ex-3-solution}

data(starwars)

medianHeight <- median(starwars$height, na.rm = TRUE)

starwars$median_split <- ifelse(starwars$height >= medianHeight, 'Above','Below')

```

**4)** Return to the `starwars` data set. Use a functional
(`apply()` or `purrr::map()`) to iterate across every column (not just one) and
find out how many `NA`s there are in each. Then, can you replace each `NA` value with
`999`? *Hint* use the base function `replace()` inside your functional.

```{r intermediate-ex-4, exercise=TRUE, exercise.lines = 10}

data(starwars)


```

```{r intermediate-ex-4-solution}

data(starwars)

# How many NAs do we have in our data?
starwarsNAs <- starwars %>%
  purrr::map(., function(x) sum(is.na(x)))

# Let's replace these NAs
starwars_NA_replace <- starwars %>%
  purrr::map(function(x) replace(x, is.na(x), 999))

## Hmm...creates a list, but we want a dataframe!
starwars_NA_replace <- as_tibble(starwars_NA_replace)

```