Module 11 Calling functions

Learning goals

  • Understand what functions are, and why they are awesome
  • Understand how functions work
  • Understand how to read function documentation

 

You have already worked with many R functions; commands like getwd(), length(), and unique() are all functions. You know a command is a function because it has parentheses, (), attached at its end.

Just as variables are convenient names used for calling objects such as vectors or dataframes, functions are convenient names for calling processes or actions. An R function is just a batch of code that performs a certain action.

Variables represent data, while functions represent code.
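One way to see this for yourself: if you type a function's name without the parentheses, R does not run the function. It prints the code stored under that name instead. The short sketch below uses sd(), the standard deviation function covered later in this module. The exact code R prints can vary between R versions.

```r
# Without parentheses, R shows the code stored under the name sd
sd

# With parentheses (and an input), R runs that code
sd(c(2, 4, 4, 4, 5, 5, 7, 9))

# The name sd refers to a function, just as a variable name refers to data
is.function(sd)   # TRUE
```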

Most functions have three key components: (1) one or more inputs, (2) a process that is applied to those inputs, and (3) an output of the result. When you call a function in R, you are saying, “Hey R, take this information, do something to it, and return the result to me.” You supply the function with the inputs, and the function takes care of the rest.

Take the function mean(), for example. mean() finds the arithmetic mean (i.e., the average) of a set of values.

x <- c(4,6,3,2,6,8,5,3) # create a vector of numbers
mean(x) # find their mean
[1] 4.625

In this command, you are feeding the function mean() with the input x.

Perhaps this analogy will help: think of a function as a vending machine. You give the machine two inputs (your money and your snack selection). It checks whether your selection is in stock and whether your money covers its price. If so, the machine returns one output: the snack.

Base functions in R

There are hundreds of functions already built into R. These functions are called “base functions”. Throughout these modules we have been introducing you to the most commonly used base functions, and we will continue to do so.

You can access other functions through bundles of external code known as packages, which we explain in an upcoming module.

You can also write your own functions (and you will!). We provide an entire module on how to do this.

Note that not all functions require an input. The function getwd(), for example, needs nothing inside its parentheses to find and return your current working directory.
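Here are two base functions that take no inputs. Sys.Date() is not used elsewhere in these modules and appears here only as a second example. Your output will differ from anyone else's, because it depends on your computer and on the day you run the code.

```r
# Neither function needs an input, but both still need their parentheses
getwd()      # returns your current working directory as a character string
Sys.Date()   # returns today's date
```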

Saving function output

You will almost always want to save the result of a function in a new variable. Otherwise the function just prints its result to the Console and R forgets about it.

You can store a function result the same way you store any value:

x <- c(4,6,3,2,6,8,5,3) 
x_mean <- mean(x) 
x_mean
[1] 4.625
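Once a result is saved, you can keep using it in later commands without recalculating it. A minimal sketch, reusing the same vector:

```r
x <- c(4,6,3,2,6,8,5,3)
x_mean <- mean(x)   # save the result of mean() in a new variable

x_mean * 2          # 9.25: use the saved value in arithmetic
x - x_mean          # how far each value in x is from the mean
```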

Functions with multiple inputs

Note that mean() also accepts a second input called na.rm, which is short for “NA remove”. When na.rm is set to TRUE, R removes missing values (coded as NA) from the vector before calculating the mean.

x <- c(4,6,3,2,NA,8,5,3)  # note the NA
mean(x,na.rm=TRUE)
[1] 4.428571

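If you ran these commands with na.rm set to FALSE, R would not throw an error. It would keep the NA, and the result would itself be NA: the average of a set that includes an unknown value is also unknown.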

Note that you provided the function mean() with two inputs, x and na.rm, and that you separated each input with a comma. This is how you pass multiple inputs to a function.
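When you name an input, as in na.rm=TRUE, R matches it by that name. Spacing does not affect the result, and neither does the order of named inputs. The sketch below reuses the vector with the NA from above. The comment about trim reflects how mean() handles numeric vectors; see its help page for details.

```r
x <- c(4,6,3,2,NA,8,5,3)

mean(x, na.rm = TRUE)      # 4.428571: R matches na.rm by its name
mean(x,na.rm=TRUE)         # same result: spaces are optional but easier to read
mean(na.rm = TRUE, x = x)  # same result: order does not matter for named inputs

# Unnamed inputs are matched by position instead. mean(x, TRUE) would NOT set
# na.rm, because for numeric vectors the second position of mean() belongs to
# a different input called trim. Naming inputs avoids this kind of mix-up.
```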

Instructor tip:

A silly way to remember the na.rm input is to refer to it as “narm”, as in, “Don’t forget to narm, y’all!”

Function defaults

Note that many functions have default values for their inputs. If you do not specify an input’s value yourself, R assumes you want the default. For mean(), the default value of na.rm is FALSE. This means that the following code returns NA instead of a number:

x <- c(4,6,3,2,NA,8,5,3)  # note the NA
mean(x)
[1] NA

R assumes you want the default value of na.rm, which is FALSE. In other words, it assumes you do not want to remove missing values before calculating the mean. Because one of the values is missing, the mean is unknown, so R returns NA.
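This sketch makes the default visible. Leaving na.rm out and writing na.rm = FALSE explicitly produce the same call. Only overriding the default changes the result.

```r
x <- c(4,6,3,2,NA,8,5,3)   # note the NA

mean(x)                   # NA: na.rm is left at its default, FALSE
mean(x, na.rm = FALSE)    # NA: the same call, with the default written out
mean(x, na.rm = TRUE)     # 4.428571: override the default to drop the NA first
```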

Function documentation (i.e., getting help)

Functions are designed to accept only a certain number of inputs, with only certain names. To find out which inputs a function expects, and what output you can expect in return, call up the function’s help page:

?mean

When you enter this command (or its equivalent, help(mean)), the help documentation for mean() will appear in the bottom right pane of your RStudio window.

Learning how to read this documentation is essential to becoming competent in using R.

Be warned: not all documentation is easy to understand! You will come to really resent poorly written documentation and really appreciate well-written documentation; the few extra minutes taken by the function’s author to write good documentation saves users around the world hours of frustration and confusion.

  • The Title and Description help you understand what this function does.

  • The Usage section shows you how to type out the function.

  • The Arguments section lists out each possible argument (which in R lingo is another word for input or parameter), explains what that input is asking for, and details any formatting requirements.

  • The Value section describes what the function returns as output.

  • At the bottom of the help page, example code shows you how the function works. You can copy and paste this code into your own script or into the Console and check out the results.

Note that more complex functions may also include a Details section in their documentation, which gives more explanation about what the function does, what kinds of inputs it requires, and what it returns.
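Besides the help page, base R offers a quicker way to see a function’s inputs and their default values. args() prints them, and formals() returns them as a list. The sketch below uses sd(), whose na.rm input defaults to FALSE, just as it does for mean(). For some functions, including mean() itself, args() shows only x and ..., because the remaining inputs are defined behind the scenes. The help page therefore remains the more complete source.

```r
# args() prints a function's inputs along with any default values
args(sd)            # prints function (x, na.rm = FALSE), followed by NULL

# formals() returns the same information as a list you can inspect
names(formals(sd))  # "x" "na.rm"
formals(sd)$na.rm   # FALSE: the default value of na.rm
```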

Function examples

R comes with a set of base functions for descriptive statistics, which provide good examples of how functions work and why they are valuable.

We can use the same vector as the input for all of these functions:

x <- c(4,6,3,2,NA,8,9,5,6,1,9,2,6,3,0,3,2,5,3,3)  # note the NA

mean() has been explained above.

result <- mean(x,na.rm=TRUE)
result
[1] 4.210526

median() returns the median value in the supplied vector:

result <- median(x,na.rm=TRUE)
result
[1] 3

sd() returns the standard deviation of the supplied vector:

result <- sd(x,na.rm=TRUE)
result
[1] 2.594416

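summary() returns a vector that describes several aspects of the vector’s distribution. Note that summary() does not need na.rm. It sets missing values aside automatically and reports how many there were (the NA's entry in the output):

result <- summary(x)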
result
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   2.500   3.000   4.211   6.000   9.000       1 
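A few more base functions follow the same pattern and may come in handy in the exercises below. min(), max(), range(), and sum() all accept na.rm. length() takes no na.rm input, because it simply counts elements. A sketch using the same vector:

```r
x <- c(4,6,3,2,NA,8,9,5,6,1,9,2,6,3,0,3,2,5,3,3)  # same vector as above

min(x, na.rm = TRUE)     # 0
max(x, na.rm = TRUE)     # 9
range(x, na.rm = TRUE)   # 0 9: the minimum and maximum together
sum(x, na.rm = TRUE)     # 80
length(x)                # 20: counts every element, including the NA
```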

Exercises

Sock survey
You conducted a survey of your peers in this class, asking each of them how many pairs of socks they own. Most of your peers responded, but one person refused to tell you. Instead of giving you a number, they told you, “Nah!”.

Your results look like this:

# People you asked
peers <- c('Keri','Eric','Joe','Ben','Matthew','Jim')

# Their responses
socks <- c(6,8,14,1,NA,4)

1. Calculate the average number of pairs of socks owned by your peers.

2. Calculate the total number of pairs of socks owned by cooperative students in this class.

3. Use code to find the name of the person who refused to tell you about their socks.

4. Use code to find the names of the people who were willing to cooperate with your survey.

5. Use code to find the name of the person with the most socks.

6. Use code to find the name of the person with the fewest socks.

 

Age survey

7. Create a vector named years with the years of birth of everyone in the room.

8. What is the average year of birth?

9. What is the median year of birth?

10. Create a vector called ages which is the (approximate) age of each person.

11. What is the minimum age?

12. What is the maximum age?

13. What is the median age?

14. What is the average age?

15. “Summarize” ages.

16. What is the range of ages?

17. What is the standard deviation of ages?

18. Look up help on the function sort().

19. Create a vector called sorted_ages. It should be, well, sorted ages.

20. Look up the length() function.

21. How many people are in the group?

22. Create an object called old. Assign to this object an age (such as 36) at which someone should be considered “old”.

23. Create an object called old_people. This should be a boolean/logical vector indicating if each person is old or not.

24. Is the seventh person in ages old?

25. How many years is person 12 from the “old” age (either until they reach it, or since they passed it)?

 

Rolling the dice

26. Look up the help page for the function sample().

Here’s an example of how this function works. This line of code will sample a single random number from the vector 1:10.

sample(1:10,size=1)
[1] 2

This command will draw three random samples:

sample(1:10,size=3)
[1]  1  6 10
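Because sample() draws at random, your results will differ from the ones shown here and will change each time you run the command. If you need the same “random” draw every time (for example, so that a classmate can reproduce your result), base R’s set.seed() fixes the starting point of the random number generator. A minimal sketch: the seed value 42 is arbitrary, and the specific numbers drawn can differ between R versions.

```r
set.seed(42)                           # fix the random number generator's starting point
first_draw <- sample(1:10, size = 3)

set.seed(42)                           # reset to the same starting point
second_draw <- sample(1:10, size = 3)

identical(first_draw, second_draw)     # TRUE: same seed, same draw
```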

27. Use this function to simulate the rolling of a single die.

28. Now use this to simulate the rolling of a die 10 times. (Note: look at the replace input for sample().)

29. Now use this to roll the die 10,000 times, and assign the result to a new variable. (Note: look at the replace input for sample().)

30. Look up the help page for the function table().

31. Use the table() function to determine whether the die you rolled in question 29 is fair, or whether it is weighted or biased toward a certain side. Can you describe what the table() function is doing?

32. Now use the sample() function to solve a different problem: your friends want to order takeout from the Tavern, but no one in your group of 4 wants to be the one to go pick it up. Write code that will randomly select who has to go.