Module 11 Calling functions
- Understand what functions are, and why they are awesome
- Understand how functions work
- Understand how to read function documentation
You have already worked with many R functions; commands such as unique() are functions. You know a command is a function because it has parentheses, (), attached at its end.
Just as variables are convenient names used for calling objects such as vectors or dataframes, functions are convenient names for calling processes or actions. An R function is just a batch of code that performs a certain action.
Variables represent data, while functions represent code.
Most functions have three key components: (1) one or more inputs, (2) a process that is applied to those inputs, and (3) an output of the result. When you call a function in R, you are saying, “Hey R, take this information, do something to it, and return the result to me.” You supply the function with the inputs, and the function takes care of the rest.
Take the function mean(), for example. mean() finds the arithmetic mean (i.e., the average) of a set of values.
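As a minimal sketch, a call to mean() might look like the following. The numbers are invented for illustration and are not data from this module.

```r
# Hypothetical input values, invented for illustration
mean(c(2, 4, 6, 8))
```

Here the input is the vector c(2, 4, 6, 8), the process is averaging, and the output is 5.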
When you run mean() on a vector, you are feeding the function mean() with an input: the vector of values placed inside its parentheses.
Perhaps this analogy will help. When you think of functions, think of vending machines: you give a vending machine two inputs (your money and your snack selection), then it checks whether your selection is in stock and whether the money you provided is enough to pay for it. If so, the machine returns one output (the snack).
Base functions in R
There are hundreds of functions already built into R. These functions are called “base functions”. Throughout these modules, we have been introducing you to the most commonly used base functions, and we will continue to do so.
You can access other functions through bundles of external code known as packages, which we explain in an upcoming module.
You can also write your own functions (and you will!). We provide an entire module on how to do this.
Note that not all functions require an input. The function getwd(), for example, does not need anything in its parentheses to find and return your current working directory.
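A quick sketch of such a call. The path it returns depends on your own computer, so no output is shown here.

```r
# No input is needed: the parentheses stay empty
getwd()
```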
Saving function output
You will almost always want to save the result of a function in a new variable. Otherwise the function just prints its result to the Console and R forgets about it.
You can store a function result the same way you store any value:
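A minimal sketch of saving a result. The vector and the variable names x and x_mean are made up for illustration.

```r
# Hypothetical values, invented for illustration
x <- c(4, 8, 15, 16, 23, 42)

# Save the output of mean() in a new variable
x_mean <- mean(x)

# The saved result can be printed or reused later
x_mean
x_mean * 2
```

Because the result lives in x_mean, you can use it in later calculations without recomputing the mean.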
Functions with multiple inputs
mean() accepts a second input that is called na.rm. This is short for NA.remove. When this is set to TRUE, R will remove broken or missing values (NA) from the vector before calculating the mean.
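For instance, here is a sketch using an invented vector that contains one missing value:

```r
# Hypothetical vector with one missing value (NA)
x <- c(4, 8, NA, 16)

# na.rm = TRUE drops the NA, then averages the remaining values 4, 8, and 16
mean(x, na.rm = TRUE)
```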
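If you ran mean() on a vector containing missing values with na.rm set to FALSE, R would not throw an error; it would simply return NA instead of a number, because it cannot average a value it does not know.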
Note that when you set na.rm, you provide the function mean() with two inputs, the vector of values and na.rm, and you separate the inputs with a comma. This is how you pass multiple inputs to a function.
A silly way to remember the na.rm input is to refer to it as “narm”, as in, “Don’t forget to narm, y’all.”
Note that many functions have default values for their inputs. If you do not specify the input’s value yourself, R will assume you just want to use the default. In the case of mean(), the default value for na.rm is FALSE. This means that if you call mean() on a vector that contains missing values without specifying na.rm, the result will be NA rather than a number. R uses the default value for na.rm, which is FALSE, which means missing values are not removed before it tries to calculate the mean.
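The default behaviour can be sketched with an invented vector:

```r
# Hypothetical vector with one missing value (NA)
x <- c(4, 8, NA, 16)

# na.rm is not specified, so R uses its default, FALSE
mean(x)

# Equivalent call with the default written out explicitly
mean(x, na.rm = FALSE)
```

Both calls return NA, because the missing value is still in the vector when the mean is calculated.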
Function documentation (i.e., getting help)
Functions are designed to accept only a certain number of inputs with only certain names. To figure out what a function expects in terms of inputs, and what you can expect in terms of output, you can call up the function’s help page by typing ? followed by the function’s name, for example ?mean.
When you enter this command, the help documentation for mean() will appear in the bottom right pane of your RStudio window.
Learning how to read this documentation is essential to becoming competent in using R.
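The help lookup described above can be sketched in two equivalent ways:

```r
# Open the help page for mean()
?mean

# An equivalent way to open the same page
help("mean")
```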
Be warned: not all documentation is easy to understand! You will come to really resent poorly written documentation and really appreciate well-written documentation; the few extra minutes taken by the function’s author to write good documentation saves users around the world hours of frustration and confusion.
- The Description section helps you understand what this function does.
- The Usage section shows you how to type out the function.
- The Arguments section lists each possible argument (which in R lingo is another word for input or parameter), explains what that input is asking for, and details any formatting requirements.
- The Value section describes what the function returns as output.
At the bottom of the help page, example code is provided to show you how the function works. You can copy and paste this code into your own script or Console and check out the results.
Note that more complex functions may also include a Details section in their documentation, which gives more explanation about what the function does, what kinds of inputs it requires, and what it returns.
R comes with a set of base functions for descriptive statistics, which provide good examples of how functions work and why they are valuable.
We can use the same vector as the input for all of these functions:
mean() has been explained above.
median() returns the median value in the supplied vector:
sd() returns the standard deviation of the supplied vector:
summary() returns a vector that describes several aspects of the vector’s distribution:
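A minimal sketch of all four functions applied to one vector. The values in x are invented for illustration.

```r
# Hypothetical vector, invented for illustration
x <- c(2, 4, 4, 6, 9)

mean(x)     # arithmetic mean
median(x)   # middle value of the sorted vector
sd(x)       # sample standard deviation
summary(x)  # minimum, quartiles, median, mean, and maximum
```

For this vector, mean() returns 5 and median() returns 4, which shows how a single large value (9) pulls the mean above the median.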
You conducted a survey of your peers in this class, asking each of them how many pairs of socks they own. Most of your peers responded, but one person refused to tell you. Instead of giving you a number, they told you, “Nah!”.
Your results look like this:
1. Calculate the average pairs of socks owned by your peers.
2. Calculate the total pairs of socks owned by cooperative students in this class.
3. Use code to find the name of the person who refused to tell you about their socks.
4. Use code to find the names of the people who were willing to cooperate with your survey.
5. Use code to find the name of the person with the most socks.
6. Use code to find the name of the person with the fewest socks.
7. Create a vector named years with the years of birth of everyone in the room.
8. What is the average year of birth?
9. What is the median year of birth?
10. Create a vector called ages which is the (approximate) age of each person.
11. What is the minimum age?
12. What is the maximum age?
13. What is the median age?
14. What is the average age?
16. What is the range of ages?
17. What is the standard deviation of ages?
18. Look up help on the function
19. Create a vector called sorted_ages. It should be, well, sorted ages.
20. Look up the
21. How many people are in the group?
22. Create an object called old. Assign to this object an age (such as 36) at which someone should be considered “old”.
23. Create an object called old_people. This should be a boolean/logical vector indicating if each person is old or not.
24. Is the seventh person in
25. How many years from being old or young is person 12?
Rolling the dice
26. Look up the help page for the function
Here’s an example of how this function works. This line of code will sample a single random number from the vector
This command will draw three random samples:
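Both calls can be sketched as follows, assuming the function in question is sample(), which is named later in these exercises. The vector 1:10 is an invented stand-in, because the original vector is not shown here.

```r
# Hypothetical vector to draw from (an assumption for illustration)
x <- 1:10

# Draw a single random value from x
sample(x, size = 1)

# Draw three random values from x (without replacement by default)
sample(x, size = 3)
```

The results differ each time you run these lines, because each draw is random.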
27. Use this function to simulate the rolling of a single die.
28. Now use this to simulate the rolling of a die 10 times. (Note: look at the
replace input for
29. Now use this to roll the die 10,000 times, and assign the result to a new variable. (Note: look at the
replace input for
30. Look up the help page for the function
31. Use the table() function to ask whether the die you rolled in question 29 is fair, or if it is weighted or biased toward a certain side. Can you describe what the table() function is doing?
32. Now use the sample() function to solve a different problem: your friends want to order takeout from the Tavern, but no one in your group of 4 wants to be the one to go pick it up. Write code that will randomly select who has to go.