Module 48 for loops

Learning goals

  • What for loops are, and how to use them yourself
  • How to use for loops to carry out repetitive analyses
  • How to use for loops to summarize subgroups in your data
  • How to use for loops to create and work with many data files at once
  • How to use for loops for plots that are tricky but cool
  • How to use nested for loops

Basics

A for loop is a super powerful coding tool. In a for loop, R loops through a chunk of code for a set number of repetitions.

A super basic example:

Here’s an example of a pretty useless for loop:

This code is saying:
- For each iteration of this loop, step to the next value in x (first example) or 1:5 (second example).
- Store that value in an object i,
- and run the code inside the curly brackets.
- Repeat until the end of x.

Look at the basic structure:
- In the for( ) parenthetical, you tell R what values to step through (x), and how to refer to the value in each iteration (i).
- Within the curly brackets, you place the chunk of code you want to repeat.
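A minimal sketch of the two forms described above. The vector x and its values are hypothetical, chosen only for illustration:

```r
# Hypothetical vector of values to step through
x <- c(2, 4, 6, 8)

# Form 1: step through the values stored in x
for (i in x) {
  print(i)        # i holds the current value of x
}

# Form 2: step through the sequence 1:5 directly
for (i in 1:5) {
  print(i * 10)   # i holds 1, then 2, ... up to 5
}
```

In both forms, i is an ordinary object: after the loop finishes, it still holds the last value it was given.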

Another basic example, demonstrating that you can update a variable repeatedly in a loop.
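One way this might look, as a sketch with a hypothetical object named total that is updated on every pass through the loop:

```r
# Start with a value outside the loop
total <- 0

for (i in 1:5) {
  total <- total + i   # overwrite total with its updated value
  print(total)         # show the running value at each iteration
}

total   # the final value persists after the loop ends
```

The key point is that total is created before the loop, so each iteration builds on the result of the previous one.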

Silly example 1:

Silly example 2:

“Nested” for loops:
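A minimal sketch of a nested loop, assuming two hypothetical sets of values. The inner loop runs all the way through for every single step of the outer loop:

```r
combos <- c()   # staged object to hold each combination

for (i in 1:2) {                  # outer loop
  for (j in c("a", "b", "c")) {   # inner loop, repeated for each i
    combo_ij <- paste(i, j)
    print(combo_ij)
    combos <- c(combos, combo_ij)
  }
}

length(combos)   # 2 outer steps x 3 inner steps = 6 combinations
```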

for loop workflow

Loops can be simple or complex, but the procedure for building any for loop is the same. The general idea is to write the body of your loop first, test it to make sure it works, then wrap it in a for loop. Use the code below as a template for building for loops.
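The workflow above can be sketched roughly as follows. The object names (values, results, result_i) and the calculation itself are placeholders, not part of any particular analysis:

```r
# 1. Stage the values to loop through and an empty object for the results
values <- c(10, 20, 30)   # hypothetical values
results <- c()

# 2. Set i to a single test value, then write and test the loop body on its own
i <- values[1]
result_i <- i / 10
result_i

# 3. Once the body works, wrap it in the for loop
for (i in values) {
  result_i <- i / 10               # the tested body, unchanged
  results <- c(results, result_i)  # store the result of this iteration
}

# 4. Check the final results
results
```

Testing the body with a single value of i first means that, when something breaks, you know whether the problem is in the body or in the loop itself.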

for loop exercises

Use case 1: Repetitive printing

1a. Practice using the for loop template to make your own version of silly example 1.

1b. Practice using the for loop template to make your own version of silly example 2.

1c. Pretend you are doing a big repetitive analysis with 1,000 iterations. Pretend each iteration takes a long time to process, so it would be nice to print a status update each time an iteration is complete. Write a for loop that prints a status update with each iteration (e.g., “Iteration 3 out of 1,000 is complete …”).

Use case 2: Self-building calculations

2a. Create a vector with these values: 45, 245, 202, 858, 192, 202, 121. Build a for loop that prints the cumulative sums for this vector. (If your vector is 1,1,3, then the cumulative sums are 1,2,5.)

2b. Modify this for loop so that the cumulative sums are saved to a second vector object, instead of printed to the console.

(Note: there is a built-in function, cumsum(), that you can also use for this application.)
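As a quick check of what cumsum() returns, here it is applied to the small example vector mentioned above (not the exercise vector):

```r
small <- c(1, 1, 3)
running <- cumsum(small)
running   # 1 2 5
```

You can use it to confirm that your own for loop gives the same answer.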

Use case 3: Summarizing subgroups in your data

Scenario: You participate in a survey of flightless birds in the forests of New Zealand. You conduct thirty days of fieldwork on four species of bird: the kiwi, the weka, the kakapo (the world’s heaviest parrot), and the kea (the world’s only alpine parrot).

Download the data, place it in your working directory, and read it into your R session.

Your data (nz_birds.csv) look like this:

Each row contains the data for a single bird group detection.

Your supervisor has asked you to write a report of your findings. In that report she wants to see a table with the number of each species seen on each day of the fieldwork. That table will look something like this:

  day kiwi weka kakapo kea
1   1    5    6      3   3
2   2    1    3      1   2
3   3    2    4      5   3
4   4    2    3      2   0
5   5    6    6      3   2
6   6    6    2      3   6

Use a for loop to create this table.

Note that this is a very common use case for for loops. Other examples of this use case include these scenarios:

  • You want to summarize sample counts for each day of fieldwork.

  • You want to summarize details for each user in your database.

  • You want to summarize weather information for each month of the year.
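The general pattern behind all of these scenarios can be sketched with a small made-up dataset. The data frame samples, its columns, and its values are hypothetical:

```r
# Hypothetical data: one row per sample, with the day it was collected
samples <- data.frame(day   = c(1, 1, 2, 3, 3, 3),
                      count = c(4, 2, 7, 1, 5, 3))

days <- unique(samples$day)     # the subgroups to step through
summary_table <- data.frame()   # staged object that will hold the results

for (i in days) {
  samples_i <- samples[samples$day == i, ]   # subset to this subgroup
  row_i <- data.frame(day = i,
                      n_samples = nrow(samples_i),
                      total = sum(samples_i$count))
  summary_table <- rbind(summary_table, row_i)   # add this row to the table
}

summary_table
```

Each iteration handles one subgroup: subset, summarize, then append the summary to the staged table.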

Use case 4: Repetitive file creation

Scenario, continued: Your supervisor also wants to be able to share a public version of the New Zealand survey data with some of her collaborators. Rather than share the raw data, she would like to have a separate data file for each species of bird. Use a for loop to create a data file (.csv format) for each bird species.

Hint: First, set your working directory. Then, within your working directory, create a folder where you can deposit the files you create.
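A sketch of the mechanics behind this hint, using a made-up data frame rather than the bird data. The folder name and the use of tempdir() are assumptions for illustration; in practice you would point out_dir at a folder inside your working directory:

```r
# Hypothetical output folder (a temporary location is used here so the sketch is safe to run)
out_dir <- file.path(tempdir(), "loop_output")
if (!dir.exists(out_dir)) dir.create(out_dir)

# Made-up data with a grouping column
toy <- data.frame(group = c("a", "a", "b"),
                  value = c(1, 2, 3))

for (i in unique(toy$group)) {
  toy_i <- toy[toy$group == i, ]                    # rows for this group only
  file_i <- file.path(out_dir, paste0(i, ".csv"))   # build a file name from the group
  write.csv(toy_i, file = file_i, row.names = FALSE)
}

list.files(out_dir)   # check which files were created
```

Building the file name with paste0() inside the loop is what lets one chunk of code produce a differently named file at each iteration.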

Use case 5: Reading in multiple files

In the previous use case, you divided your original dataset into several files. Now see if you can write a for loop that reverses the process. In other words, build a for loop that combines several data files into a single dataframe.

Hint: Recall that you can use the function rbind() to combine two or more dataframes.
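As a reminder of how rbind() behaves, here is a small sketch with two made-up dataframes. The values are invented; the only requirement is that both dataframes have the same column names:

```r
# Two hypothetical dataframes with matching columns
df_a <- data.frame(species = "kiwi", n = 3)
df_b <- data.frame(species = "weka", n = 5)

combined <- rbind(df_a, df_b)   # stack the rows of df_b under df_a
combined
```

Inside a loop, the same call can be used to add each newly read file to a dataframe that grows with every iteration.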

Use case 6: Layering cyclical data on a plot

First, read in some cool data (keeling-curve.csv).

This is the famous Keeling Curve dataset: long-term monitoring of atmospheric CO2 measured at a volcanic observatory in Hawaii.

Try plotting the Keeling Curve:

There are some erroneous data points! We clearly can’t have negative CO2 values. Let’s remove those and try again:
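A minimal sketch of one way to filter out such values, shown on a small made-up data frame (kc_toy) rather than the real file. It assumes the readings are in a column named CO2 and that missing values should be kept as NA:

```r
# Made-up stand-in for the real dataset
kc_toy <- data.frame(year = c(1990, 1990, 1991, 1991),
                     CO2  = c(353.6, -99.99, 355.1, NA))

# Keep rows where CO2 is positive; rows with missing CO2 are kept as well
keep <- is.na(kc_toy$CO2) | kc_toy$CO2 > 0
kc_clean <- kc_toy[keep, ]

kc_clean
```

Testing is.na() explicitly avoids the rows of all NA values that a bare comparison would otherwise introduce when subsetting.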

What’s the deal with those squiggles? They seem to happen every year, cyclically. Let’s investigate!

Let’s look at the data a different way: by layering years on top of one another.

To begin, let’s plot data for only a single year:


# Stage an empty plot for what you are trying to represent
plot(1, # plot a single point
     type="n",
     xlim=c(0,365),xlab="Day of year",
     ylim=c(-5,5),ylab="CO2 anomaly")
abline(h=0,col="grey") # add nifty horizontal line

# Reduce the dataset to a single year (any year)
kcy <- kc[kc$year=="1990",] ; head(kcy)
    year month day_of_month day_of_year year_dec frac_of_year    CO2
816 1990     1            7      6.4970 1990.018       0.0178 353.58
817 1990     1           14     13.5050 1990.037       0.0370 353.99
818 1990     1           21     20.5130 1990.056       0.0562 353.92
819 1990     1           28     27.4845 1990.075       0.0753 354.39
820 1990     2            4     34.4925 1990.094       0.0945 355.04
821 1990     2           11     41.5005 1990.114       0.1137 355.09

# Let's convert each CO2 reading to an 'anomaly' compared to the year's average.
CO2.mean <- mean(kcy$CO2,na.rm=TRUE) ; CO2.mean  # Take note of how useful that 'na.rm=TRUE' input can be!
[1] 354.4538

y <- kcy$CO2 - CO2.mean ; y # Translate each data point to an anomaly
 [1] -0.87384615 -0.46384615 -0.53384615 -0.06384615  0.58615385  0.63615385
 [7]  0.96615385  0.72615385  1.13615385  1.33615385  1.08615385  1.67615385
[13]  1.81615385  1.71615385  1.77615385  2.41615385  2.50615385  3.24615385
[19]  2.79615385  2.87615385  2.92615385  2.52615385  1.79615385  1.72615385
[25]  1.33615385  1.76615385  0.53615385 -0.16384615 -0.08384615 -0.46384615
[31] -1.28384615 -0.99384615 -1.37384615 -2.65384615 -3.29384615 -3.59384615
[37] -2.70384615 -2.99384615 -3.05384615 -2.91384615 -2.88384615 -2.72384615
[43] -2.05384615 -1.74384615 -1.30384615 -1.00384615 -0.76384615 -0.55384615
[49]  0.01615385 -0.11384615  0.37615385  0.34615385          NA

# Add points to your plot
points(y~kcy$day_of_year,pch=16,col=adjustcolor("darkblue",alpha.f=.3))

But this only shows one year of data! How can we include the seasonal squiggle from other years?

Figure out how to use a for loop to layer each year of data onto this plot. Your final plot will look like this:

So how do you interpret this graph? Why do you think those squiggles happen every year?

Other use cases for plots

Using for loops to plot subgroups of data

for loops are also useful for plotting data in tricky ways. Let’s use a different built-in dataset that shows the performance of various car makes and models.

Let’s say we want to see how gas mileage is affected by the number of cylinders a car has. It would be nice to create a plot that shows the raw data as well as the mean mileage for each cylinder number.
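One possible approach, sketched under the assumption that the dataset in question is R's built-in mtcars (with mpg for miles per gallon and cyl for number of cylinders). The colors and point sizes are arbitrary choices:

```r
# Stage an empty plot with the right axes
plot(mpg ~ cyl, data = mtcars, type = "n",
     xlab = "Cylinders", ylab = "Miles per gallon")

cyls <- sort(unique(mtcars$cyl))   # the subgroups to step through
means <- c()                       # staged object for the group means

for (i in cyls) {
  cars_i <- mtcars[mtcars$cyl == i, ]   # cars with this number of cylinders
  points(x = cars_i$cyl, y = cars_i$mpg, pch = 16, col = "grey")   # raw data
  mean_i <- mean(cars_i$mpg)
  points(x = i, y = mean_i, pch = 16, cex = 2, col = "darkblue")   # group mean
  means <- c(means, mean_i)
}

means
```

Because the raw points and the mean are drawn inside the same iteration, each subgroup is fully plotted before the loop moves on to the next one.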

Now try to do something similar on your own with the airquality dataset. Use for loops to create a plot with Month on the x axis and Temperature on the y axis. On this plot, depict all the temperatures recorded in each month in the color grey, then superimpose the mean temperature for each month.

We will provide the empty plot, you provide the for loop:

Review assignments

Review assignment 1

Sometimes you need to summarize your data in such a specific way that you will need to use nested for loops, i.e., one for loop contained within another.

For example, your supervisor for the New Zealand Flightless Birds Survey has now taken an interest in associations among the four bird species you have been monitoring. For example, are kiwis more abundant on the days when you detect a lot of kakapos?

To answer this question, your supervisor wants to see a table with each species combination (Kiwi - Kakapo, Kiwi - Weka, … Kakapo - Kea, etc.) and the number of dates in which both species were seen more than 5 times.

You can produce this table using a nested for loop. Here is how it’s done:

Note that the code for adding the results to the staged objects X, A, and B is contained within the second for loop. This is necessary for producing our results; if we put that code in the first for loop after the code for the nested loop, our results would not be complete.

Note that each for loop must use a different variable to represent each iteration. In this example, the first loop uses i and the second uses j. If we used i for both loops, the inner loop would overwrite the outer loop’s value of i, so code in the outer loop could no longer tell which iteration it was on.

Also note that we used i and j in the variables specific to each loop (e.g., dayi and dayj), as a simple way to help us keep track of what each variable is representing.
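The three notes above can be illustrated with a small sketch. This is not the module's own code: the daily counts are made up, only three species are included, and the object names sp_i, sp_j, and both are hypothetical. The staged objects follow the X, A, B naming used in the text:

```r
# Made-up daily counts for illustration only
counts <- data.frame(day    = 1:4,
                     kiwi   = c(6, 7, 2, 8),
                     weka   = c(6, 1, 9, 7),
                     kakapo = c(3, 6, 6, 9))
species <- c("kiwi", "weka", "kakapo")

# Stage empty objects to hold the results
A <- c() ; B <- c() ; X <- c()

for (i in 1:length(species)) {     # first loop uses i
  for (j in 1:length(species)) {   # second loop uses j
    sp_i <- species[i]
    sp_j <- species[j]
    # Days on which both species were seen more than 5 times
    both <- counts[[sp_i]] > 5 & counts[[sp_j]] > 5
    # Results are stored inside the second loop, once per pair
    A <- c(A, sp_i)
    B <- c(B, sp_j)
    X <- c(X, sum(both))
  }
}

results <- data.frame(A, B, X)
results
```

Note that this sketch includes each species paired with itself and each pair in both orders, which gives 9 rows for 3 species.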

Review assignment 2

Your supervisor is happy with your pairwise species association dataframe, and wants to use it in an analysis for a publication. However, the R package she wants to use requires that the data be in the format of a square matrix with four rows – one for each species – and four columns. Like this:

You have not yet worked with matrices in this curriculum (you will in a few modules), but for now think of them as simple dataframes with a single type of data (e.g., all numeric values, like this one). You can subset matrices just as you would a dataframe: matrix[row,column].
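A short sketch of matrix subsetting, using a small hypothetical matrix m. Note that matrix() fills values column by column by default:

```r
# A 2-row, 3-column matrix filled with the values 1 to 6
m <- matrix(1:6, nrow = 2, ncol = 3)
m

value <- m[2, 3]     # row 2, column 3
value                # 6

first_row <- m[1, ]  # leave the column blank to get the whole row
first_row            # 1 3 5
```

The same [row, column] syntax works on the left side of an assignment, which is how you can fill a matrix one cell at a time inside a loop.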

The values in this matrix should represent the number of dates in which each species pair was seen 5 times or more. For example, result[1,2] would be 3, since the Kiwi and Kea were seen 5+ times on only 3 dates.

She asks you to use the dataframe you just created to create this matrix. Use a nested for loop to do it.

Boom!

Review assignment 3

First, read in and format some other cool data (renewable-energy.csv). The code for doing so is provided for you here:

This dataset, freely available from the World Bank, shows the renewable electricity output for various countries, presented as a percentage of the nation’s total electricity output. The data are provided as a time series.

Multi-pane plots with for loops
Practice with a single plot

Task 3C: First, get your bearings by figuring out how to use the df dataset to plot the time series for the United States, for the years 1990 - 2015. Label the x axis “Year” and the y axis “% Renewable”. Include the full name of the country as the main title for the plot.

Now loop it!

Task 3D: Use that code as the foundation for building up a for loop that displays the same time series for every country in the dataset on a multi-pane graph with 4 rows and 3 columns.
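If you have not built a multi-pane graph before, the mechanics can be sketched with made-up data. The data frame toy, its column names, and the 2 x 2 layout are hypothetical; the layout is set with par(mfrow = ...), and each call to plot() inside the loop fills the next pane:

```r
# Made-up time series: one column per hypothetical group
years <- 2000:2005
toy <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                  b = c(6, 5, 4, 3, 2, 1),
                  c = c(2, 2, 3, 3, 4, 4),
                  d = c(5, 1, 5, 1, 5, 1))

old_par <- par(mfrow = c(2, 2))   # 2 rows x 2 columns of panes

for (i in names(toy)) {
  plot(x = years, y = toy[[i]], type = "l",
       xlab = "Year", ylab = "Value", main = i)   # one pane per column
}

par(old_par)   # restore the original single-pane layout
```

Saving the old settings and restoring them afterwards keeps later plots from being squeezed into the multi-pane layout by accident.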

Now loop it in layers!

Task 3E: Now try a different presentation. Instead of producing 12 different plots, superimpose the time series for each country on the same single plot.

To add some flair, highlight the USA curve by coloring it red and making it thicker.