Module 10 Vectors

Learning goals

  • Learn the various data structures in R
  • Learn how to work with vectors in R

 

Data belong to different classes, as explained in the previous module, and they can be arranged into various structures.

So far we have been dealing only with variables that contain a single value, but the real power of R comes from assigning entire sets of data to a variable.

The simplest data structure in R is a vector, which is an ordered set of values. A vector can hold just one value, like the variables we have worked with so far, or it can hold many millions of values.
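In R, a single value is itself a vector of length 1. The short sketch below checks this with the base functions is.vector() and length() (length() is covered later in this module); the values are arbitrary.

```r
# A single number is stored as a vector with one element
x <- 5
is.vector(x)   # [1] TRUE
length(x)      # [1] 1

# A vector built with c() can hold many values
y <- c(5, 6, 7, 8)
length(y)      # [1] 4
```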

Declaring and using vectors

To build up a vector in R, use the function c(), which is short for “concatenate”.

x <- c(5,6,7,8)
x
[1] 5 6 7 8

Whenever you use the c() function, you are telling R: ‘Hey, get ready. I’m about to give you more than one value at once.’

You can use the c() function to concatenate two vectors together:

x <- c(5,6,7,8)
y <- c(9,10,11,12)
z <- c(x,y)
z
[1]  5  6  7  8  9 10 11 12

You can also use c() to add values to a vector:

x <- c(5,6,7,8)
x <- c(x,9)
x
[1] 5 6 7 8 9

You can also put vectors through logical tests:

x <- c(1,2,3,4,5)
4 == x
[1] FALSE FALSE FALSE  TRUE FALSE

This command is asking R to tell you whether each element in x is equal to 4.
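Equality is only one of the comparisons you can run on a vector. A brief sketch with the other standard comparison operators (>, <=, !=), each of which is applied element by element:

```r
x <- c(1, 2, 3, 4, 5)
x > 3    # [1] FALSE FALSE FALSE  TRUE  TRUE
x <= 2   # [1]  TRUE  TRUE FALSE FALSE FALSE
x != 3   # [1]  TRUE  TRUE FALSE  TRUE  TRUE
```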

Instructor tip:

One way to demonstrate this concept: Ask a single student whether they are 22 years old (ask them to answer TRUE or FALSE). Then ask the room the same question. Each student will respond TRUE or FALSE. This is the same as comparing a single value to a long vector.

You can create vectors of any data class (i.e., data type).

x <- c("Ben","Joe","Eric") 
x
[1] "Ben"  "Joe"  "Eric"
y <- c(TRUE,TRUE,FALSE)
y
[1]  TRUE  TRUE FALSE

Note that all values within a vector must be of the same class. If you try to combine numbers and characters in the same vector, R does not throw an error; instead, it converts (coerces) the numbers to characters so that every element shares one class. For example:

x <- 4
y <- "6"
z <- c(x,y)
z
[1] "4" "6"
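When values of different classes are combined, R coerces them to a common class that can represent all of them. A minimal sketch, using class() from the previous module, with arbitrary values:

```r
# Mixing numbers and logicals: TRUE and FALSE become 1 and 0
a <- c(1, TRUE, FALSE)
a          # [1] 1 1 0
class(a)   # [1] "numeric"

# Mixing numbers and characters: everything becomes character
b <- c(4, "6")
class(b)   # [1] "character"
```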

Math with two vectors

When two vectors are of the same length, you can do arithmetic with them:

x <- c(5,6,7,8)
y <- c(9,10,11,12)
x + y
[1] 14 16 18 20
x - y
[1] -4 -4 -4 -4
x * y
[1] 45 60 77 96
x / y
[1] 0.5555556 0.6000000 0.6363636 0.6666667

What happens when two vectors are not the same length?

Well, it depends. If one vector has length 1 (i.e., a single number), then things usually work out well.

x <- 5
y <- c(1,2,3,4,5,6,7,8,10)
x + y
[1]  6  7  8  9 10 11 12 13 15

In this command, the single element of x gets added to each element of y.
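The same rule applies to the other arithmetic operators: a single value is combined with every element of the vector. A short sketch with arbitrary values:

```r
y <- c(1, 2, 3, 4)
y * 10   # [1] 10 20 30 40
y - 1    # [1] 0 1 2 3
y ^ 2    # [1]  1  4  9 16
```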

Another example, which you already saw above:

a <- c(1,2,3,4,5)
b <- 4
a == b
[1] FALSE FALSE FALSE  TRUE FALSE

In this command, the single element of b gets compared to each element of a.

However, when both vectors contain multiple values but are not the same length, be warned: wonky things can happen. R recycles the shorter vector, reusing its values from the beginning until it reaches the length of the longer one:

a <- c(1,2,3,4,5)
b <- c(3,4)
a + b
[1] 4 6 6 8 8
Warning message:
In a + b : longer object length is not a multiple of shorter object length

As this warning implies, this doesn’t make much sense. The command will still run, but do not trust the result.
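Recycling is only flagged with a warning when the longer length is not a multiple of the shorter one. When it is a multiple, R recycles silently, so a length mismatch can slip by unnoticed. A sketch with arbitrary values:

```r
a <- 1:6
b <- c(10, 20)
# b is reused as 10, 20, 10, 20, 10, 20; no warning is printed
a + b   # [1] 11 22 13 24 15 26
```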

Functions for handling vectors

We are about to list a set of core functions for working with vectors. Think of them as a tool bag. Each tool has a specific purpose and limited value on its own: you can’t quite build a house with just a hammer. But once you learn how to use all of the tools in your tool bag together, you can build almost anything. First, though, you have to know how to use each tool individually.

length() tells you the number of elements in a vector:

x <- c(5,6)
length(x)
[1] 2
y <- c(9,10,11,12)
length(y)
[1] 4
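Because c() simply joins vectors end to end, the length of a concatenated vector is the sum of the lengths of its parts. A quick check, reusing the x and y values above:

```r
x <- c(5, 6)
y <- c(9, 10, 11, 12)
z <- c(x, y)
length(z)                            # [1] 6
length(z) == length(x) + length(y)   # [1] TRUE
```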

The colon symbol : creates a vector of every integer from a starting value to an ending value, both included:

x <- 1:10
x
 [1]  1  2  3  4  5  6  7  8  9 10
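The colon also counts downward when the first value is larger than the second, and it accepts negative numbers. A short sketch; note that R applies the unary minus before the colon, so -2:2 runs from -2 to 2:

```r
10:1   #  [1] 10  9  8  7  6  5  4  3  2  1
-2:2   # [1] -2 -1  0  1  2
```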

seq() allows you to build a vector using an evenly spaced sequence of values between a min and max:

seq(0,100,length.out=11)
 [1]   0  10  20  30  40  50  60  70  80  90 100

In this command, you are telling R to give you a sequence of values from 0 to 100, and you want the length of that vector to be 11. R then figures out the spacing required between each value in order to make that happen.
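The spacing does not have to be a whole number. For example, asking for five evenly spaced values between 0 and 1 (a sketch):

```r
seq(0, 1, length.out = 5)   # [1] 0.00 0.25 0.50 0.75 1.00
```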

Alternatively, you can prescribe the interval between values instead of the length:

seq(0,100,by=7)
 [1]  0  7 14 21 28 35 42 49 56 63 70 77 84 91 98
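Note that the by=7 example above stops at 98: seq() never steps past the maximum, so 100 is not included. To count downward, give a negative step. A sketch:

```r
# The largest value produced with by = 7 is 98, not 100
max(seq(0, 100, by = 7))   # [1] 98

# A negative step counts down from 10 to 0
seq(10, 0, by = -2)        # [1] 10  8  6  4  2  0
```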

rep() allows you to repeat a single value a specified number of times:

rep("Hey!",times=5)
[1] "Hey!" "Hey!" "Hey!" "Hey!" "Hey!"

You can also use rep() to repeat each element of a vector a set number of times:

rep(c("Hey!","Wohoo!"),each=3)
[1] "Hey!"   "Hey!"   "Hey!"   "Wohoo!" "Wohoo!" "Wohoo!"
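The difference between each and times is easiest to see side by side. With times, rep() repeats the whole vector rather than each element (a sketch reusing the strings above):

```r
rep(c("Hey!", "Wohoo!"), times = 3)
# [1] "Hey!"   "Wohoo!" "Hey!"   "Wohoo!" "Hey!"   "Wohoo!"
```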

head() and tail() can be used to retrieve the first 6 or last 6 elements in a vector, respectively.

x <- 1:1000
head(x)
[1] 1 2 3 4 5 6
tail(x)
[1]  995  996  997  998  999 1000

You can also adjust how many elements to return:

head(x,2)
[1] 1 2
tail(x,10)
 [1]  991  992  993  994  995  996  997  998  999 1000
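head() and tail() work on vectors of any class, not just numbers. A sketch using letters, a character vector of the 26 lowercase letters that comes built into R:

```r
head(letters, 3)   # [1] "a" "b" "c"
tail(letters, 2)   # [1] "y" "z"
```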

sort() allows you to order a vector from its smallest value to its largest:

x <- c(4,8,1,6,9,2,7,5,3)
sort(x)
[1] 1 2 3 4 5 6 7 8 9

rev() lets you reverse the order of elements within a vector:

x <- c(4,8,1,6,9,2,7,5,3)
rev(x)
[1] 3 5 7 2 9 6 1 8 4
rev(sort(x))
[1] 9 8 7 6 5 4 3 2 1
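As an alternative to rev(sort(x)), sort() accepts a decreasing argument. It also sorts character vectors alphabetically. A sketch reusing the values above (alphabetical order can depend on your locale, though not for these three names):

```r
x <- c(4, 8, 1, 6, 9, 2, 7, 5, 3)
sort(x, decreasing = TRUE)      # [1] 9 8 7 6 5 4 3 2 1

# sort() also orders character vectors alphabetically
sort(c("Joe", "Ben", "Eric"))   # [1] "Ben"  "Eric" "Joe"
```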

min() and max() let you find the smallest and largest value in a vector.

min(x)
[1] 1
max(x)
[1] 9
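If you need both values at once, the base function range() returns the minimum and maximum together as a vector of length 2. A sketch with the same x:

```r
x <- c(4, 8, 1, 6, 9, 2, 7, 5, 3)
range(x)   # [1] 1 9
```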

which() allows you to ask, “For which elements of a vector is the following statement true?”

x <- 1:10
which(x==4)
[1] 4
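which() returns positions, not values. That distinction is easy to miss with x <- 1:10, where each value equals its position, so the sketch below uses an unsorted vector (reused from the sorting examples) and a condition that matches several elements:

```r
x <- c(4, 8, 1, 6, 9, 2, 7, 5, 3)
# The values 8, 9, and 7 are greater than 6; they sit in positions 2, 5, and 7
which(x > 6)   # [1] 2 5 7
```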

If no values within the vector meet the condition, a vector of length zero will be returned:

x <- 1:10
which(x == 11)
integer(0)

which.min() and which.max() tell you the position of the smallest and largest element in the vector, respectively:

which.min(x)
[1] 1
which.max(x)
[1] 10
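An unsorted vector makes it clearer that these functions return positions. If several elements tie for the smallest or largest value, only the first position is returned. A sketch:

```r
x <- c(4, 8, 1, 6, 9, 2, 7, 5, 3)
which.min(x)   # [1] 3   (the value 1 sits in position 3)
which.max(x)   # [1] 5   (the value 9 sits in position 5)

# With ties, only the first matching position is returned
which.max(c(2, 9, 9))   # [1] 2
```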

%in% is a handy operator that allows you to ask whether a value occurs within a vector:

x <- 1:10
4 %in% x
[1] TRUE
11 %in% x
[1] FALSE
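The left-hand side of %in% can also be a vector, in which case R checks each of its values separately. A sketch:

```r
x <- 1:10
c(2, 11, 5) %in% x   # [1]  TRUE FALSE  TRUE
```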

is.na() lets you ask whether each element of a vector is missing or unavailable. In R, such values are represented by the special value NA. When you see NA, think of R telling you, ‘Nah ah! Nope! Not Available!’

x <- c(3,5,7,NA,9,4)
is.na(x)
[1] FALSE FALSE FALSE  TRUE FALSE FALSE

This function is stepping through each element in the vector x and telling you whether that element is NA.
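Because NA means the value is unknown, most calculations that involve an NA return NA as well. Functions such as max() and mean() take an na.rm argument that drops the NAs first. A sketch with the same x:

```r
x <- c(3, 5, 7, NA, 9, 4)
max(x)                  # [1] NA
max(x, na.rm = TRUE)    # [1] 9
mean(x, na.rm = TRUE)   # [1] 5.6
```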

Subsetting vectors

Since you will eventually be working with vectors that contain thousands of data points, it will be useful to have some tools for subsetting them – that is, looking at only a few select elements at a time.

You can subset a vector using square brackets [ ]. Whenever you use brackets, you are telling R: ‘Hey, I want some values, but not everything: just certain ones.’

x <- 50:100
x[10]
[1] 59

This command is asking R to return the 10th element in the vector x.

x[10:20]
 [1] 59 60 61 62 63 64 65 66 67 68 69

This command is asking R to return the 10th through 20th elements of the vector x.
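Square brackets also accept a vector of specific positions, negative positions (meaning “everything except these”), or a logical test. A sketch with the same x:

```r
x <- 50:100
x[c(1, 3, 5)]   # [1] 50 52 54
x[-1]           # every element except the first (51 through 100)
length(x[-1])   # [1] 50
x[x > 95]       # [1]  96  97  98  99 100
```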

Instructor tip:

For a change of pace, call out complicated subsetting calculations and ask students to race to shout out the correct result first. For example:

  • Make a vector of all integers, 51 to 151.
  • What is the 10th element divided by the 3rd element?
  • What is the seventieth element plus the thirty-first element?
  • What is the average of the fortieth through sixtieth elements?
  • Etc.
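For instructors who want the answers to the sample questions handy, here is a sketch of how each could be computed in R (mean() returns the average):

```r
x <- 51:151
x[10] / x[3]     # 60 / 53, printed as [1] 1.132075
x[70] + x[31]    # [1] 201
mean(x[40:60])   # [1] 100
```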

Exercises

Creating sequences of numbers

  1. Use the colon symbol to create a vector of length 5 between a minimum and a maximum value of your choosing.

  2. Create a second vector of length 5 using the seq() function. Use code to confirm that the length of this vector is 5.

  3. Create a third vector of length 5 using the rep() function. Use code to confirm that the length of this vector is 5.

  4. Finally, concatenate the three vectors and check that the length equals 15.

 

Basic vector math

  1. Create a variable x that is a vector of numbers of any length. Create a variable y of the same length.

  2. Check whether each value of x is greater than the corresponding value of y.

  3. Check to see if the smallest value of x is greater than or equal to the average value of y.

 

Vectors and object classes

  1. Create a vector with at least one number, then a second vector with at least one character string, then a third vector with at least one logical value. Identify the class of all three vectors.

  2. Now concatenate these three vectors into a fourth vector. Identify the class of this fourth vector.

 

Heads & tails

  1. Create a vector with at least 15 values.

  2. Show the first six values of that vector using the head() function.

  3. Figure out how to show the same result without a function, but instead with your new vector subsetting skills. Now replicate the tail() function, using those same skills. You may need to call the length() function as well.

 

Shoe sizes

  1. Create a vector called shoes, which contains the shoe sizes of five people sitting near you. Use comments to keep track of which size is whose.

  2. Arrange this set of shoe sizes in ascending order.

  3. Arrange this set of shoe sizes in descending order.

  4. Use code to find the two largest shoe sizes in your vector. Don’t use subsetting; instead, write a line of code that would work even if more shoes were added to your vector.

  5. Which shoe size is closest to the mean of these shoe sizes?

  6. Use the which() function to figure out which of your five neighbors this shoe size belongs to.

 

Swimming timelines

  1. Now create a new vector called swim_days, which contains the number of days since those same five people last went swimming (in any body of water; estimating the days since is fine).

  2. Use code to ask whether anyone went swimming less than five days ago.

  3. Which of your neighbors, if any, went swimming in the last month?

  4. Which of your neighbors, if any, have not been swimming in the last month?

  5. On average, how long has it been since these people have gone swimming?

 

Dealing with NAs

  1. Create a vector named x with these values: c(4,7,1,NA,9,2,8).

  2. Use a function to decide whether or not each element of x is NA.

  3. Use another function to find out which element in x is NA.

  4. Write code that will subset x only to those values that are NA.

  5. Write code that will subset x only to those values that are not NA.

 

Sleep deficits

  1. Now create a vector called sleep_time with the number of hours you slept for each day in the last week.

  2. Check if you slept more on day 3 than day 7.

  3. Get the total number of hours slept in the last week.

  4. Get the average number of hours slept in the last week.

  5. Check if the total number of hours in the first 3 days is less than the total number of hours in the last 4 days.

  6. Now create an object named over_under. This should be the difference between how much you slept each night and 8 hours (i.e., 1.5 means you slept 9.5 hours and -2 means you slept 6 hours).

  7. Write code to use over_under to calculate your sleep deficit / surplus this week (i.e., the total hours over/under the amount of sleep you would have gotten had you slept 8 hours every night).

  8. Write code to get the minimum number of hours you slept this week.

  9. Write code to calculate how many hours of sleep you would have gotten this week had you slept the minimum number of hours every night.

  10. Write code to calculate the average of the hours of sleep you got on the 3rd through 6th days of the week.

  11. Write code to calculate how many hours of sleep you would get in a year if you were to sleep the same amount every night as the average amount you slept from the 3rd to 6th days of the week.

  12. Write code to calculate how many hours of sleep per year someone who sleeps 8 hours a night gets.

  13. Assume that every night you sleep the average of the amounts you slept on the first and last days of this week. How many more (or fewer) hours of sleep would you get in a year than someone who sleeps 8 hours a night?

  14. What is your total sleep deficit for the last week?

  15. How many more hours per night, on average, do you need to sleep for the rest of the month so that, by the end of the month, you have a sleep deficit of zero?