Module 43 Conditional statements

Learning goals

  • Understand what conditional statements are, and why they are so awesome.
  • Be able to write your own conditional statements in R.

First steps

An example of a conditional statement is, “If ______ happens, do _____. Otherwise, do _____.”

In R code, conditional statements work in a similar way: they let a variable’s value determine which process to carry out next.

Here is a basic example:

x <- 4

if(x==3){
  message("x is equal to 3!")
}else{
  message("x does NOT equal 3!")
}

Let’s break this if statement down.

  • The if command opens up a conditional statement.
  • The parenthetical (x==3) is where the logical test happens. If the result of this test is TRUE, the code inside the if block will be run; if not, the code inside the else block will be run instead.
  • The curly brackets, { }, contain the code that will be run, depending on the outcome of the logical test.
  • The else command indicates the start of the code that will be run if the logical test’s result is FALSE.

This code ran the logical test x==3, determined its outcome to be FALSE, and so it skipped the if code and ran the else code instead.

Here is another example of the same idea:

x <- 4

if(x==3){
  y <- 100
}else{
  y <- 0
}

y # see what y is
[1] 0

Since x==3 returned FALSE, y was defined as 0 instead of 100.

Note that you do not need to define an else statement. If you do not, R will simply do nothing if the logical test is FALSE.

x <- 4

if(x==3){
  message("x is equal to 3!")
}

(Nothing happened)

You can also feed the if statement a logical object instead of a test.

x <- 4
x_test <- x == 3
x_test # check out the value of x_test
[1] FALSE

if(x_test){
  message("x is equal to 3!")
}

Note that, since x_test is already a logical value (TRUE or FALSE), you do not need to write out a logical test within the if parenthetical. But you are free to do so if you wish:

x <- 4
x_test <- x == 3

if(x_test == FALSE){
  message("x is not equal to 3!")
}

This if statement is saying that, if it is TRUE that x_test is FALSE, print a message saying so.
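
A common shorthand for the same test uses the negation operator, !, which flips TRUE to FALSE and vice versa. The sketch below reuses the x and x_test objects from the example above; it is one way to write the test, not the only one.

```r
x <- 4
x_test <- x == 3

# !x_test is TRUE whenever x_test is FALSE,
# so this behaves like if(x_test == FALSE)
if(!x_test){
  message("x is not equal to 3!")
}
```

Both forms behave the same here; which one reads more clearly is a matter of taste.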

Exercise 1

Write out your own basic if...else statement and ensure that it works.

Next steps

Nested conditions

You can nest conditional statements within others as many times as you wish:

x <- 6

if(x==3){
  message("x equals 3")
}else{
  if(x < 3){
    message("x is less than 3")
  }else{  
    message("x is greater than 3")
  }
}

Note that every opening bracket, {, needs a corresponding closing bracket, }. Most errors associated with if statements involve missing brackets.
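
When the nesting gets deep, the same logic can often be written as a flat chain using else if, which leaves fewer brackets to keep track of. Here is a minimal sketch of the example above rewritten that way; the result object is introduced only for illustration.

```r
x <- 6

# Tests are checked from top to bottom; the first TRUE one wins
if(x == 3){
  result <- "x equals 3"
}else if(x < 3){
  result <- "x is less than 3"
}else{
  result <- "x is greater than 3"
}

message(result)
```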

Handling NAs, NaN’s, Infs, and NULLs

if statements can be particularly helpful when your dataset contains missing or broken values. R includes base functions that help you carry out logical tests concerning missing values. The three tests below can be very helpful within if statements.

is.finite() tests whether a numeric object is a real, finite number.

x <- 0
is.finite(x)
[1] TRUE

y <- 10/x
y
[1] Inf
is.finite(y)
[1] FALSE

Here is an if statement making use of is.finite():

x <- 0
y <- 10/x
if(is.finite(y)){
  message("y is indeed a finite number!")
}else{
  message("y is not a finite number!")
}

is.na() tests whether a variable contains a missing or broken object.

x <- "eric"
is.na(x)
[1] FALSE

y <- as.numeric(x) # try converting `x` to a numeric value
is.na(y)
[1] TRUE
is.finite(y)
[1] FALSE

NA stands for “Not Available”. NaN is also a common sign of a broken value. It stands for Not a Number.
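
To see how NaN relates to the tests above, here is a short sketch. It uses the base function is.nan(), which has not been introduced in this module, and an illustrative object named z.

```r
z <- 0/0       # zero divided by zero is undefined, so R returns NaN

is.nan(z)      # TRUE
is.na(z)       # TRUE: is.na() flags NaN as well as NA
is.finite(z)   # FALSE

is.nan(NA)     # FALSE: NA is missing, but it is not a NaN
```

In other words, is.na() catches both kinds of broken value, while is.nan() catches only NaN.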

is.null() tests whether a variable is empty. That is, it has been initialized, but it contains no data.

x <- c(4,7,3,1,5) # make a vector of numbers
is.null(x)
[1] FALSE

y <- c() # make an empty vector
is.null(y)
[1] TRUE
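
Here is a sketch of an if statement making use of is.null(), following the same pattern as the is.finite() example above. Checking for NULL first is useful because a test such as y == 3 returns an empty result when y is NULL, and if() will stop with an error if it is handed an empty result.

```r
y <- c() # an empty vector, which R stores as NULL

if(is.null(y)){
  message("y is empty!")
}else{
  message("y contains ", length(y), " value(s)")
}
```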

Joint conditions

if statements can also accommodate joint logical tests. For example, the following if statement returns a message only if both tests are true:

x <- 6
if(is.finite(x) & x==6){
  message("x is real and equals 6")
}

The next if statement returns a message if either of the logical tests is true:

x <- 6
if(x==3 | x==6){
  message("x is equal to either 3 or 6")
}
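
Within if statements you will also see the doubled operators && and ||. These work on single TRUE/FALSE values and stop evaluating as soon as the answer is known, which lets an early test protect a later one. The sketch below assumes x may be NULL; the status object is introduced only for illustration.

```r
x <- NULL

# && runs the second test only if the first is TRUE,
# so `x == 6` is never evaluated on the NULL value
if(!is.null(x) && x == 6){
  status <- "x equals 6"
}else{
  status <- "x is NULL or does not equal 6"
}

message(status)
```

With the single & operator, both tests would be evaluated, and the if statement would fail because x == 6 returns an empty result when x is NULL.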

NULL as a default

Setting the default value for an input to NULL can be useful in certain use cases. For example, let’s say that if b is not defined by the user, you want its value to be set to five times the value of x.

my_function <- function(x,a=1.3,b=NULL){
  
  # Handle input `b`
  if(is.null(b)){
    b <- x*5
    print(paste("b was NULL! Setting its value to ", b))
  }
  
  # Now perform process
  y <- a*x + b
  
  return(y)
}

In this function, a conditional statement, if(is.null(b)){ ... }, is used to handle the input b when the user does not specify a value for it. When b is NULL, the logical test is.null(b) will be TRUE, which will trigger the conditional statement and cause b to be defined as x*5.

Try running the function with and without providing a value for b.

my_function(x=2,b=5)
[1] 7.6
my_function(x=2)
[1] "b was NULL! Setting its value to  10"
[1] 12.6

Conditional statements such as if(is.null(x)){ ... } or if(is.na(x)){ ... } will be helpful in dealing with all the possible values that a user can pass to your custom functions.
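
As a rough sketch of how the two checks can be combined, here is a hypothetical variant of my_function(), named my_function_checked() for illustration, that also guards against a missing x. It assumes x is a single value; a vector input would need a different test, such as any(is.na(x)).

```r
my_function_checked <- function(x,a=1.3,b=NULL){
  
  # Handle a missing `x` (assumes `x` is a single value)
  if(is.na(x)){
    message("x is missing! Returning NA.")
    return(NA)
  }
  
  # Handle input `b`
  if(is.null(b)){
    b <- x*5
  }
  
  # Now perform process
  y <- a*x + b
  
  return(y)
}

my_function_checked(x=NA) # prints the message and returns NA
my_function_checked(x=2)  # returns 12.6, as before
```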

Complex inputs

You can pass vectors, dataframes, and any other data structure as inputs in your own custom functions. For example:

my_input <- 1:20
my_function(x=my_input,a=1.2,b=10)
 [1] 11.2 12.4 13.6 14.8 16.0 17.2 18.4 19.6 20.8 22.0 23.2 24.4 25.6 26.8 28.0
[16] 29.2 30.4 31.6 32.8 34.0

Complex function outputs

At some point you will want multiple objects to be returned by your function. For example, perhaps you want both y and b to be returned now that you can define b according to the value of x.

Unfortunately, the return() command does not let you include multiple objects. return(y,b) will not work. To make it work, you have to place your output objects within a single object, such as a vector, dataframe, or list.

Here is a modification of my_function() that allows multiple outputs:

my_function <- function(x,a=1.3,b=NULL){
  
  # Handle input `b`
  if(is.null(b)){
    b <- x*5
  }
  
  # Now perform process
  y <- a*x + b
  
  output <- c("y"=y,"b"=b) 
  return(output)
}

Now my_function() works like this:

my_function(x=5)
   y    b 
31.5 25.0 

To get the value of just y or just b, you can treat the output just like any other vector:

my_function(x=5)[1]
   y 
31.5 
my_function(x=5)[2]
 b 
25 
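
A named vector works well when y and b are single numbers. If x is itself a vector, a list keeps the two outputs separate instead of concatenating them. Here is a sketch of that alternative; my_function_list() and result are hypothetical names used only for this example.

```r
my_function_list <- function(x,a=1.3,b=NULL){
  
  # Handle input `b`
  if(is.null(b)){
    b <- x*5
  }
  
  # Now perform process
  y <- a*x + b
  
  # A list can hold each output as its own element
  output <- list(y=y, b=b)
  return(output)
}

result <- my_function_list(x=1:3)
result$y # 6.3 12.6 18.9
result$b # 5 10 15
```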

Exercise 2

Write a nested if statement that produces a message reporting the hemisphere for any GPS position you provide it (a pair of latitude and longitude coordinates, in decimal degrees). The four hemisphere options are as follows:

  • Northwest (positive latitudes and negative longitudes, e.g., USA)
  • Northeast (positive latitudes and longitudes, e.g., Russia)
  • Southwest (negative latitudes and longitudes, e.g., Brazil)
  • Southeast (negative latitudes and positive longitudes, e.g., New Zealand).

Include the ability to handle missing values (i.e., if an NA is provided, return a message saying that values are missing and the hemisphere cannot be determined).

Provide five examples that demonstrate the functionality for all the possible message options.

Review assignment

Note: This exercise will combine all the skills you’ve learned about for loops, if statements, and writing functions into a real-world data science scenario. Buckle up!

You are working at the Centers for Disease Control and Prevention (CDC). Your supervisor has asked you to take a look at state-level data on infectious diseases within the United States in the last century. Specifically, she wants you to address the following questions and requests:

  1. In the last 90 years, which states have had the highest average prevalence of measles, pertussis (whooping cough), and smallpox in proportion to their population sizes? Which have had the lowest?

  2. Provide beautiful plots showing trends in the prevalence of these diseases over the last century. Produce a single plot for each state, with three lines representing the time series for each disease of interest. Save each plot as a pdf into a folder named state-level-summaries. Name each pdf using the state’s name.

To do this work, your supervisor asks you to use the us_contagious_diseases dataset contained within the R package dslabs, which contains disease data from 1928-2011 for all states. To make sure the numbers reflect actual patterns, she asks you to only use prevalence numbers for years in which counts were made in more than 20 weeks out of the year.