Module 9 Variables

Learning goals

  • How to define variables and work with them in R
  • Learn the various possible classes of data in R

Introducing variables

So far we have strictly been using R as a calculator, with commands such as:

3 + 5
[1] 8

Of course, R can do much, much more than these basic computations. Your first step in uncovering the potential of R is learning how to use variables.

In R, a variable is a convenient way of referring to an underlying value. That value can be as simple as a single number (e.g., 6), or as complex as a spreadsheet that is many Gigabytes in size. It may be useful to think of a variable as a cup; just as cups make it easy to hold your coffee and carry it from the kitchen to the couch, variables make it easy to contain and work with data.

Declaring variables

To assign numbers or other types of data to a variable, you use the < and - characters to make the arrow symbol <-.

x <- 3+5

As the direction of the <- arrow suggests, this command stores the result of 3 + 5 into the variable x.

Unlike before, you did not see 8 printed to the Console. That is because the result was stored into x.

Calling variables

If you wanted R to tell you what x is, just type the variable name into the Console and run that command:

[1] 8

Want to create a variable but also see its value at the same time? Here’s a handy trick: put your command in parentheses:

(x <- 3*12)
[1] 36

When you do that, x gets assigned a value, then that value is printed to the console.

You can also update variables.

(x <- x * 3)
[1] 108
(x <- x * 3)
[1] 324

You can also add variables together.

x <- 8
y <- 4.5
x + y
[1] 12.5

Naming variables

Here are a few rules:

1. A variable name has to have at least one letter in it. These examples work:

2. A variable name has to be connected. No spaces! It is usually best to represent a space using a period (.) or an underscore (_). Note that periods and underscores can be used in variable names:

my.variable <- 5 # periods can be used
my_variable <- 5 # underscores can be used

However, hyphens cannot be used, since that symbol is used for subtraction.

3. Variables are case-sensitive. If you misspell a variable name, you will confuse R and get an error. For example, ask R to tell you the value of capital X. The error message will be Error: object 'X' not found, which means R looked in its memory for an object (i.e., a variable) named X and could not find one.

4. Variable names can be as complicated or as simple as you want.

5. Some names need to be avoided, since R uses them for special purposes. For example, data should be avoided, as should mean, since both are functions built-in to R and R is liable to interpret them as such instead of as a variable containing your data.

supercalifragilistic.expialidocious <- 5
supercalifragilistic.expialidocious  # still works
[1] 5

So those are the basic rules, but naming variables well is a bit of an art. The trick is using names that are clear but are not so complicated that typing them is tedious or prone to errors.

Note that R uses a feature called ‘Tab complete’ to help you type variable names. Begin typing a variable name, such as supercalifragilistic.expialidocious from the example above, but after the first few letters press the Tab key. R will then give you options for auto-completing your word. Press Tab again, or Enter, to accept the auto-complete. This is a handy way to avoid typos.

Types of data in R

So far we have been working exclusively with numeric data. But there are many different data types in R. We call these “types” of data classes:

  • Decimal values like 4.5 are called numeric data.
  • Natural numbers like 4 are called integers. Integers are also numerics.
  • Boolean values (TRUE or FALSE) are called logical data.
  • Text (or string) values are called character data.

In order to be combined, data have to be the same class.

R is able to compute the following commands …

x <- 6
y <- 4
x + y
[1] 10

… but not these:

x <- 6
y <- "4"
x + y

That’s because the quotation marks used in naming y causes R to interpret y as a character class.

To see how R is interpreting variables, you can use the class() function:

x <- 100
[1] "numeric"
x <- "100"
[1] "character"
x <- 100 == 101
[1] "logical"

Another data type to be aware of is factors, but we will deal with them later.


Finding the errors

1. This code will produce an error. Can you find the problem and fix it so that this code will work?

# Assign 5 to a variable
my_var < 5

2. Same for this one:

# Assign 5 to a variable
my_var == 5

3. Same for this one:

x <- 5
y <- 1
X + y


Your Bananas-to-ICS ratio

4. Estimate how many bananas you’ve eaten in your lifetime and store that value in a variable (choose whatever name you wish). (By the way, what is a good method for estimating this as accurately as you can?)

5. Now estimate how many ice cream sandwiches you’ve eaten in your lifetime and store that in a different variable.
6. Now use these variables to calculate your Banana-to-ICS ratio. Store your result in a third variable, then call that variable in the Console to see your ratio.

7. Who in the class has the highest ratio? Who has the lowest?


Creating boolean variables

8. Assign a FALSE statement of your choosing to a variable of whatever name you wish.

9. Confirm that the class of this variable is “logical.”

10. Confirm that the variable equals FALSE.


Converting Farenheit to Celsius:

11. Assign a variable farenheit the numerical value of 32.

12. Assign a variable celsius to equal the conversion from Fahrenheit to Celsius. Unless your a meteorology nerd, you may need to Google the equation for this conversion.

13. Print the value of celsius to the Console.

14. Now use this code to determine the Celsius equivalent of 212 degrees Farenheit.


Wrapping up

15. Now ensure that your entire script is properly commented, and make sure your script is saved in your datalab working directory before closing.