Module 40 Working with dates & times

Learning goals

  • Be able to read dates, and convert objects to dates
  • Be able to convert dates, extract useful information, and modify them
  • Use date times
  • Gain familiarity with the lubridate package

 

Hadley Wickham’s tutorial on dates starts with 3 simple questions:

  • Does every year have 365 days?
  • Does every day have 24 hours?
  • Does every minute have 60 seconds?

"I’m sure you know that not every year has 365 days, but do you know the full rule for determining if a year is a leap year? (It has three parts.)

You might have remembered that many parts of the world use daylight savings time (DST), so that some days have 23 hours, and others have 25.

You might not have known that some minutes have 61 seconds because every now and then leap seconds are added because the Earth’s rotation is gradually slowing down.

Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST.

This chapter won’t teach you every last detail about dates and times, but it will give you a solid grounding of practical skills that will help you with common data analysis challenges."
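
The three-part leap-year rule mentioned in the quote can be written out directly. The following is a minimal sketch in base R; the function name is_leap is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not from any package): the Gregorian leap-year rule.
# A year is a leap year if it is divisible by 4,
# except century years (divisible by 100),
# unless they are also divisible by 400.
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

years <- c(1900, 2000, 2020, 2021)
is_leap(years)
# [1] FALSE  TRUE  TRUE FALSE
```

The lubridate package, introduced next, provides leap_year() for the same check.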

The lubridate package

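First, install the lubridate package and load it. A few examples below also use the %>% pipe, which lubridate does not provide; this module assumes dplyr (or the tidyverse) is already installed from earlier modules.

install.packages('lubridate')
library(lubridate)
library(dplyr)  # assumed installed; provides the %>% pipe used below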

Getting familiar with the date type

Get today’s date:

today <- today()
str(today)
 Date[1:1], format: "2021-09-29"

This looks like a simple character string, but it is not. It is a Date object, which R stores as a number so that it can do date arithmetic in the background.

To demonstrate this, let’s bring in a simple string:

my_birthday <- '1985-11-07'
str(my_birthday)
 chr "1985-11-07"

Note that an object's class determines what you can do with it. Subtracting a character string from a date causes an error…

today - my_birthday

… but this does not:

my_birthday <- as_date(my_birthday)
today - my_birthday
Time difference of 13110 days
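
To see what is happening in the background, it can help to look inside a Date object. The sketch below uses fixed dates (rather than today()) so that the results are reproducible; the variable names ref_date, birthday, and age_years are only for illustration.

```r
library(lubridate)

ref_date <- as_date("2021-09-29")
birthday <- as_date("1985-11-07")

class(ref_date)       # "Date"
as.numeric(ref_date)  # 18899: days since 1970-01-01

# Subtracting two Dates gives a difftime; as.numeric() extracts the day count
as.numeric(ref_date - birthday)  # 13110

# Whole years between the two dates (an age), using an interval
age_years <- interval(birthday, ref_date) %/% years(1)
age_years  # 35
```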

The datetime class

When you are working with a datetime object, you can add time to it and subtract time from it.

n <- now() 
n
[1] "2021-09-29 14:03:20 CDT"

Add or subtract seconds:

n + seconds(1)
[1] "2021-09-29 14:03:21 CDT"

Add or subtract hours:

n - hours(5)
[1] "2021-09-29 09:03:20 CDT"

Simplify to just the date:

as_date(n)
[1] "2021-09-29"

See how time flies:

later <- now()
later
[1] "2021-09-29 14:03:20 CDT"
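
Subtracting one datetime from another gives the elapsed time. A minimal sketch, using fixed times instead of now() so that the results are reproducible (the names start and finish are illustrative):

```r
library(lubridate)

start  <- ymd_hms("2021-09-29 14:03:20", tz = "America/Chicago")
finish <- start + minutes(90)
finish
# [1] "2021-09-29 15:33:20 CDT"

# Subtraction returns a difftime; R picks the units automatically
finish - start
# Time difference of 1.5 hours

# Ask for specific units when you need a plain number
as.numeric(difftime(finish, start, units = "secs"))
# [1] 5400
```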

Common tasks

Converting to dates from strings

The lubridate package was built to handle dates in various input formats. The following functions convert a character string in a particular format into a standard Date object. The function name gives the order of the components (y = year, m = month, d = day):

ymd("2017-01-31")
[1] "2017-01-31"

This also works if single-digit months or days are not padded with a 0:

ymd("2017-1-31")
[1] "2017-01-31"

Other formats can also be handled:

ydm("2017-31-01")
[1] "2017-01-31"
mdy("January 31st, 2017")
[1] "2017-01-31"
dmy("31-Jan-2017")
[1] "2017-01-31"
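
These parsers also accept whole vectors and unquoted numbers, and they return NA for anything they cannot parse. A short sketch of that behavior; the variable name parsed is illustrative, and suppressWarnings() is used only to hide the parse warning in this example:

```r
library(lubridate)

# Parse a whole character vector at once
ymd(c("2017-01-31", "2017-2-1"))
# [1] "2017-01-31" "2017-02-01"

# Unquoted numbers work too
ymd(20170131)
# [1] "2017-01-31"

# Strings that cannot be parsed become NA (normally with a warning),
# so check for missing values after parsing
parsed <- suppressWarnings(ymd(c("2017-01-31", "not a date")))
is.na(parsed)
# [1] FALSE  TRUE
```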

Extracting components from dates

Let’s practice extracting information from the following datetime object. First, get the year:

datetime <- ymd_hms("2016-07-08 12:34:56")
year(datetime)
[1] 2016

Get the month:

month(datetime)
[1] 7

Get the day of month:

mday(datetime)
[1] 8

Get the day of year:

yday(datetime)
[1] 190

Get the day of week:

wday(datetime)
[1] 6

Get the name of the day of week:

weekdays(datetime)
[1] "Friday"

Get the hour of the day:

hour(datetime)
[1] 12

Get the minute of the hour:

minute(datetime)
[1] 34

Get the seconds of the minute:

second(datetime)
[1] 56
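
The same accessor functions can return labels and can also be used to modify a datetime. The following is a sketch rather than a complete tour; the variable name changed is illustrative, and the label text produced by wday() and month() depends on your locale, so no output is shown for those two calls.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# Labelled versions (the label text depends on your locale)
wday(datetime, label = TRUE, abbr = FALSE)
month(datetime, label = TRUE)

# Modify a copy, one component at a time
changed <- datetime
year(changed) <- 2020
hour(changed) <- hour(changed) + 1
changed
# [1] "2020-07-08 13:34:56 UTC"

# Or set several components in one call
update(datetime, year = 2020, month = 2, mday = 2)
# [1] "2020-02-02 12:34:56 UTC"
```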

Dealing with time zones

When working with dates and times in R, time zones can be a major pain, but the lubridate package tries to make this simpler.

Adjust timezones for dates:

# Today's date where I am
today()
[1] "2021-09-29"

# Today's date in New Zealand
today(tzone='NZ')
[1] "2021-09-30"

Adjust time zones for date-times:

# Time where I am
now()
[1] "2021-09-29 14:03:20 CDT"

# Time in UTC / GMT (equivalent for practical purposes)
now('UTC')
[1] "2021-09-29 19:03:20 UTC"

now('GMT')
[1] "2021-09-29 19:03:20 GMT"

Don’t know what time zone your computer is working in? Use this function:

Sys.timezone()
[1] "America/Chicago"

To get a list of time zones accepted in R, use the function OlsonNames() (there are several hundred options; the exact number depends on your system):

OlsonNames() %>% head(50)
 [1] "Africa/Abidjan"       "Africa/Accra"         "Africa/Addis_Ababa"  
 [4] "Africa/Algiers"       "Africa/Asmara"        "Africa/Asmera"       
 [7] "Africa/Bamako"        "Africa/Bangui"        "Africa/Banjul"       
[10] "Africa/Bissau"        "Africa/Blantyre"      "Africa/Brazzaville"  
[13] "Africa/Bujumbura"     "Africa/Cairo"         "Africa/Casablanca"   
[16] "Africa/Ceuta"         "Africa/Conakry"       "Africa/Dakar"        
[19] "Africa/Dar_es_Salaam" "Africa/Djibouti"      "Africa/Douala"       
[22] "Africa/El_Aaiun"      "Africa/Freetown"      "Africa/Gaborone"     
[25] "Africa/Harare"        "Africa/Johannesburg"  "Africa/Juba"         
[28] "Africa/Kampala"       "Africa/Khartoum"      "Africa/Kigali"       
[31] "Africa/Kinshasa"      "Africa/Lagos"         "Africa/Libreville"   
[34] "Africa/Lome"          "Africa/Luanda"        "Africa/Lubumbashi"   
[37] "Africa/Lusaka"        "Africa/Malabo"        "Africa/Maputo"       
[40] "Africa/Maseru"        "Africa/Mbabane"       "Africa/Mogadishu"    
[43] "Africa/Monrovia"      "Africa/Nairobi"       "Africa/Ndjamena"     
[46] "Africa/Niamey"        "Africa/Nouakchott"    "Africa/Ouagadougou"  
[49] "Africa/Porto-Novo"    "Africa/Sao_Tome"     
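
Because the list is long, it is usually easier to search it than to read it. A small base R sketch; the search term "Chicago" is just an example, and the results depend on the time zone database installed on your system:

```r
# Find time zone names containing "Chicago" (base R, no packages needed)
grep("Chicago", OlsonNames(), value = TRUE)

# How many names does this system know about?
length(OlsonNames())

# Check that a name is valid before using it
"America/Chicago" %in% OlsonNames()
```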

At some point you may have reason to force the timezone of a datetime object to change without actually changing the date or time. To do so, use the function force_tz():

# Get current time in UTC/GMT
n <- now('UTC')
n
[1] "2021-09-29 19:03:20 UTC"

# Change timezone to US Central time without changing the clock time:
force_tz(n,tzone='America/Chicago')
[1] "2021-09-29 19:03:20 CDT"
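
It may help to contrast force_tz() with lubridate's with_tz(), which keeps the same instant in time and only changes how it is displayed. A minimal sketch with a fixed time so the results are reproducible (the names utc_time, shown, and forced are illustrative):

```r
library(lubridate)

utc_time <- ymd_hms("2021-09-29 19:03:20", tz = "UTC")

# with_tz(): same instant, displayed on a Chicago clock
shown <- with_tz(utc_time, tzone = "America/Chicago")
shown
# [1] "2021-09-29 14:03:20 CDT"
shown == utc_time
# [1] TRUE

# force_tz(): same clock reading, but a different instant
forced <- force_tz(utc_time, tzone = "America/Chicago")
forced
# [1] "2021-09-29 19:03:20 CDT"
forced - utc_time
# Time difference of 5 hours
```

In most analyses with_tz() is the safer choice, since it never changes the underlying moment; force_tz() is for fixing data that was labelled with the wrong time zone.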

Using timestamps instead

One way to avoid timezone issues is to convert a datetime object to a numeric timestamp.

Timestamps record the number of seconds that have passed since midnight GMT on January 1, 1970. It doesn’t matter which time zone you are in; the number of seconds that have passed since that moment will be the same:

# Time where I am
now() %>% as.numeric()
[1] 1632942201

now('UTC') %>% as.numeric()
[1] 1632942201

Timestamps can simplify things when you are doing a lot of adding and subtracting with time. Timestamps are just seconds; they are just numbers. So they are much less of a black box than datetime objects.

You can always convert from a timestamp back into a datetime object:

# Convert to timestamp
ts <- now() %>% as.numeric()
ts
[1] 1632942201

# Convert back to datetime object
ts %>% as_datetime()
[1] "2021-09-29 19:03:20 UTC"
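
Note that as_datetime() displays the result in UTC unless told otherwise. The sketch below converts an arbitrary example timestamp (1600000000, chosen only for illustration) back to a datetime in two time zones, and shows that the epoch itself is timestamp zero:

```r
library(lubridate)

ts <- 1600000000  # an arbitrary example timestamp, in seconds

as_datetime(ts)
# [1] "2020-09-13 12:26:40 UTC"

as_datetime(ts, tz = "America/Chicago")
# [1] "2020-09-13 07:26:40 CDT"

# The epoch itself is timestamp zero
as.numeric(ymd_hms("1970-01-01 00:00:00"))
# [1] 0

# Whole days since the epoch
ts %/% 86400
# [1] 18518
```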

Exercises

Creating datetime objects

Use the appropriate lubridate function to parse each of the following dates:

1. January 1, 2010

2. 2015-Mar-07

3. 06-Jun-2017

4. c('August 19 (2015)', 'July 1 (2015)')

5. 12/30/14

 

Extracting datetime components

Work with this vector of dates:

dt <- c('2000-01-04 03:43:01',
        '2007-09-29 12:18:59',
        '2011-04-16 19:51:16',
        '2015-12-13 21:24:48',
        '2020-06-01 06:39:02')

6. Create a dataframe that has the following columns:

  • raw (containing the original string)
  • year
  • month
  • dom (day of month)
  • doy (day of year)
  • hour
  • minutes
  • seconds

7. Now add two more variables:

  • timestamp
  • diff (the difference, in days, between this time and midnight GMT on January 1, 1970)

 

Record of a child’s cough

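First, download the data. The read_csv() function comes from the readr package, which is part of the tidyverse:

library(readr)
coughs <- read_csv('https://raw.githubusercontent.com/databrew/intro-to-data-science/main/data/coughs.csv')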

8. Create a dow (day of week) column.

9. Create a date (without time) column.

10. How many coughs happened each day?

11. Create a chart of coughs by day.

12. Look up floor_date. Use it to get the number of coughs by date-hour.

13. Create an hour variable.

14. Use the hour variable to create a night_day column indicating whether the cough was occurring at night or day.

15. Does this child cough more at night or day?