
R tip: Creating color-coded calendars

InfoWorld | Jan 11, 2019

See how well you’re achieving daily goals with a color-coded calendar made in R

Hi. I’m Sharon Machlis at IDG Communications, here with Do More With R: Color-Coded Calendars.
One of the best ways to see if you’re meeting a daily goal is a simple calendar with color-coded squares. Did you meet a daily business metric like sales or social-media posts? Or, how are you doing with personal goals, like daily exercise?
R can help. Let’s take a look at a calendar that tracks daily exercise. More specifically, it tracks whether you did cardio, did strength training, or rested.
Before you can visualize the data, though, you need to collect it. For simple data entry, I usually use spreadsheets, either Excel or Google Sheets. I know, I know, a bit of heresy coming from an R enthusiast. But I like to use the tool that’s best suited to the job.
Here’s how you might set up an Excel file for daily exercise.
It has a column for the day and a column for the activity. What I don’t want to do is type free-form text into the activity column. I may think I’ll remember that the exact wording for strength training is “strength training” and not “weights,” but I want to be sure. So I’ll add data validation to my Activity column.
I set up a column of acceptable options in another tab: Cardio and Strength training. Next, I’ll select the cells where I want to restrict data entry, in this case the whole Activity column except for the header.
Then I’ll choose Data Validation on the Data tab of Excel’s ribbon and pick List as the type of data allowed.
Finally, I’ll click inside the Source field, go to my possible-activities tab, select my two choices, and hit OK. Now my spreadsheet is set up correctly: if I want to add another entry, the Activity cell offers me only those two choices.
Now it’s time for some R.
To make an easy color-coded calendar, I’ll use the ggplot2 package and the ggcal package by Jay Jacobs. ggcal is on GitHub, so you install it from there. I’ll also load dplyr, because I almost always end up using dplyr whatever I’m doing; readxl to read the spreadsheet; and lubridate to work with dates.
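A minimal setup sketch for the packages mentioned above. It assumes ggcal lives in the GitHub repository `jayjacobs/ggcal` and that you install it with the remotes package; check the package’s GitHub page if that path has changed.

```r
# One-time installs (uncomment to run).
# The GitHub path "jayjacobs/ggcal" is an assumption; verify it before installing.
# install.packages(c("ggplot2", "dplyr", "readxl", "lubridate", "remotes"))
# remotes::install_github("jayjacobs/ggcal")

library(ggplot2)   # plotting system that ggcal builds on
library(ggcal)     # ggcal(): calendar plots from a vector of dates and a vector of values
library(dplyr)     # left_join() and other data wrangling
library(readxl)    # read_excel() for .xlsx files
library(lubridate) # date helpers
```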
So let’s import the file and take a look at it. The first column of dates comes in as POSIXct date-times, but the ggcal function wants Date objects. So I’ll convert my Day column to Date objects.
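Here is a sketch of the import-and-convert step. The file name `daily_exercise.xlsx` is hypothetical, so the code builds a small stand-in data frame with the same shape readxl would return (date cells arrive as POSIXct date-times in UTC), then converts the column with `as.Date()`.

```r
# Real import (hypothetical file name):
# daily_exercise <- readxl::read_excel("daily_exercise.xlsx")

# Stand-in with the same structure: POSIXct dates in UTC, one activity per day
daily_exercise <- data.frame(
  Day = as.POSIXct(c("2019-01-01", "2019-01-02", "2019-01-04"), tz = "UTC"),
  Activity = c("Cardio", "Strength training", "Cardio"),
  stringsAsFactors = FALSE
)
class(daily_exercise$Day)
#> [1] "POSIXct" "POSIXt"

# ggcal() expects Date objects, so convert the Day column
daily_exercise$Day <- as.Date(daily_exercise$Day, tz = "UTC")
class(daily_exercise$Day)
#> [1] "Date"
```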
OK that’s better.
I’ve only got a few days of the month here. If I want the entire month’s calendar to print, I’ll need to fill in the rest of the dates for the current month. That’s what the rest of this code does.
max(daily_exercise$Day) gets the last date in my file.
The as.Date(cut()) code finds the first day of the month containing that last date (January 1 for any date in January) and converts it to a Date object. Adding 1 month to that gives February 1. I don’t want February 1, though; I want the day before. So I subtract 1, which for a Date means 1 day, and that gives me the last day of the month.
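The end-of-month arithmetic can be sketched in base R like this. The starting date is a made-up stand-in for `max(daily_exercise$Day)`, and base `seq()` is used to step forward one calendar month; with lubridate loaded, `month_start + months(1) - 1` is an equivalent way to write the same step.

```r
# Stand-in for max(daily_exercise$Day), the last date in the file
last_day <- as.Date("2019-01-09")

# cut(..., "month") labels a Date with the first day of its month;
# as.Date() turns that factor label back into a Date
month_start <- as.Date(cut(last_day, "month"))
month_start
#> [1] "2019-01-01"

# Move forward one calendar month (February 1), then back one day
month_end <- seq(month_start, by = "month", length.out = 2)[2] - 1
month_end
#> [1] "2019-01-31"
```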
Next, I want all dates from the earliest date in my data through the end of the month we just calculated. Base R’s seq() function can generate that sequence of Dates, 1 day at a time. I’m going to put the sequence in a new data frame with one column, Day.
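A sketch of the all-days data frame, assuming the data starts on January 1, 2019 and the month end was computed as January 31. The column is named `Day` so it matches the spreadsheet column for the join that follows.

```r
first_day <- as.Date("2019-01-01")  # stand-in for min(daily_exercise$Day)
month_end <- as.Date("2019-01-31")  # result of the end-of-month step

# seq() on Dates with by = "day" returns every calendar day in the range;
# wrap it in a one-column data frame whose column name matches the data
all_days <- data.frame(Day = seq(first_day, month_end, by = "day"))
nrow(all_days)
#> [1] 31
```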
Why did I make a one-column data frame instead of a vector? Because now I can use dplyr’s left_join() to combine the two data frames. A left join keeps every row in the left, or first, data frame and merges in the second data frame by a common column; here, I merge by Day. Days that have no entry in the spreadsheet get NA for Activity. Now my data is ready for ggcal.
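A minimal sketch of that join with small stand-in data frames (the dates and activities are illustrative, not from the episode). January 3 has no entry, so its Activity comes back as NA.

```r
library(dplyr)

# Stand-in for the converted spreadsheet data
daily_exercise <- data.frame(
  Day = as.Date(c("2019-01-01", "2019-01-02", "2019-01-04")),
  Activity = c("Cardio", "Strength training", "Cardio"),
  stringsAsFactors = FALSE
)
# Every day of the month, one column named Day
all_days <- data.frame(Day = seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "day"))

# Keep all rows of all_days; unmatched days get NA in Activity
cal_data <- left_join(all_days, daily_exercise, by = "Day")
head(cal_data, 4)
```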
The ggcal() function takes the dates as its first argument and the values as its second. The values can be categories, like the ones we’re using now, or numbers if you want a calendar heat map. Run it, and you get a calendar with the default colors.
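A sketch of the basic call, passing dates first and values second as described above. The joined data is rebuilt here as a small stand-in (January 2019, three days with activities) so the block runs on its own.

```r
library(ggplot2)
library(ggcal)

# Stand-in for the joined data: every day of January 2019, a few with activities
cal_data <- data.frame(Day = seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "day"))
cal_data$Activity <- NA_character_
cal_data$Activity[c(1, 2, 4)] <- c("Cardio", "Strength training", "Cardio")

# Dates first, values second; categorical values get one fill color per category
p <- ggcal(cal_data$Day, cal_data$Activity)
print(p)
```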
If you want to set your own color scheme, you can use the usual ggplot2 functions. Here I used scale_fill_manual() to add a legend name, a color for each category, and a lighter grey for NA values. The last line, the theme() call, adds back a title for the legend.
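One way the customization might look. The hex colors and the `grey90` NA color are example choices, not the ones from the episode; `theme(legend.title = element_text())` restores the legend title.

```r
library(ggplot2)
library(ggcal)

# Stand-in for the joined data
cal_data <- data.frame(Day = seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "day"))
cal_data$Activity <- NA_character_
cal_data$Activity[c(1, 2, 4)] <- c("Cardio", "Strength training", "Cardio")

p <- ggcal(cal_data$Day, cal_data$Activity) +
  scale_fill_manual(
    name = "Activity",                        # legend name
    values = c("Cardio" = "#1b9e77",          # example color per category
               "Strength training" = "#7570b3"),
    na.value = "grey90"                       # lighter grey for days with no entry
  ) +
  theme(legend.title = element_text())        # add the legend title back
print(p)
```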
I’ve got another worksheet that includes minutes in addition to categories, so I can show a calendar heat map. Let’s process that. The first block of code reads in the new sheet, creates a new data frame with all days of the month, and runs ggcal on the minutes with the default colors.
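A sketch of that first block, reusing the same fill-in-the-month steps. The sheet name `Minutes` and the minute values are hypothetical stand-ins; numeric values give ggcal a continuous fill, which makes the calendar a heat map.

```r
library(dplyr)
library(ggplot2)
library(ggcal)

# Real import (hypothetical file and sheet names):
# minutes_raw <- readxl::read_excel("daily_exercise.xlsx", sheet = "Minutes")
minutes_raw <- data.frame(
  Day = as.Date(c("2019-01-01", "2019-01-02", "2019-01-04", "2019-01-05")),
  Minutes = c(30, 45, 20, 60)  # made-up sample values
)

# Last day of the month containing the latest date
last_day  <- max(minutes_raw$Day)
month_end <- seq(as.Date(cut(last_day, "month")), by = "month", length.out = 2)[2] - 1

# All days from the first entry through month end, joined to the data
all_days     <- data.frame(Day = seq(min(minutes_raw$Day), month_end, by = "day"))
minutes_data <- left_join(all_days, minutes_raw, by = "Day")

# Numeric values produce a calendar heat map with default colors
p <- ggcal(minutes_data$Day, minutes_data$Minutes)
print(p)
```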
I’d rather have the darkest color for the highest number, not the lowest, and I’d like a lighter grey for the empty blocks. If you know how to customize ggplot2, you can easily tweak your calendar. In this last block of code, I use ggplot2’s scale_fill_distiller() function to apply a ColorBrewer palette to continuous, numerical data; in this case, yellow to orange to red. And there’s the calendar with its new color scheme.
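A sketch of the recoloring, assuming the `YlOrRd` ColorBrewer palette. `scale_fill_distiller()` reverses palettes by default (`direction = -1`), so `direction = 1` keeps ColorBrewer’s light-to-dark order and gives the highest values the darkest red. The data is a small made-up stand-in.

```r
library(ggplot2)
library(ggcal)

# Stand-in minutes data: January 2019, a few days with values
minutes_data <- data.frame(Day = seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "day"))
minutes_data$Minutes <- NA_real_
minutes_data$Minutes[c(1, 2, 4, 5)] <- c(30, 45, 20, 60)  # made-up values

p <- ggcal(minutes_data$Day, minutes_data$Minutes) +
  scale_fill_distiller(
    name = "Minutes",
    palette = "YlOrRd",  # ColorBrewer sequential palette: yellow, orange, red
    direction = 1,       # keep palette order so the highest values are darkest
    na.value = "grey90"  # lighter grey for days with no data
  ) +
  theme(legend.title = element_text())
print(p)
```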
That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at https://go.infoworld.com/morewithR. You can also find the Do More With R playlist on YouTube. Hope to see you next episode!