I’ve been wanting to learn R and Illustrator for a while now, as they’re the standard toolkit in data journalism circles for producing high-quality print data visualizations. FiveThirtyEight, for example, uses R and Illustrator heavily in its workflow, as does the New York Times. The core idea is that R excels at processing the data and drawing the vector shapes, and then Illustrator takes over for text and layout.
Our Jacksonville-area AIGA (American Institute of Graphic Arts) chapter holds an annual music-themed poster show, which was a perfect opportunity to try this approach. The contest parameters were to select a song and create an 18×24-inch poster, which in my case would include a large data visualization. I selected Queen’s Bohemian Rhapsody, partly because I like the band and the song, and partly because its length and unorthodox structure give it a distinct look when encoded as a visualization.
I started with a 23-page PDF of the sheet-music score that included all of the vocal, string, piano, and drum parts. From there, I encoded every note as a data point, and then used R to draw those notes as bar segments. In modern contexts, this Gantt-chart convention appears in MIDI software like Apple’s GarageBand and in the Guitar Hero video game series. The convention predates the digital age, though, going back at least as far as player piano rolls at the turn of the 20th century.
[Image: a player piano roll]
Encoding the Data
The first step in building the visualization was the most tedious: encoding the song as data in Google Sheets. I was a music minor in college, so it helped that I can read sheet music. Every note in the score has a pitch, a start time, a duration, and a voice (i.e., lead vocal, rhythm guitar, etc.). That’s the core of what I’d need, although I wanted the timeline to run across seconds of runtime on the x-axis, not beats of a measure.
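To make the encoding concrete, the spreadsheet columns can be inferred from the R script later in the post (Instrument, Voice, Pitch, StartTime, EndTime). The rows below are purely hypothetical values for illustration, not transcribed from the actual score:

```
Instrument,Voice,Pitch,StartTime,EndTime
Vocal,Vocal_FM,42,96,98
Piano,Piano,40,96,100
Strings,Bass,23,100,102
```

Here Pitch 40 is middle C, and StartTime/EndTime are in the time units described next.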
One tricky aspect is that the song has a tempo change around the 3-minute mark, when it shifts to the double-time opera section, and then drops back to the original tempo at the end. To make the conversion to seconds work, I counted each 8th note as a time unit of 1 from the opening up through the start of the opera section. When the song shifts to double time, I switched to counting quarter notes as 1, and then went back to 8th notes as 1 for the ending. To convert to seconds, I multiplied my time units by the number of seconds per time unit, a constant of 182/431, or about 0.422274. I arrived at that by dividing the runtime in seconds at the start of the opera section (182) by the number of 8th-note units elapsed by that point (431).
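The conversion arithmetic is simple enough to sketch in a few lines. This is a Python illustration of the logic (the post’s actual workflow does this in R, shown below):

```python
# Sketch of the time-unit-to-seconds conversion described above.
# 182 seconds of runtime elapse over the first 431 eighth-note units,
# giving the seconds-per-unit constant used throughout.
SECONDS_PER_UNIT = 182 / 431  # ~0.422274

def units_to_seconds(units):
    """Convert counted time units (8th notes, or quarter notes in the
    double-time opera section) to seconds of runtime."""
    return units * SECONDS_PER_UNIT

# The opera section begins 431 eighth-note units in:
print(round(units_to_seconds(431)))  # 182 seconds, just past the 3-minute mark
```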
For the pitches, I encoded all the tones using standard piano key numbering, so middle C (C4) is 40, and each half step above or below is an increment of 1. I decided to omit the drum parts, since they don’t have tonality in the traditional sense, and I was less confident that I could correctly interpret the drum notation in the sheet music. Even without the drums, the CSV file of all the vocal, piano, and string parts ran to more than 3,800 rows of data.
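The pitch scheme follows standard 88-key piano numbering, where A0 is key 1 and C4 is key 40. A small Python sketch of that mapping (my own illustration, not part of the original encoding spreadsheet):

```python
# Map a pitch name like "C4" or "F#3" to its standard piano key number,
# where middle C (C4) is 40 and each half step changes the value by 1.
SEMITONE = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
            "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def piano_key(note):
    name, octave = note[:-1], int(note[-1])
    # Each octave spans 12 half steps; the -8 offset places C4 at key 40
    # (and A0, the lowest piano note, at key 1).
    return 12 * octave + SEMITONE[name] - 8

print(piano_key("C4"))  # 40 (middle C)
print(piano_key("A4"))  # 49 (concert A)
```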
Processing the Data in R
My R code was only 9 lines of script. The first two lines load the ggplot2 and ggthemes libraries: ggplot2 is the best-known R library for data visualization, and ggthemes provides additional theme and color options. The third line reads the CSV file into a variable b.
I wanted to break the piano, vocal, and string parts into three sections, so the fourth line defines the order, with vocal on top, piano in the middle, and the string parts on the bottom. The fifth line adjusts the end time down by 0.1 time units to create a small amount of spacing between notes. The sixth and seventh lines then convert the start and end times from time units to seconds, multiplying by (182/431), which again is the conversion constant based on when the opera section starts.
The 8th line creates the base ggplot, telling R how to draw the segments and where to place the axis breaks. I set the horizontal gridlines at intervals of 12 instead of 10 because there are 12 half steps in an octave, and the vertical breaks at increments of 60 seconds for each minute.
The 9th line breaks the plot into three facets and adds a color theme. I used darkunica as the background, and I defined a manual color scale for each individual part. Here is the full script:
library(ggplot2)
library(ggthemes)
b <- read.csv("BR_1.csv")
b$voice_order <- factor(b$Instrument, levels = c("Vocal", "Piano", "Strings"))
b$EndTime_adj <- b$EndTime - 0.1
b$start_sec <- b$StartTime * (182/431)
b$end_sec <- b$EndTime_adj * (182/431)
base <- ggplot(b, aes(x = start_sec, color = Voice)) + geom_segment(aes(xend = end_sec, y = Pitch, yend = Pitch), size = .8) + ylab("") + xlab("Time (Seconds)") + scale_x_continuous(breaks = seq(0, 365, by = 60)) + scale_y_continuous(breaks = seq(0, 70, by = 12))
base + facet_wrap(~ voice_order, ncol = 1) + theme_hc(bgcolor = "darkunica") + scale_colour_manual(values = c("Bass" = "darkorange", "Guitar_1" = "chartreuse", "Guitar_2" = "orangered", "Piano" = "aliceblue", "Vocal_2" = "cyan", "Vocal_FM" = "yellow"))
And here’s the basic plot it creates. The image below is a raster version, but R can export the plot as a vector file for editing in Illustrator.
Putting it All Together in Illustrator
Once I had the data visualization, I supplemented the graph with explanatory text and background on the song to tie the piece together as an infographic. Hat tip to a few graphic designer friends (Brad O’Donnell, Andrew Wolson, Bryan Hunt, and others) for the Illustrator tips, including the idea of eliminating the default gray facet headers for Vocal, Piano, and Strings in favor of vertically aligned text on the left side of the visualization. That freed up quite a bit of room for additional sections of text. My final tasks all came down to additional research on the song, so I could pull out the critical information about it and build it all into a story.