I follow Andy Kriebel and Eva Murray’s Makeover Monday series on Twitter, where people are encouraged to remake data visualizations from the news. This week’s data source was a meaty one: an export of over 30,000 tweets, courtesy of the Trump Twitter Archive.
A big part of analyzing a publicly available data source is checking the existing body of research first. This helps avoid reinventing the wheel and can reveal some opportunities. Data scientist David Robinson published an analysis in August 2016 showing that Trump tweets originating from an Android client were coming from Trump’s personal device, whereas tweets originating from an iPhone were coming from Trump’s staff. Robinson then used sentiment analysis to show that the Android tweets tended to be the angrier ones, while the staff tweets were more measured.
Buzzfeed published a couple of articles in December 2016 containing visualizations of Trump tweets. The earlier piece visualized the news sources Trump shared as clusters of bubbles. Later that month, they published a similar set of circles, arranged in a grid this time, showing the accounts Trump had retweeted. That second article was the focus of the remake challenge.
Let’s start with what I like and don’t like about the Buzzfeed view:
I like that it’s a timely data source, since Trump’s use and misuse of Twitter has been a fixture of the news cycle ahead of his inauguration. Ranking his retweets is also a useful way to shed light on which people and organizations Trump trusts.
As for what I don’t like, circles aren’t the ideal design choice for comparing volume. As a general rule, rectangles are easier and more intuitive to compare. The problem with circles is that it’s unclear whether I should compare diameter or area, and even if I knew to compare area, judging the difference would be about as hard as instantly calculating how much more surface a 16″ large pizza has than a 14″ medium. Encoding volume as circle size is more appropriate where precision isn’t critical, as in Hans Rosling’s animated scatter plots. In Rosling’s visualizations, the X and Y positions matter more than the exact differences between circle sizes, which serve as a rough indicator of big, small, and medium-sized countries.
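To put a number on the pizza comparison: area grows with the square of the diameter, so a pizza that is only about 14% wider has roughly 31% more area. That gap between the linear and the squared dimension is exactly what makes circle sizes hard to read. A quick Python check:

```python
import math

def pizza_area(diameter_in: float) -> float:
    """Area of a round pizza in square inches, computed from its diameter."""
    radius = diameter_in / 2
    return math.pi * radius ** 2

large = pizza_area(16)   # 16-inch "large"
medium = pizza_area(14)  # 14-inch "medium"

diameter_ratio = 16 / 14
area_ratio = large / medium  # the same as (16 / 14) ** 2

print(f"diameter: {diameter_ratio:.2f}x, area: {area_ratio:.2f}x")
# prints "diameter: 1.14x, area: 1.31x"
```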
Buzzfeed’s visualization also lacks context around time, and since Twitter presents information as a timeline, that angle seems important. The main story I got from the Buzzfeed piece is that Trump retweets many different accounts, and they’re all over the map in type and credibility. That much fits Trump’s hair-trigger thumbs and short attention span, but it leaves me thinking there’s a lot more going on in these numbers.
For my remake, I went with a static infographic format, which let me highlight a few key findings from the Twitter data. I enjoy Vox’s “charts that explain” series, where they post a set of visualizations that add context to a complex topic, so I used it as a model and built a tall, skinny dashboard: Trump’s Campaign Twitter Usage in 6 Charts.
Digital analytics is my primary professional focus, which means I spend a sizable portion of my waking hours swimming in timestamped web activity data. When I first approach a data set like this, my early exploration questions tend to be: How much traffic is there? How did it change over time? When is volume highest and lowest? Where did the traffic originate? Simple line graphs and heatmap grids can answer these.
One go-to I use all the time is a day-of-week/hour-of-day heatmap, which usually shows where activity is concentrated during the week. But when I built one with Trump’s tweets, it looked like there might be a “no tweets during lunch” rule at the Trump Organization.
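Outside Tableau, the aggregation behind a day-of-week/hour-of-day heatmap is just a count of tweets per (weekday, hour) cell. Here is a minimal Python sketch, not the workbook itself. It assumes timestamps are strings in a 24-hour `%Y-%m-%d %H:%M:%S` layout that are already in Eastern time, and the sample rows are made up:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample timestamps (assumed 24-hour, already in local time).
timestamps = [
    "2016-10-03 07:12:00",  # Monday, 7am hour
    "2016-10-03 07:45:00",  # Monday, 7am hour
    "2016-10-04 12:30:00",  # Tuesday, noon hour
    "2016-10-08 21:05:00",  # Saturday, 9pm hour
]

def weekday_hour_counts(stamps):
    """Count tweets per (day-of-week, hour-of-day) cell, with Monday = 0."""
    counts = Counter()
    for s in stamps:
        dt = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
        counts[(dt.weekday(), dt.hour)] += 1
    return counts

def to_matrix(counts):
    """Expand the sparse counts into a 7 x 24 grid (rows = days) for plotting."""
    return [[counts.get((d, h), 0) for h in range(24)] for d in range(7)]

grid = weekday_hour_counts(timestamps)
matrix = to_matrix(grid)
```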
Remember that it’s not Tableau’s job to tell you whether an answer makes sense, and in my judgment there’s just no way that Trump failed to tweet between noon and 1pm for 7+ years. Likewise, I doubt that Trump would ramp up his activity from the 11pm hour to the midnight hour only to drop off again at 1am. A right-click and “View Data” on one of those “hour 0” cells is all it takes to see that the missing noon-1pm tweets were all landing in the midnight hour, due to a glitch in the DATEPARSE calculated field in the extract.
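I can’t say exactly which format string tripped up the extract, so treat this as an assumption: one plausible way to get this noon-to-midnight symptom is to parse 24-hour timestamps with a 12-hour hour field and no AM/PM marker. Python’s `strptime` illustrates the effect: with `%I` and no `%p`, the time is treated as AM, so an hour of 12 becomes hour 0. Python is stricter than the extract apparently was (it rejects hours 13 through 23 under `%I` outright), so it only reproduces the noon half of the problem. The `looks_like_noon_bug` helper is a hypothetical sanity check in the spirit of the “View Data” step:

```python
from datetime import datetime

stamp = "2016-10-03 12:15:00"  # a tweet sent at 12:15 pm

# 12-hour hour field (%I) with no AM/PM marker (%p): Python assumes AM,
# so "12" is read as 12 AM, i.e. hour 0 (midnight).
wrong = datetime.strptime(stamp, "%Y-%m-%d %I:%M:%S")

# 24-hour hour field (%H): "12" stays hour 12 (noon).
right = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")

print(wrong.hour, right.hour)  # prints "0 12"

def looks_like_noon_bug(hour_counts):
    """Flag the telltale pattern: nothing at noon, a bulge at midnight.

    hour_counts maps hour-of-day (0-23) to tweet counts. The 1.5x
    threshold against the average of hours 23 and 1 is an arbitrary
    heuristic, not a rule.
    """
    noon = hour_counts.get(12, 0)
    midnight = hour_counts.get(0, 0)
    neighbors = (hour_counts.get(23, 0) + hour_counts.get(1, 0)) / 2
    return noon == 0 and midnight > 1.5 * neighbors
```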
My fix was to rebuild the timestamp for each tweet using a slightly different calculation, and I added a dimension to split the heatmap into AM and PM so that it’s easier to read. Since we know Trump tweets from his Android, I counted the Android tweets as a negative sum of records and the tweets from all other clients as a positive sum. That way a diverging gradient shows the difference in patterns between Trump and his handlers. Trump is orange (naturally), and we can see how he concentrates his Twitter usage in the mornings, evenings, and weekends, whereas his staff tend to tweet during weekday business hours.
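The signed-count trick can be sketched in Python as follows. This is an illustration, not the workbook’s actual calculated fields. It assumes each tweet is a (timestamp, source) pair, that `"Twitter for Android"` is the client string for Trump’s own device, and that every other client counts as staff; the sample rows are made up. Splitting each hour into an AM/PM flag plus a 0-11 slot mirrors the AM/PM dimension that folds the heatmap in half:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (timestamp, client). Per Robinson's finding,
# Android is treated as Trump and all other clients as staff.
tweets = [
    ("2016-10-03 07:12:00", "Twitter for Android"),
    ("2016-10-03 07:40:00", "Twitter for Android"),
    ("2016-10-03 07:55:00", "Twitter for iPhone"),
    ("2016-10-04 14:20:00", "Twitter for iPhone"),
    ("2016-10-04 14:45:00", "Twitter Web Client"),
]

def signed_cells(rows):
    """Net tweets per (weekday, AM/PM, 12-hour slot) cell.

    Android tweets count -1 and all other sources +1, so negative cells
    lean Trump and positive cells lean staff. A diverging color scale
    centered on 0 then shows who dominates each hour.
    """
    net = defaultdict(int)
    for stamp, source in rows:
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        half = "AM" if dt.hour < 12 else "PM"
        slot = dt.hour % 12  # slot 0 stands for the 12 o'clock hour
        sign = -1 if source == "Twitter for Android" else 1
        net[(dt.weekday(), half, slot)] += sign
    return net

cells = signed_cells(tweets)
print(cells[(0, "AM", 7)], cells[(1, "PM", 2)])  # prints "-1 2"
```

One trade-off of netting the counts: equal Trump and staff volumes cancel to zero, so a busy, evenly split hour looks the same as a quiet one.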
The next issue I wanted to tackle was one of scope. As Clay Shirky once put it, “It’s not information overload. It’s filter failure.”
The Buzzfeed piece felt a bit scattered, and the interesting narratives are often hidden in smaller subsets of the data. I set a data source filter starting in June 2015, the month Trump launched his campaign. I wanted to understand how (or whether) his behavior changed over the course of the campaign, and narrowing the workbook’s scope made that easier. This still leaves over 18 months of data and 8,000 tweets to work with.
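The same scoping can be reproduced outside Tableau as a date filter applied before any aggregation. A minimal sketch, assuming each tweet is a (timestamp, source) pair with 24-hour timestamps; the cutoff is the start of June 2015 from the text, and the sample rows are made up:

```python
from datetime import datetime

# Start of June 2015, the month the campaign launched.
CAMPAIGN_START = datetime(2015, 6, 1)

def campaign_scope(rows, start=CAMPAIGN_START):
    """Keep (timestamp, source) rows dated on or after `start`.

    Applied before any aggregation, like a data source filter, so every
    downstream chart sees the same narrowed scope.
    """
    return [
        (stamp, source)
        for stamp, source in rows
        if datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S") >= start
    ]

# Hypothetical sample rows.
rows = [
    ("2014-01-20 09:00:00", "Twitter for Android"),
    ("2015-05-31 23:59:00", "Twitter for Android"),
    ("2015-06-01 00:00:00", "Twitter for iPhone"),
    ("2016-10-03 07:12:00", "Twitter for Android"),
]

scoped = campaign_scope(rows)
print(len(scoped))  # prints "2"
```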
One clear change was a shift away from Trump-authored tweets as the campaign neared its end, although Trump himself tempered his usage only modestly. The bigger difference was the surge in iPhone and web client posts, especially in October, when they outnumbered Trump’s own tweets by about 4:1. The balance has since shifted back to Trump-authored tweets, which, if history is any guide, means we’re in for more of the angry Trump.
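A ratio like the roughly 4:1 October figure comes from splitting each month’s tweets by client. A sketch, assuming (timestamp, source) pairs and treating every non-Android client (iPhone, web, or anything else) as staff, which is a simplification. The sample rows are invented to show the arithmetic, not taken from the archive:

```python
from collections import Counter, defaultdict
from datetime import datetime

def monthly_staff_to_trump(rows):
    """Ratio of non-Android (staff) to Android (Trump) tweets per YYYY-MM."""
    by_month = defaultdict(Counter)
    for stamp, source in rows:
        month = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        who = "trump" if source == "Twitter for Android" else "staff"
        by_month[month][who] += 1
    ratios = {}
    for month, c in sorted(by_month.items()):
        # Report infinity for a month with no Android tweets instead of
        # dividing by zero.
        ratios[month] = c["staff"] / c["trump"] if c["trump"] else float("inf")
    return ratios

# Hypothetical sample rows.
rows = [
    ("2016-09-10 08:00:00", "Twitter for Android"),
    ("2016-09-12 15:00:00", "Twitter for iPhone"),
    ("2016-10-01 09:00:00", "Twitter for Android"),
    ("2016-10-02 10:00:00", "Twitter for iPhone"),
    ("2016-10-03 11:00:00", "Twitter for iPhone"),
    ("2016-10-04 12:00:00", "Twitter Web Client"),
    ("2016-10-05 13:00:00", "Twitter for iPhone"),
]

ratios = monthly_staff_to_trump(rows)
print(ratios)  # prints "{'2016-09': 1.0, '2016-10': 4.0}"
```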
If Trump keeps his Android as POTUS and continues the recent trend of direct control over the Twitter feed, you may also see more mentions of Russia, China, and Mexico at the expense of India, and more retweets of controversial figures and outlets like Ann Coulter and the Drudge Report compared with retweets of his daughter, Ivanka. Trump’s Android client has only ever retweeted Ivanka twice: once in January 2014 to re-announce a Golf Digest article about Donald Trump, and again in January 2015 in reference to the Celebrity Apprentice.
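These are predictions, but they are easy to keep checking as new tweets arrive. A rough sketch, assuming (text, source) pairs, whole-word case-insensitive keyword matching, and that native retweets start with an `RT @handle` prefix; quote-style retweets that paste the original text in quotation marks would be missed. The helper names and placeholder rows are hypothetical:

```python
import re
from collections import Counter

KEYWORDS = ["Russia", "China", "Mexico", "India"]

def mentions_by_source(rows, keywords=KEYWORDS):
    """Count tweets mentioning each keyword, split Android vs. other clients.

    Whole-word, case-insensitive: "Mexico" matches "MEXICO" but not
    "Mexican". A tweet naming two countries counts once for each.
    """
    patterns = {k: re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords}
    counts = Counter()
    for text, source in rows:
        who = "android" if source == "Twitter for Android" else "other"
        for k, pat in patterns.items():
            if pat.search(text):
                counts[(who, k)] += 1
    return counts

def android_retweets_of(rows, handle):
    """Android tweets that begin with an "RT @handle" retweet prefix."""
    # \b stops "@IvankaTrump" from also matching a longer handle.
    pattern = re.compile(rf"^RT @{re.escape(handle)}\b", re.IGNORECASE)
    return [text for text, source in rows
            if source == "Twitter for Android" and pattern.match(text)]

# Hypothetical placeholder rows, not real tweets.
rows = [
    ("placeholder text about Russia and China", "Twitter for Android"),
    ("placeholder text about India", "Twitter for iPhone"),
    ("placeholder text about MEXICO", "Twitter for Android"),
    ("placeholder text about Mexican food", "Twitter for iPhone"),
    ("RT @IvankaTrump: placeholder", "Twitter for Android"),
    ("RT @IvankaTrump: placeholder", "Twitter for iPhone"),
    ("RT @IvankaTrumpHQ: placeholder", "Twitter for Android"),
]

counts = mentions_by_source(rows)
ivanka_rts = android_retweets_of(rows, "IvankaTrump")
```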