The longer I work as a data analyst, the more I appreciate screen scraping, especially in cases where I’ll need to pull the same data more than once.
The matchup guide for this year’s tournament is live. The method is similar to last year’s dashboard: I scraped all the data with Python scripts, shaped and computed the metrics in Alteryx, and then visualized the results in Tableau Public.
Using a pair of Python scripts, I scraped the user rating distributions of over 34,000 IMDb films stretching from 1915 to February of 2017. This included just under 5.8 billion individual ratings on IMDb’s 1 to 10 scale for all movies in that timespan. Rating distributions can be a useful … Read the rest
For this year’s NCAA rankings, I’ve set up Python web crawlers to grab Joe Lunardi’s latest bracket projections as well as game log statistics from sports-reference.com. From there, I built an Alteryx workflow that recomputes the rankings to reflect margin of victory and to reward stronger performance in recent games. Methodology … Read the rest
I follow Andy Kriebel and Eva Murray’s Makeover Monday series on Twitter, where people are encouraged to remake data visualizations from the news. This week’s data source was a meaty one: an export of over 30,000 tweets courtesy of the Trump Twitter Archive.
A big part of doing analysis … Read the rest
Ever wondered what the temperature data progression looks like for a 17-hour pork shoulder or a 15-hour brisket? As a backyard barbecue nerd, I know I have, so I captured the data and visualized the results. Background and process writeup below.
When I was in Austin last fall for the … Read the rest
I’ve been learning the basics of D3.js recently and decided to try my hand at chord diagrams. It’s a niche chart type with limited utility, but for a first foray, I took the chart name literally and plotted the chord-to-chord movements in David Bowie’s Life on Mars. It seemed … Read the rest
I’ll be in Austin Nov 7 – 11 for the 2016 Tableau Conference. For anyone who’s attending, drop by my session on Thursday at noon. Here’s a session explorer visualization based on the CSV files Tableau made available. I set up a short Python script to grab the latest file, … Read the rest
Click for the giant, full-resolution version.
I’ve been wanting to learn R and Illustrator for a while now, as they’re the standard toolkit in data journalism circles for producing high-quality print data visualizations. FiveThirtyEight, for example, uses R and Illustrator heavily in their workflow, as does the New York … Read the rest
Much of the data analysis at this site highlights how the selection committee seeding process is flawed, and you can use that knowledge to your advantage in making picks. That doesn’t mean you should favor the most probable paths for every section of your bracket. Leave the chalk-filled conservative brackets … Read the rest