Data itself is rarely the story, and a dashboard is only as useful as the questions it can answer. With that in mind, I try to start any data visualization by defining the key research questions. In my professional work, this usually involves conversations with business users about the problems they want to solve. When I’m building a visualization for Tableau Public, the questions usually come from a specific data angle on a topic that interests me.
For this visualization, we know that interest in college basketball is largely focused on the annual NCAA tournament. As such, many of the pertinent questions will center around bracket-filling research from people eager to get an edge in their pools. These questions include:
- Which teams are stronger or weaker than their seeds indicate?
- What are the head-to-head win probabilities of games in both the earlier and later rounds?
- Which teams have momentum coming into the tournament?
- Which teams have the more accurate shooters? Which are weaker at free throws? (take it from a Memphis fan…it matters)
- Which teams are most reliant on three-point shooting?
- How have the various seeds performed historically?
Once I have an idea of the research questions, I’ll often go to pencil and paper to start sketching out the views that I think can answer those questions. Sketching is helpful in data visualization on a couple of levels. It allows me to evaluate treatments for a visualization before I go to the trouble of shaping data sets, and once I’m ready to shape the data set I can structure it with certain views in mind. This enables me to exclude unnecessary data and calculate as many of the metrics as I can before bringing the data into Tableau, which tends to make the visualizations faster and more responsive. Despite my intent to preplan as much as I can, I always end up iterating and editing quite a bit, and most people who work with Tableau Public can relate to the process of overwriting the old version repeatedly as you figure out ways to fine-tune the views.
For the first dashboard, I thought a team profile or resume view could answer many of the questions related to playing style, lineups, and game performance heading into the tournament. If you’ve played fantasy sports, you’ll know the player card, a standard convention where you hover over or click a player’s name to get key summary information about the player. In the NCAA visualization I built last year, I had long drop-down parameter lists of 68 teams in some of the views. I abandoned that approach this time around in favor of a selectable heatmap. This way a viewer can quickly browse teams by region and seed, and selecting a team filters the other five worksheets in the dashboard. The trick is configuring the filter action with the Run on Single Select Only option, so it fires only when exactly one team is selected.
For the win probability views, I was again looking for a way to avoid lengthy drop-down selectors, and Joe Mako had suggested in our Alteryx conversation that it would make sense to precompute all the Log5 probabilities and then visualize them in more of a heatmap/grid format. I liked that idea, but I needed a logical way to filter the view, as many of the possible matchups (e.g., 11 seeds on opposite sides) are irrelevant to show. My solution was to break the probability views into two dashboards: a region-level dashboard and a Final Four dashboard. This division mirrors the mental process many people go through when filling out a bracket from the outside in. The Final Four has always felt like a distinct and separate phase of the tournament relative to the game-heavy first week.
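To make the precompute step concrete, here is a minimal Python sketch of a region-level Log5 grid. The `log5` function is the standard Log5 formula; the team names, win percentages, and the `region_grid` helper are hypothetical placeholders, not the actual Alteryx workflow or data.

```python
from itertools import permutations


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B under the Log5 formula.

    p_a and p_b are each team's win percentage (0 < p < 1).
    Undefined when both inputs are 0 or both are 1.
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical rows: (region, seed, team, win percentage).
TEAMS = [
    ("South", 1, "Team A", 0.95),
    ("South", 8, "Team B", 0.80),
    ("South", 11, "Team C", 0.75),
    ("West", 11, "Team D", 0.70),
]


def region_grid(teams, region):
    """Precompute Log5 for every ordered pair of teams within one region.

    Cross-region pairs are skipped, which keeps irrelevant matchups
    (such as 11 seeds from different regions) out of the grid.
    """
    in_region = [t for t in teams if t[0] == region]
    return [
        (a[2], b[2], log5(a[3], b[3]))
        for a, b in permutations(in_region, 2)
    ]


south = region_grid(TEAMS, "South")
for team_a, team_b, prob in south:
    print(f"{team_a} over {team_b}: {prob:.3f}")
```

Each row of the result is one cell of the heatmap, so the visualization only has to color the cell rather than compute the probability.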
In the Final Four view, I incorporated a trended Log5 time series chart that I’d wanted to build since I started this project, a basketball equivalent of Predictwise.com’s election probability charts. Restricting this dashboard to the top 5 seeds in each region makes the parameter lists more manageable and puts the focus on games that tend to be closer, harder to pick, and more consequential for tournament pools. The time series chart helps illustrate momentum heading into the tournament, as I’m operating on the belief that younger teams will tend to get better over the course of the year. The logic driving the trended view is a table calculation of Log5 computed for any two teams that are selected. The calculation works because the key data extract out of Alteryx contains Pythagorean win percentage as a running value for every team from the first to the last day of the season, so it shifts up and down as the team wins and loses games.
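As a rough illustration of how a running Pythagorean value feeds a trended Log5 line, here is a small Python sketch. It makes several assumptions that are not in the workflow described above: it uses raw cumulative points rather than any tempo-adjusted efficiency, it indexes by game rather than by calendar day, and the exponent of 10.25 is only a placeholder since the exponent actually used is not stated here. The function names are hypothetical.

```python
from itertools import accumulate

# Assumption: placeholder exponent; the value used in the real extract is not stated.
EXPONENT = 10.25


def running_pythagorean(games, exponent=EXPONENT):
    """Pythagorean win percentage after each game, from cumulative points.

    games: list of (points_for, points_against) tuples in date order.
    Uses the algebraically equivalent form 1 / (1 + (PA / PF) ** x).
    """
    points_for = accumulate(g[0] for g in games)
    points_against = accumulate(g[1] for g in games)
    return [
        1.0 / (1.0 + (pa / pf) ** exponent)
        for pf, pa in zip(points_for, points_against)
    ]


def log5(p_a, p_b):
    """Standard Log5 probability that team A beats team B."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def trended_log5(games_a, games_b):
    """Log5 of A over B at each game index that both teams have reached."""
    return [
        log5(p_a, p_b)
        for p_a, p_b in zip(running_pythagorean(games_a),
                            running_pythagorean(games_b))
    ]


# Hypothetical box scores for two selected teams.
team_a = [(80, 60), (70, 75), (90, 70)]
team_b = [(65, 64), (72, 60), (68, 70)]
trend = trended_log5(team_a, team_b)
```

Because the running value is recomputed after every game, a loss pulls the line down and a win pushes it back up, which is what makes the chart read as momentum.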
The slope graph on dashboard 4 is a Tableau trick I picked up from a Ben Jones blog post a couple of years ago, and this chart type is featured on the cover of Alberto Cairo’s book, The Functional Art. Slope graphs are ideal for showing the difference in rank between two paired measures. In this case, I took the NCAA S-Curve of 68 teams and compared it to the ranking created by my Pythagorean metrics. The resulting view allows you to quickly evaluate where the selection committee mis-seeded teams, and I incorporated some parameters and a quick filter to allow the viewer to isolate by seed or degree of mis-seeding.
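The data behind a slope graph like this is just two ranks per team and the gap between them. Here is a minimal Python sketch of that shaping step, with a filter on the size of the gap. The team names, ranks, and function names are hypothetical, and the sign convention (positive means the metric likes the team more than the committee did) is my own assumption for the example.

```python
def seeding_deltas(s_curve, pythag_rank):
    """Pair each team's S-Curve rank with its Pythagorean rank.

    Both arguments map team name -> rank, where 1 is best.
    delta > 0: the metric ranks the team higher than the committee (underseeded).
    delta < 0: the metric ranks the team lower than the committee (overseeded).
    """
    rows = [
        {
            "team": team,
            "s_curve": committee_rank,
            "pythag": pythag_rank[team],
            "delta": committee_rank - pythag_rank[team],
        }
        for team, committee_rank in s_curve.items()
    ]
    return sorted(rows, key=lambda row: row["s_curve"])


def filter_by_misseeding(rows, min_abs_delta):
    """Keep only teams whose two ranks differ by at least min_abs_delta."""
    return [row for row in rows if abs(row["delta"]) >= min_abs_delta]


# Hypothetical ranks for four teams.
s_curve = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4}
pythag = {"Team A": 1, "Team B": 4, "Team C": 2, "Team D": 3}

rows = seeding_deltas(s_curve, pythag)
big_misses = filter_by_misseeding(rows, 2)
```

Each row becomes one line on the slope graph, with the S-Curve rank on one axis and the Pythagorean rank on the other.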
The history view on dashboard 5 is more of an appendix to remind us of just how unpredictable the results can be. 2008 was the only year in which all four No. 1 seeds reached the Final Four, and years like 2006 and 2011 saw 11 seeds make it to the Final Four.
One final note on Story Points navigation – it took me a little while to realize that the big gray boxes up top could be sized down into smaller numbered nav links. Now that I know, I much prefer it. The numbered approach feels intuitive, and I think it keeps the viewer’s focus on the dashboard content as opposed to the Story Points navigation up top. You can always use dashboard headers and annotations as needed to summarize key information on each view.