I was playing around recently with the Social Security baby names historical data set. You can download it either at the state or national level, and it goes back well over a hundred years. It lists all names and frequencies except for any where the number is under 5. Those are filtered for privacy reasons. I used the national data for this exercise, which comes as a set of csv files (one for each year). Then I unioned all the years together into one data file running from 1900 to 2016. I wrote a simple python script for that step, but any number of data prep tools could handle it (Alteryx, RStudio, etc.).
So what all can we do with this data set? For starters, we can build out trend lines to understand changes in name popularity over time. My first crack at a viz was a small multiples plot of the top 300 for each gender grouped into sets of 25. Feel free to explore the rankings or download the workbook from the Tableau Public version.
But what if we want to search for individual names in an exploratory view or, better yet, enable the viewer to compare alternate spellings of the same names?
It’s easy enough to build a line chart view with a wildcard match filter to search by name, but the wildcard filter only lets you search for one thing at a time. A multi-select dropdown list is far too long and cluttered. That’s where a regular expression can help. I use regex quite a bit in python web scraping, and Tableau made them available in calculated fields starting in version 9. In regex syntax, a pipe delimiter (|) is effectively an OR operator, so pairing this with a string parameter and a calculated field will enable the user to search for multiples.
After building my initial line chart view, I set up a simple string parameter called “regex Match.”
Then comes the key calculated field tying that parameter to Name. We’ll set this to “True” in the filters pane:
Let’s walk through how this works. The REGEXP_MATCH expression just looks to match a string to a pattern. In this case the string is Name, and the pattern is my string parameter. The point of the ‘.*’ piece is that it appends a “zero or more of any character” match to the start of my string parameter. That way if the parameter is empty, the user will still see lines in the chart. Otherwise, an empty parameter wouldn’t return True.
The second REGEXP_MATCH under the OR clause just accounts for capitalization. This way the user can key a name with the first letter capitalized or not, and the filter will still work either way. Now let’s try it out and compare a name with two spellings like Bryan and Brian, and we’ll direct the user to use the pipe delimiter in the search field header.
Success! Note that these are fully functional regular expressions, so if the user wanted an exact match on Bryan or Brian, they could key the search as “Bryan$|Brian$” to declare that we only want those names where n is the last character matched.
Feel free to explore the data on your own.