Scraping and visualizing 117 years of hurricane data, Part 1

I live in Jacksonville, FL, and as of this writing most of the state’s residents are anxiously watching a monster hurricane named Irma execute a slow turn off the Cuban coast toward the Keys, Naples, and Tampa. The west side of the state appears likely to bear the brunt of the storm, but Irma will bring wind and rain to every part of America’s third most populous state. I’m fortunate in that I don’t live in a part of Jacksonville that’s evacuating, but it’s been sobering to watch the steady stream of cars heading north on I-95 all week from my downtown office overlooking the interstate.

When a hurricane takes aim at Florida, every resident turns into an amateur meteorologist. Checking NOAA’s National Hurricane Center website becomes the new morning ritual. We study the forecasts and charts. We follow the classics like the “cone of uncertainty” along with newer innovations like the wind probability charts. I don’t know the official name for the latter but have taken to calling them Blowtorch Charts.

Before I get to the data, I feel compelled to mention how major hurricanes like Irma are a reminder of why we should support and fully fund the National Weather Service, whose stated mission is to “provide weather, water, and climate data, forecasts and warnings for the protection of life and property and enhancement of the national economy.” Who could disagree with that? Quite a few Republicans, apparently. In 2005, Senator Rick Santorum tried to pass a bill restricting the information the National Weather Service could provide to the public, arguing it was unfair to private weather forecasting companies. Not surprisingly, the head of Pennsylvania-based AccuWeather, a major campaign donor, had been pushing the idea. Santorum’s bill died after Katrina hit New Orleans, when enough Senators concluded that it wasn’t a good time to legislatively block the NWS from doing its job. More recently, the Trump administration has targeted NOAA, the NWS’s parent agency, with a 16% budget cut. This is part of a pattern of trying to block, muzzle, or defund government-sponsored climate research, which inevitably operates in some of the same swim lanes as weather data collection.

Politics aside, scientists at the NWS want you to access and study weather data, and the rest of this post covers how to programmatically download a bunch of historical hurricane data. In Part 2, I’ll take you through some data visualization approaches for hurricane data using Tableau.

If you just want to download the data without going to the trouble of scraping it or seeing the code, here are the two files. Both are pipe-delimited .txt files.

  1. AtlanticHurricaneList
  2. AtlanticHurricanePaths

You can download NOAA data from their website, but sometimes I find it’s easier to scrape the data from a different site, Weather Underground. Wunderground has been around since the mid-90s, and I can remember as a teenager figuring out how to embed their weather widget on a Geocities page. The first Python scraper I ever ran was a tutorial from Nathan Yau’s book, Visualize This, pulling temperature values from a series of Wunderground pages. The method I’m using here isn’t all that different from Yau’s approach, and it leans heavily on the Requests and BeautifulSoup libraries.
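If you haven’t used those libraries before, here’s a minimal sketch of the fetch-and-parse pattern, pointed at one of the Wunderground index pages discussed just below. Treat the link extraction as a placeholder; the real crawler filters those anchors down to the storm pages it actually needs.

```python
import requests
from bs4 import BeautifulSoup

# Fetch one of Wunderground's hurricane year-index pages
url = "https://www.wunderground.com/hurricane/at2005.asp"
response = requests.get(url)
response.raise_for_status()

# Parse the HTML and list every link on the page; the real crawler
# narrows these down to the storm-specific pages
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.find_all("a", href=True):
    print(link["href"], link.get_text(strip=True))
```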

Wunderground’s hurricane archive has a couple of types of pages. One is a year index page like this:

https://www.wunderground.com/hurricane/at2005.asp

And linked from there are the storm-specific pages like this:

https://www.wunderground.com/hurricane/atlantic/2005/Major-Hurricane-Katrina

Each storm page contains the data used to generate the map visualization at the top: a set of 20-40 observations running from the hurricane’s genesis through its dissipation. Each observation has a date and time stamp, latitude and longitude, wind speed, barometric pressure, and Saffir-Simpson category.
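The crawl itself is two levels deep: loop over the year-index pages, collect anything that looks like a storm-page link, then visit each storm page in turn. Here’s a rough sketch of the first level; the `/hurricane/atlantic/` substring test is just an assumption based on the example URL above, so adjust it to whatever the index pages actually serve up.

```python
import time

import requests
from bs4 import BeautifulSoup

BASE = "https://www.wunderground.com"
storm_links = []

# Walk the year-index pages (117 seasons) and collect storm-page URLs.
for year in range(1900, 2017):
    index_url = f"{BASE}/hurricane/at{year}.asp"
    soup = BeautifulSoup(requests.get(index_url).text, "html.parser")
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Assumed link pattern, based on the Katrina URL above
        if "/hurricane/atlantic/" in href:
            storm_links.append(href if href.startswith("http") else BASE + href)
    time.sleep(1)  # be polite to the server

print(len(storm_links), "storm pages found")
```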

I want the crawler to create two data files corresponding to the two page types. One will be a smaller file at the storm level of granularity, and the other will hold all of that path-level information. The beauty of Python is that your computer doesn’t mind doing mind-numbing, repetitive tasks like loading a couple thousand pages and copying down information.
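The output side is straightforward: open two pipe-delimited text files, write a header row to each, and append one line per storm and one line per observation as the crawler works through the pages. The column names below are illustrative guesses, not necessarily the exact headers in the files linked above.

```python
import csv

# Two pipe-delimited output files: one row per storm, one row per
# path observation. Column names here are illustrative only.
with open("AtlanticHurricaneList.txt", "w", newline="") as storm_file, \
     open("AtlanticHurricanePaths.txt", "w", newline="") as path_file:
    storms = csv.writer(storm_file, delimiter="|")
    paths = csv.writer(path_file, delimiter="|")

    storms.writerow(["storm_name", "year", "max_wind_mph", "min_pressure_mb"])
    paths.writerow(["storm_name", "timestamp", "lat", "lon",
                    "wind_mph", "pressure_mb", "saffir_simpson"])

    # Inside the crawl loop, each parsed page becomes one or more rows:
    # storms.writerow([name, year, max_wind, min_pressure])
    # paths.writerow([name, obs_time, lat, lon, wind, pressure, category])
```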

The basic crawler, with notes on what it’s doing, is on GitHub, along with a second version that corrects for some gaps in the hurricane lists for 2014 and 2015. For whatever reason, Wunderground is missing some storms from those two years, so I ended up pulling the data from NOAA storm reports and merging it back into the text files.

Once the data’s scraped, we can start building visualizations and figuring out simple trends, like the change in the number of named storms over time. More on this in Part 2.
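As a quick preview, that named-storms trend is just a groupby on the storm-level file. The column names here are assumptions about the scraped output, so swap in whatever headers your file actually carries.

```python
import pandas as pd

# Load the storm-level file; it's pipe-delimited.
# "year" and "storm_name" are assumed column names.
storms = pd.read_csv("AtlanticHurricaneList.txt", sep="|")

# Count named storms per season as a first look at the trend
storms_per_year = storms.groupby("year")["storm_name"].nunique()
print(storms_per_year.tail(20))
```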

 
