Scraping Hurricane Harvey Flood Data Using Python

As is often said in data science, 80% of the work is the less-than-glamorous task of data acquisition and data wrangling, with the remaining 20% left for the “sexy” analytics and machine learning tasks. Often, even those numbers are generous: personally, I spend an inordinate amount of time on these tedious data wrangling tasks. If you work in data science, you’ve likely felt the same pain.

For a recent project, I needed to collect data on flooding in my hometown of Houston, Texas, so I built a web scraper to get data from the Harris County Flood Warning System. To help alleviate some of the tedious data science woes mentioned above, I’ve released my code on GitHub for others interested in flood research. The repo is linked below. The rest of this post shows a couple of quick examples of using the code.

GitHub Repository for Flood Data Web Scraper

Project Background

For my project, I was specifically looking for time series data of water depths during large-scale flooding events in Houston (e.g. Hurricane Harvey in 2017). The local county emergency management division has developed the “Harris County Flood Warning System (FWS)”.

[Figure: Stream Gage Locations on the Harris County Flood Warning System (FWS) Website]

The FWS obtains its data from stream gauges maintained by the US Geological Survey (USGS). While each stream gauge site offers an option to download data as an Excel file, I needed a lot more data, so I built a web scraper. You can clone the scraper from this repo.

Using the Scraper

After cloning the repo, install the requirements listed in the ‘requirements.txt’ file. Using the scraper is simple:

[Code snippet for querying stream gauge data from the FWS website]

Data can also be downloaded into a JSON file.

The ‘elevation’ field contains a record of the depth of the water above the stream bottom.
The ‘depth’ field is a calculated field that records the height of the water above the Top of Bank (TOB). If the water elevation is above the TOB, then the ‘aboveBank’ key holds the value ‘True’; otherwise it is ‘False’. See the readme file for more details.

{
  "sensorMetaData": {
    "sensorId": "519",
    "sensorType": "USGS Radar",
    "installedDate": "3/27/1984",
    "topOfBankFt": 32.0
  },
  "sensorData": [
    {
      "timestamp": "8/26/2017 7:31 AM",
      "elevation": { "value": 21.69, "units": "ft" },
      "depth": { "value": 0, "units": "ft" },
      "aboveBank": false
    },
    {
      "timestamp": "8/27/2017 5:43 AM",
      "elevation": { "value": 39.49, "units": "ft" },
      "depth": { "value": 7.490000000000002, "units": "ft" },
      "aboveBank": true
    }
  ]
}

Example of Querying and Plotting

Here’s a short code snippet to grab data from stream gauge number 475 and plot the data:

This produces the following plot…

[Figure: Plotting the data using matplotlib]

… which matches the FWS plot, as we’d expect:

[Figure: Data as shown at https://www.harriscountyfws.org]

Final Wrap-Up

Hopefully this scraper will be useful to other aspiring flood researchers. If you have any questions or issues using the code, please don’t hesitate to reach out!
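
As a small addendum for anyone working with the JSON output, here is a minimal sketch of how the ‘elevation’, ‘depth’ and ‘aboveBank’ fields described earlier appear to relate to one another. It is not part of the scraper itself: the function names `depth_above_bank` and `summarize` are made up for illustration. The formula, depth = max(0, elevation − topOfBankFt), is an assumption that happens to agree with the two sample readings shown above; check the readme before relying on it.

```python
import json
from datetime import datetime

# Sample record copied from the JSON example in this post.
SAMPLE = """
{
  "sensorMetaData": {"sensorId": "519", "sensorType": "USGS Radar",
                     "installedDate": "3/27/1984", "topOfBankFt": 32.0},
  "sensorData": [
    {"timestamp": "8/26/2017 7:31 AM",
     "elevation": {"value": 21.69, "units": "ft"},
     "depth": {"value": 0, "units": "ft"}, "aboveBank": false},
    {"timestamp": "8/27/2017 5:43 AM",
     "elevation": {"value": 39.49, "units": "ft"},
     "depth": {"value": 7.490000000000002, "units": "ft"}, "aboveBank": true}
  ]
}
"""


def depth_above_bank(elevation_ft, top_of_bank_ft):
    """Assumed relationship: height above Top of Bank, floored at zero."""
    return max(0.0, elevation_ft - top_of_bank_ft)


def summarize(record):
    """Return (timestamp, elevation, recomputed depth, above-bank flag) tuples."""
    tob = record["sensorMetaData"]["topOfBankFt"]
    rows = []
    for reading in record["sensorData"]:
        # Timestamps in the sample look like "8/26/2017 7:31 AM".
        when = datetime.strptime(reading["timestamp"], "%m/%d/%Y %I:%M %p")
        elevation = reading["elevation"]["value"]
        rows.append((when, elevation, depth_above_bank(elevation, tob), elevation > tob))
    return rows


record = json.loads(SAMPLE)
rows = summarize(record)
for when, elevation, depth, above in rows:
    print(when.isoformat(), elevation, round(depth, 2), above)
```

Parsing the timestamps into `datetime` objects up front makes the readings easy to sort or hand to a plotting library, and recomputing the depth gives a quick sanity check against the values stored in the file.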
