Building support for pollution-free cities: an Open Data workflow

In this post, we’ll walk through the steps to build a data-driven advocacy tool using Python code.

Inspired by a political strategy that paid remarkable dividends in London, we’ll demonstrate how similar tools could be built for New York or other cities that invest in open data portals and monitoring of pollutants like PM2.5.


Motivation: From Great Smog to great clean-up

Personally I had always associated air pollution with China’s eastern seaboard, until I learned that London, my hometown, experienced perhaps the most destructive smog episode in history.

In December 1952, Londoners burned more coal than usual due to a cold snap, just as a high-pressure system and windless conditions settled over the city, trapping the pollutants at ground level.

Traffic ground to a halt as the thick yellow smog (a “pea-souper”) curbed visibility.

As many as 12,000 people died.

Left: The Great Smog of London (1952) killed up to 12,000 people.

It has been cited by China’s leadership as a worst case scenario justifying stringent actions like limiting auto ownership.

Right: air pollution outside the Forbidden City, snapped on my Jan 2018 China trip.

On recent trips to London, I was struck by how prominently air pollution features in newspapers, radio and TV.

This media context has allowed three successive mayors (Ken Livingstone, Boris Johnson and Sadiq Khan) to ratchet up ambitious measures affecting automobiles, industry and even log fires.

The coverage has been based around one thing: framing air pollution as a children’s health issue.


Get schools and pollution data

Following the same approach but for New York City, let’s acquire and overlay the city’s school locations with data from the NYC Community Air Survey (NYCCAS).

Both are accessible from NYC Open Data.

NYCCAS is a great model of how cities can get actionable information on pollution at a reasonable cost.

Starting in 2008, city workers installed portable air quality monitors at 150 sites, for two weeks at a time, once per season.

Through land use regression, the NYC Department of Health and Mental Hygiene was able to estimate pollution levels across the city without the cost of permanent sensor installations.
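To make the idea of land use regression concrete, here is a toy sketch in plain Python. It fits a straight line relating measured PM2.5 at a handful of monitored sites to a single land-use predictor, then uses that line to estimate pollution at a site with no monitor. The predictor (traffic density), the numbers and the function names are all invented for illustration; the real NYCCAS models are more elaborate and draw on many predictors.

```python
# Toy land use regression: one predictor, ordinary least squares.
# All values below are invented for illustration; they are not NYCCAS data.


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Estimate the pollutant level for a given predictor value."""
    return intercept + slope * x


# Hypothetical monitored sites: traffic density (vehicles/hour) and measured PM2.5.
traffic = [200, 500, 800, 1100, 1400]
pm25 = [8.0, 9.5, 11.0, 12.5, 14.0]

intercept, slope = fit_line(traffic, pm25)

# Estimate PM2.5 at an unmonitored location where only traffic density is known.
estimate = predict(intercept, slope, 950)
print(f"Estimated PM2.5 at the unmonitored site: {estimate:.2f} ug/m3")
```

The appeal of the approach is that once the relationship is fitted, every grid square with known land-use characteristics gets an estimate, whether or not a monitor ever stood there.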

Here we import the two datasets and plot them together, using the key Python libraries for vector data (Geopandas) and raster data (Rasterio).

Check the source code for several data cleaning steps that were required.
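As a rough guide to the loading and plotting step, here is a minimal sketch using GeoPandas, Rasterio and Matplotlib. It is not the code from the linked source: the function name and file names are hypothetical, and it skips the data cleaning steps mentioned above. The one detail worth noticing is the reprojection, since points and pixels only line up when they share a coordinate reference system.

```python
def plot_schools_over_raster(schools_path, raster_path):
    """Draw school points on top of a pollution raster and return the axes."""
    # Imported inside the function so the sketch can be loaded even on a
    # machine without the geospatial stack installed.
    import geopandas as gpd
    import matplotlib.pyplot as plt
    import rasterio
    from rasterio.plot import show

    schools = gpd.read_file(schools_path)

    with rasterio.open(raster_path) as src:
        # Reproject the points into the raster's coordinate reference system.
        schools = schools.to_crs(src.crs)

        fig, ax = plt.subplots(figsize=(8, 8))
        show(src, ax=ax, cmap="viridis")  # raster layer underneath
        schools.plot(ax=ax, color="white", markersize=3)  # school points on top

    ax.set_title("Schools over PM2.5 raster")
    return ax


# Hypothetical file names; substitute the files downloaded from NYC Open Data.
# ax = plot_schools_over_raster("schools.shp", "pm25_2012.tif")
```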

NYCCAS air pollution raster for 2012 combined with a shapefile of New York City public schools.

Left: Installing an air quality sensor on Fifth Avenue.

Right: the Community Air Survey built the case for measures like NYC’s ban on heavy fuel oil, which previously heated boilers at 10,000 apartment buildings.


Sample raster values at each location

The NYCCAS data is a set of rasters in TIFF format.

Each pixel gives the expected level of PM2.5 in a 300 x 300 meter grid square of the city.

Rasterstats is a Python library with a helpful point_query function, used below.

We create a time series of estimated PM2.5 concentration for each school by sampling each raster (I used the rasters for seven years) at each school’s location.
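To show what sampling a raster at a point involves, here is a plain-Python sketch of the underlying idea. It is not how Rasterstats is implemented: it simply looks up the pixel that contains each point, whereas `point_query` interpolates between neighbouring pixels by default. The tiny grids, coordinates, school names and the `sample_pixel` helper are all made up for illustration.

```python
def sample_pixel(grid, origin_x, origin_y, pixel_size, x, y):
    """Return the value of the pixel containing point (x, y), or None if outside.

    grid       -- list of rows, top row first (the usual raster layout)
    origin_x/y -- coordinates of the raster's top-left corner
    pixel_size -- side length of a square pixel, in the same units
    """
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)  # rows count downwards from the top
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return None
    return grid[row][col]


# Invented 2 x 2 rasters (one per year) with 300 m pixels; top-left corner at (0, 600).
rasters = {
    2009: [[12.0, 11.0], [10.5, 9.8]],
    2010: [[11.4, 10.6], [10.1, 9.5]],
}

# Invented school locations in the same coordinate system as the rasters.
schools = {"School A": (100, 500), "School B": (450, 100)}

# Build a per-school time series by sampling every year's raster.
series = {
    name: {year: sample_pixel(grid, 0, 600, 300, x, y) for year, grid in rasters.items()}
    for name, (x, y) in schools.items()
}

for name, values in series.items():
    print(name, values)
```

With Rasterstats the per-raster lookup collapses to roughly one call, along the lines of `point_query(schools, raster_path, interpolate="nearest")`, repeated once per year.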

The result is stored as a Pandas dataframe (the Python equivalent of an Excel spreadsheet).

I also merged in census data on poverty and the city’s health outcomes survey.

DataFrame of schools with PM2.5 levels sampled from the rasters (years 1 through 3 shown).


Analysis 1: Does pollution differ across boroughs?

The World Health Organisation considers PM2.5 levels above 10 micrograms per cubic meter to be dangerous.

Below we plot the number of elementary schools at each level of PM2.5, with the WHO’s threshold shown in red.

A story is emerging: most elementary schools in the Bronx are comfortably below the threshold, but the other boroughs have many schools close to or above the danger zone.
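The comparison behind a plot like this boils down to counting, per borough, how many schools sit above the threshold. Here is a minimal sketch of that tally in plain Python; the school records are invented placeholders rather than values from the real dataset, and `count_above` is a name chosen for this example.

```python
from collections import Counter

WHO_THRESHOLD = 10.0  # PM2.5 threshold cited in the text, in micrograms per cubic meter

# Invented records standing in for the schools table.
schools = [
    {"name": "School A", "borough": "Bronx", "pm25": 8.9},
    {"name": "School B", "borough": "Bronx", "pm25": 9.4},
    {"name": "School C", "borough": "Brooklyn", "pm25": 9.8},
    {"name": "School D", "borough": "Brooklyn", "pm25": 10.3},
    {"name": "School E", "borough": "Manhattan", "pm25": 10.9},
    {"name": "School F", "borough": "Manhattan", "pm25": 11.6},
]


def count_above(schools, threshold=WHO_THRESHOLD):
    """Return {borough: (schools above threshold, total schools)}."""
    total = Counter(s["borough"] for s in schools)
    above = Counter(s["borough"] for s in schools if s["pm25"] > threshold)
    return {borough: (above[borough], total[borough]) for borough in sorted(total)}


summary = count_above(schools)
for borough, (n_above, n_total) in summary.items():
    print(f"{borough}: {n_above} of {n_total} schools above the threshold")
```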

Next, the same plot for secondary schools.

Take care and do your homework before sending your teenager to school in Manhattan.

Has the situation worsened or improved?

Plotting the time series of average PM2.5 concentrations of schools in each borough shows clear improvements since 2009, although many kids are still close to or above dangerous pollutant levels.
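The borough-level time series is a group-and-average operation: collect every school's estimate by borough and year, then take the mean of each group. A small sketch with the standard library follows. The records are invented for illustration, and in practice this would more likely be a one-line Pandas `groupby`.

```python
from collections import defaultdict
from statistics import mean


def borough_means(records):
    """Average PM2.5 per (borough, year) from (borough, year, pm25) tuples."""
    grouped = defaultdict(list)
    for borough, year, pm25 in records:
        grouped[(borough, year)].append(pm25)
    return {key: mean(values) for key, values in grouped.items()}


# Invented school-level estimates; not real NYCCAS values.
records = [
    ("Bronx", 2009, 10.0),
    ("Bronx", 2009, 11.0),
    ("Bronx", 2015, 8.0),
    ("Bronx", 2015, 9.0),
    ("Manhattan", 2009, 12.0),
    ("Manhattan", 2009, 13.0),
    ("Manhattan", 2015, 10.0),
    ("Manhattan", 2015, 11.0),
]

averages = borough_means(records)
for (borough, year), value in sorted(averages.items()):
    print(f"{borough} {year}: {value:.1f} ug/m3")
```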


Analysis 2: How does air quality relate to poverty?

Too often, poor people face disproportionate impacts from environmental pollution.

For each school we take the median income of the surrounding census tract, plotting this against the latest year’s PM2.5 estimate.

Interestingly, there’s no apparent relationship between poverty and air pollution in New York.
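A scatter plot is the main evidence here, but a correlation coefficient is a quick way to put a number on "no apparent relationship". The sketch below computes Pearson's r in plain Python. This is a complementary check added for illustration rather than a step from the original analysis, and the income and PM2.5 values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values: median tract income (thousands of dollars) and PM2.5 per school.
income = [30, 45, 60, 75, 90, 120]
pm25 = [10.2, 9.1, 10.8, 9.5, 10.4, 9.9]

r = pearson_r(income, pm25)
print(f"Pearson r between income and PM2.5: {r:.2f}")
```

A value near zero is consistent with a shapeless scatter, while values near +1 or -1 would indicate that pollution rises or falls steadily with income.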

In London, press coverage has been driven in part by studies showing that schools in both poor and rich areas, including some of the city’s most prestigious places of education, suffer hazardous pollution levels.

New York appears to follow a similar pattern: rich neighborhoods like the Upper East Side and Prospect Heights are among those with dense emissions from buildings and vehicles.

The pattern is likely to differ where industrial emissions play a stronger role in the pollution mix, but in two cities at least, air pollution is an equal opportunities killer.


Putting it together: data visualization

Our shapefile combining school locations, their demographic characteristics and a time series of PM2.5 exposure is ripe for data visualization.

What makes for a successful advocacy tool in this subject area?

In our case, the key ingredient is parental concern.

Below is a prototype built with Python’s Folium library that color-codes each Brooklyn school by pollution level: schools exceeding the WHO threshold are dark red.
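For readers who want to try something similar, here is a minimal sketch of a color-coded Folium map. It is not the prototype's actual code: the intermediate orange band, the field names (`name`, `lat`, `lon`, `pm25`) and the helper names are assumptions made for this example. Only the dark red for schools above the WHO threshold comes from the description above.

```python
WHO_THRESHOLD = 10.0  # PM2.5 threshold cited in the text


def marker_color(pm25, threshold=WHO_THRESHOLD):
    """Dark red above the threshold, orange within 1 unit below it, else green."""
    if pm25 > threshold:
        return "darkred"
    if pm25 > threshold - 1.0:
        return "orange"
    return "green"


def build_map(schools, center, zoom=12):
    """Return a folium.Map with one colored circle per school."""
    # Imported here so the color helper can be used without Folium installed.
    import folium

    school_map = folium.Map(location=center, zoom_start=zoom)
    for school in schools:
        color = marker_color(school["pm25"])
        folium.CircleMarker(
            location=[school["lat"], school["lon"]],
            radius=6,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.8,
            popup=f'{school["name"]}: {school["pm25"]:.1f} ug/m3',
        ).add_to(school_map)
    return school_map


# Hypothetical usage, with approximate coordinates for central Brooklyn:
# build_map(schools, center=[40.65, -73.95]).save("brooklyn_schools.html")
```

Keeping the color rule in its own small function makes it easy to adjust the bands, or to reuse the same rule in a legend.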

We want parents to be able to zoom in on their child’s school, see the air quality, and get angry.

Few things move the needle in City Hall more than parents demanding action for their child’s health.

Click here for a proof of concept built with Tableau Public.

The NYC schools dataset includes the name and phone number of each Principal, so try hovering over a school, and if you’re particularly concerned, give them a call to suggest site improvements or no-idling regulations!

Bringing open data portals and air pollution monitoring to more cities will enable this kind of civic activism elsewhere. After all, few things unite human beings more than concern for their children’s health.

Thanks for reading!

Check out the source code here.

If you’d like to continue the conversation, get in touch or leave a comment.
