Self-Serve Local News

But where a Mad Libs game might call for a verb or noun, the article skeletons might call for the most common cause of hate crimes in State X or the Gini Index in City Y.

First, we identified nearly 70 possible data features that could be plugged into an article. Then we incorporated those data points into our skeletons, based on which would be most relevant when telling a story about a locality that fell into one of the eight bins on the decision tree.

This resulted in the eight final skeletons, one of which started with:

The bracketed purple elements are the plugged-in data points, while the green text identifies any copy added onto the base layer. Meanwhile, the bracketed orange features are a secondary type of Mad Libs-style plug-in that calls for more narrative (rather than numerical) details. Because certain locality-specific “values” one might want to include in an article, such as anecdotes or quotes, don’t (yet) exist in organized databases or spreadsheets, these spots show a local reporter where they would still need to go out and do their own reporting. The final output of this system, then, would fill in the purple features with relevant values but leave the orange ones unmodified.

Localization in Practice

If a journalist were writing for a paper in Los Angeles, this two-part approach to localization would result in the following decision tree…

… which would lead to the following article skeleton…

… which, when the system plugged in the relevant data points, would output the final article…

With eight possible article skeletons combined with data on 3,020 cities, 44 states and the District of Columbia, the result is a flexible system for automatically generating unique, customized articles that could be syndicated to local outlets across the country with relatively little overhead. Although not every aspect of the process is automated (namely, writing the article skeletons and filling in the orange text features), it does suggest one possible approach to producing localized articles at scale.

Using the JavaScript libraries Mapbox for plotting and D3.js for data visualization, our team developed a web app that makes this system easy to use and accessible to those without a background in computer or data science. By clicking on their city or state, a resource-scarce newsroom could pull up not only a fleshed-out article to work from…

… but also useful summary statistics…

… locality-specific data visualizations…

… and even a succinct tip sheet that, if the article content were not itself necessary, indicates what might still be worth looking into…

Expanding the Scope

To further develop this proof of concept, our team decided to apply the same methodology to another article: Ben Casselman’s “Where Police Have Killed Americans In 2015,” another FiveThirtyEight piece. Using data from the U.S. Census Bureau and a data project by The Guardian about police killings, Casselman’s article explores the relationship between poverty, race and where in the country police kill the most civilians.

Localizing this narrative followed essentially the same process as before (a code sketch of the core steps appears after this list):

1. Create a neutral “base layer” from the initial copy.
2. Aggregate the data and join them together in a dataframe.
3. Create a decision tree that accounts for scale (state or city) and statistical trends (whether police killings per capita and average household income are above or below the national average).
4. Identify Mad Libs-style features (quantitative and qualitative) that can be plugged in, either from the dataset or by local reporters.
5. Modify the base layer to account for the decision tree results (eight possibilities again) and the Mad Libs features.
6. Embed this process in a map-based web app.
7. Incorporate additional data visualizations, summary statistics, tip sheets, etc., as available.
8. Make the outcomes accessible to journalists.
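As a rough illustration of how steps 3 through 5 might fit together, here is a minimal Python sketch of the two-part approach: three binary splits route a locality to one of eight bins, and that bin's skeleton is then filled in Mad Libs-style. The field names, benchmark values and skeleton copy are hypothetical stand-ins rather than the project's actual schema or prose, and the [ORANGE: …] slot mimics the narrative placeholders left to local reporters.

```python
# Hypothetical locality record; in the real pipeline this would be one
# row of the joined dataframe. All names and values are illustrative.
locality = {
    "locality": "Los Angeles",
    "scale": "city",               # "state" or "city"
    "killings_per_100k": 0.52,
    "median_income": 55000,
}

NATIONAL_KILLINGS_PER_100K = 0.35   # placeholder benchmarks, not real figures
NATIONAL_MEDIAN_INCOME = 57000

# One of eight skeletons, keyed by the decision tree's three splits.
# {curly} slots stand in for the purple (numerical) features, while the
# [ORANGE: ...] slot mimics the narrative placeholders left to reporters.
SKELETONS = {
    ("city", True, False): (
        "Police in {locality} killed civilians at a rate of "
        "{killings_per_100k:.2f} per 100,000 residents, above the national "
        "average, while the median household income of ${median_income:,} "
        "fell below it. [ORANGE: quote from a local community organizer]"
    ),
    # ... seven more skeletons, one per bin ...
}

def localize(record):
    """Route a locality down the decision tree to one of eight bins,
    then fill that bin's skeleton Mad Libs-style."""
    bin_key = (
        record["scale"],                                          # state or city?
        record["killings_per_100k"] > NATIONAL_KILLINGS_PER_100K, # above average?
        record["median_income"] > NATIONAL_MEDIAN_INCOME,         # above average?
    )
    return SKELETONS[bin_key].format(**record)

print(localize(locality))
```

Note that `localize` never touches the orange placeholder, mirroring the rule that the system fills in quantitative features but leaves qualitative reporting to a human.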
Relying primarily on summary statistics raises the question of whether this sort of product could scale to massive datasets. For instance, when working with the police killings article, we encountered a dataset containing a record of every single police killing in the United States over several years. When a user clicks on a specific locality to see the auto-generated article populated with relevant data, a naive approach would be to query the dataset for all police killings that happened in that city and then compute the appropriate statistics on the spot. This approach, however, is computationally expensive and inefficient, because the same queries and calculations might be performed many times.
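One way around that inefficiency is to aggregate the incident-level records once, up front, and then answer each map click with a simple lookup. The sketch below contrasts the two approaches with pandas; the toy records and column names are invented for illustration and do not reflect The Guardian's actual schema.

```python
import pandas as pd

# Hypothetical incident-level records, one row per police killing;
# The Guardian's real dataset uses a different schema.
incidents = pd.DataFrame([
    {"state": "CA", "city": "Los Angeles", "age": 34, "armed": "firearm"},
    {"state": "CA", "city": "Los Angeles", "age": 27, "armed": "unarmed"},
    {"state": "TX", "city": "Houston",     "age": 41, "armed": "knife"},
])

# Naive approach: rescan the full dataset on every map click.
def stats_on_click_naive(state, city):
    subset = incidents[(incidents["state"] == state) & (incidents["city"] == city)]
    return {"killings": len(subset), "median_age": subset["age"].median()}

# Precomputed approach: aggregate once at startup, so each click
# becomes a cheap lookup into the summary table.
summary = (
    incidents
    .groupby(["state", "city"])
    .agg(killings=("age", "size"), median_age=("age", "median"))
)

def stats_on_click(state, city):
    return summary.loc[(state, city)].to_dict()

print(stats_on_click("CA", "Los Angeles"))
# {'killings': 2.0, 'median_age': 30.5}
```

In a deployed app, a summary table like this could even be serialized to JSON once and shipped to the Mapbox/D3 front end, so that a click never has to touch the raw incident data at all.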
