How to use Machine Learning and Quilt to Identify Buildings in Satellite Images

Jared Yamaoka was an Insight Data Science Fellow in the Summer 2017 session. During his time at Insight, Jared built a machine learning model that used satellite images of Austin, TX to measure change in land use over time. This project was a proof of concept for the Insight Data Fellows Program.

Images from the Landsat 8 satellite are readily available from several sources, including Amazon Web Services (AWS). Clouds are the bane of satellite image analyses.

[Image: Austin, TX]

For this project, I also used the Quilt package manager. To follow along with this tutorial, you can install my data package, which has all the images and labels you need to get started:

```shell
$ quilt install jared/landuse_austin_tx
```

From here you can browse the package in Python:

```python
>>> from quilt.data.jared import landuse_austin_tx
>>> landuse_austin_tx
<PackageNode '/quilt_packages/jared/landuse_austin_tx'>
images
labels
metadata
```

I provide the original images as well as images cropped to a region around the city. For example, one scene's red band (B4) can be read as a grayscale array (the original snippet was truncated; this reconstruction assumes scikit-image's imread and that calling the Quilt data node returns a local file path):

```python
>>> from skimage.io import imread
>>> r = imread(landuse_austin_tx.images.LC80270392014022LGN00_B4(), as_grey=True)
```

More information about the available image bands is in the Landsat 8 documentation.

Feature Engineering

After loading the data, I engineered some higher-level features to help better train the classifier. The first is the Normalized Difference Vegetation Index (NDVI). This feature is very powerful, giving roughly a 10% improvement in accuracy over a classifier trained without it.

[Image: Normalized Difference Vegetation Index]

The second feature, which I call a "building finder," is designed to find edges in the image; in image-processing lingo, this is known as edge detection. Each RGB band is blurred with a Gaussian filter, and the blurred copy is then subtracted from the original image.

[Image: Difference of Gaussian image]

Data Labels

The labels for training came from OpenStreetMap, which provides crowd-sourced land use data. I believe more training across different images should help mitigate this effect, but more study is needed.

Conclusions

I have shown that machine learning can successfully classify land use on a per-image basis. The example mentioned in the
Modeling section was trained on just 10% of the data, yet still achieved 83% overall accuracy and 58% recall on buildings across the entire image. While the time-series analysis will require cross-image training, or perhaps more feature tuning, the per-image performance is reasonable.
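The post does not show how NDVI is computed, but the standard formula combines the red and near-infrared bands (B4 and B5 on Landsat 8): NDVI = (NIR − Red) / (NIR + Red). A minimal sketch, assuming float reflectance arrays rather than the author's exact implementation:

```python
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    `red` and `nir` are same-shaped float arrays (Landsat 8 bands B4 and
    B5). Values fall in [-1, 1]; vegetation scores high because plants
    reflect strongly in near-infrared and absorb red light.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps guards against 0/0

# Toy example: a vegetated pixel vs. a bare-soil pixel.
print(ndvi(np.array([0.1, 0.3]), np.array([0.5, 0.35])))
```

Because the ratio normalizes out overall brightness, NDVI is robust to illumination differences between scenes, which is one reason it helps the classifier so much.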
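The "building finder" feature described above (blur each band with a Gaussian filter, then subtract the blurred copy from the original) can be sketched as follows; this is an illustrative reconstruction using SciPy, not the author's code, and the sigma value is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def building_finder(band, sigma=2.0):
    """Difference-of-Gaussian edge feature for one image band.

    Blurring removes fine detail; subtracting the blurred copy from the
    original leaves flat regions near zero and keeps large values at
    sharp transitions, such as building outlines.
    """
    band = np.asarray(band, dtype=float)
    return band - gaussian_filter(band, sigma=sigma)

# A synthetic step edge: the response is large at the edge and
# near zero in the flat regions far from it.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = building_finder(img)
```

Applied to each RGB band, this yields three extra feature planes per pixel that highlight man-made boundaries.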
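The Modeling section itself is not included in this excerpt, but the evaluation it describes (train on 10% of the pixels, report overall accuracy and building recall on the rest) can be sketched like this. Everything here is a stand-in: the features, labels, and random-forest choice are illustrative, not the author's actual pipeline, and the synthetic data is generated on the spot rather than taken from the project:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic stand-in for per-pixel features (e.g. RGB bands, NDVI,
# edge score) and binary building / not-building labels.
X = rng.rand(5000, 5)
y = (X[:, 3] > 0.7).astype(int)  # fake "building" rule on one feature

# Train on 10% of the pixels, evaluate on the remaining 90%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.1, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("building recall:", recall_score(y_test, pred))
```

Recall on the building class is reported separately because buildings are a minority class: a classifier can score high overall accuracy while missing many of them, which is exactly the gap between the 83% accuracy and 58% recall figures above.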
