Visualising Streaming Geospatial Data

James Graham, Apr 18

[Video: Visualizing high-risk credit card transactions on a map]

I recently worked on a project that involved streaming credit card transaction data and classifying it based on risk probability.

Building upon this, I wanted to explore the options for visualising the data.

I decided to focus on the geographical aspect since it’s a key component when trying to identify fraudulent transactions… and I’m a bit of a map nerd.

Code: https://github.com/jgraham0325/streaming-visualization

Live Demo: https://streaming-visualisation.appspot.com/

Use Cases

There are many reasons why being able to visualise streaming geospatial data could solve real-world problems.

Essentially these boil down to scenarios where it’s important to make a decision soon after an event has occurred.

It could be a person who’s responsible for making the decision or a machine learning algorithm that automates the process.

Some examples:

[Image: Waze hazard]

Connected Car: With the increasing number of sensors in cars and their access to the internet, it’s possible to alert a driver to hazards on the road before they become a danger, e.g. a tree that’s just been blown down on the road ahead.

[Image: Azure IoT connected factory]

Internet of Things: Predictive maintenance can be enhanced by visualising the location of potential problems and finding the closest supply of replacement parts. It can also identify patterns or clusters that otherwise wouldn’t be obvious.

[Image: FlightRadar24]

Flight Tracking: Showing real-time flight information can help predict delays, handle irregular operations, and analyse route efficiency.

[Image: Real-time fire hazards (Weatherbug)]

Disaster relief: Collecting actionable GIS data before and after hazardous events, like fires, can help people to avoid dangerous situations.

Technology Choices

There are a number of commercial products that offer the ability to display geospatial data in near real-time.

Commercial products include:

ArcGIS: A desktop or cloud-based product by ESRI, who have been a dominant force in commercial mapping applications for decades. It’s powerful but the licences are expensive.

Cesium: Particularly good at visualising 3D data. It’s used by sites such as FlightRadar24, which is accessed 45 million times a month. Billed monthly based on storage and usage.

Kinetica: Its unique selling point is the combination of GIS with AI/ML.

Zoomdata: Well known for streaming visualisation capabilities but not particularly for GIS. Able to show data in a wide variety of graphs and combine them with plots on a map.

There are also a number of JavaScript APIs that can be used, for a more DIY approach.

JavaScript APIs

Leaflet.js: simple, open source and offers a good library of plugins (including Mapbox JS).

OpenLayers: powerful, open source but more complex than others.

Mapbox GL: good for displaying complex data layers using WebGL.

Google Maps JS API: easy to use but requires a paid licence past a threshold.

Three.js: Uses WebGL to create 3D graphics in a web browser.

Decision: Use Leaflet.js since it’s easy to use, flexible and doesn’t incur any licence costs. A Node.js server is used along with the Socket.io library to push real-time events to the map in the client’s browser. I also implemented some Three.js visualisations for comparison.
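To make the server-to-browser push more concrete, here is a minimal sketch of the forwarding step. It is not the code from the repository: the function name `createBroadcaster`, the `riskScore` field, the `"transaction"` event name and the 0.8 threshold are all assumptions made for illustration. The only thing it relies on is that the object passed in has an `emit(eventName, payload)` method, which a Socket.IO server instance provides; a small stand-in object is used here so the sketch runs without any dependencies.

```javascript
// Sketch: forward only high-risk transactions to connected browsers.
// `io` is any object with an emit(eventName, payload) method; a Socket.IO
// server instance has that shape. The field name `riskScore`, the event name
// "transaction" and the default threshold are illustrative assumptions.
function createBroadcaster(io, riskThreshold = 0.8) {
  return function broadcast(transaction) {
    if (transaction.riskScore < riskThreshold) {
      return false; // low risk: not sent to the map
    }
    // Send only the fields the map needs to draw a point.
    io.emit("transaction", {
      lat: transaction.lat,
      lng: transaction.lng,
      riskScore: transaction.riskScore,
    });
    return true;
  };
}

// Stand-in for a real Socket.IO server so the sketch runs on its own.
const sentEvents = [];
const fakeIo = { emit: (name, payload) => sentEvents.push({ name, payload }) };

const broadcast = createBroadcaster(fakeIo);
broadcast({ lat: 51.5074, lng: -0.1278, riskScore: 0.93 }); // forwarded
broadcast({ lat: 53.4808, lng: -2.2426, riskScore: 0.12 }); // dropped
console.log(sentEvents.length); // 1
```

In the browser, the matching Socket.IO client would listen for the same event name and add a marker to the Leaflet map for each payload it receives.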

[Figure: High-level architecture]

Performance

Showing hundreds of thousands of points on a map at once is technically challenging and arguably not very useful.

People find it difficult to interpret this data in its raw format.

To address these issues, points are typically aggregated using heatmaps or clusters of points.

Through some trial and error, I’ve found that the performance of these layers can vary widely depending on their implementation.
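As a rough illustration of what aggregating points into clusters means, the sketch below bins points into square grid cells and reports one cluster per occupied cell. This is a deliberately simple grid approach chosen for the example; it is not how any of the Leaflet plugins discussed in this article work internally, and the function name `clusterByGrid` and the `cellSizeDeg` parameter are hypothetical.

```javascript
// Group points into square grid cells and return one cluster per occupied cell.
// cellSizeDeg (cell width in degrees) is an illustrative parameter.
function clusterByGrid(points, cellSizeDeg) {
  const cells = new Map();
  for (const { lat, lng } of points) {
    // Points whose coordinates fall in the same cell share a key.
    const key = `${Math.floor(lat / cellSizeDeg)}:${Math.floor(lng / cellSizeDeg)}`;
    const cell = cells.get(key) || { count: 0, latSum: 0, lngSum: 0 };
    cell.count += 1;
    cell.latSum += lat;
    cell.lngSum += lng;
    cells.set(key, cell);
  }
  // Each cluster is placed at the mean position of its points.
  return Array.from(cells.values(), (c) => ({
    count: c.count,
    lat: c.latSum / c.count,
    lng: c.lngSum / c.count,
  }));
}

const clusters = clusterByGrid(
  [
    { lat: 51.5, lng: -0.12 },  // London
    { lat: 51.52, lng: -0.1 },  // London
    { lat: 53.48, lng: -2.24 }, // Manchester
  ],
  0.5
);
console.log(clusters.map((c) => c.count)); // [ 2, 1 ]
```

A map would then draw one marker per cluster, labelled with its count, instead of one marker per point. The cell size would normally shrink as the user zooms in so that clusters break apart into individual points.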

Filters are also useful: in the case of credit card transactions, only high-risk transactions are shown.
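To get a feel for how much such a filter reduces the volume, here is a back-of-envelope calculation. It uses the figures quoted in this article (924M UK transactions a year, a 0.08% fraud rate, and 9 in 10 high-risk transactions being false positives) and additionally assumes that every fraudulent transaction is flagged as high-risk. The variable names are my own.

```javascript
const transactionsPerYear = 924e6; // UK annual figure quoted in this article
const hoursPerYear = 365 * 24;     // 8,760
const fraudRate = 0.0008;          // 0.08% average fraud rate
const falsePositivesPerFraud = 9;  // 9 in 10 high-risk flags are false positives

// Assumption: every fraudulent transaction is flagged as high-risk,
// so flagged volume = fraud + 9 false positives per fraud.
const transactionsPerHour = transactionsPerYear / hoursPerYear;
const fraudPerHour = transactionsPerHour * fraudRate;
const highRiskPerHour = fraudPerHour * (1 + falsePositivesPerFraud);

console.log(Math.round(transactionsPerHour)); // 105479
console.log(Math.round(highRiskPerHour));     // 844
```

On these assumptions the map only has to handle a few hundred new points an hour rather than every transaction.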

This brings the typical number of credit card transactions down from about 105K/hr (based on a UK average of 924M a year) to about 840/hr (based on an average fraud rate of 0.08% and assuming 9 in 10 high-risk transactions are false positives).

As part of the PoC, the following layers were implemented:

Three.js (WebGL): This is the most resource-intensive option since it’s rendering a 3D map of the world. The only way to make it performant was to limit the number of points displayed simultaneously to fewer than 100. Using a 2D WebGL map allowed for thousands of points to be displayed but the resolution was too low to be useful in practice.

[Image: 3D WebGL heatmap using three.js. Flashes of light indicate a new transaction]

[Image: 2D WebGL heatmap using three.js]

Leaflet.heat plugin: This was able to download and render over 10K points in less than 1 sec. It’s useful for identifying hot spots but doesn’t allow the user to zoom down to see the individual points, which is often essential for identifying potential fraud.

[Image: Heatmap with 5,000 high-risk transactions]

Leaflet MarkerCluster plugin: This is the most commonly used plugin for grouping points that are close together to make them manageable on screen. However, it took 2–3 sec to render 10K points and didn’t work well with streaming data since each time a data point was added, the layer needed to be refreshed. I replaced this with the PruneCluster implementation mentioned below.

[Image: Clustered points of high-risk transactions using Leaflet MarkerCluster plugin]

[Image: Example of 15 high-risk transactions in a single location. With MarkerCluster, new data would cause the “spider” to contract to a single point due to the whole layer refreshing]

Leaflet PruneCluster plugin: This was found to be the most performant solution and worked well with streaming data. It’s also designed with IoT applications in mind, where existing points could frequently change location. Cosmetically it’s very similar to the MarkerCluster plugin.

The performance stats for PruneCluster are shown below, measured on a modern laptop with the Chrome 38 browser.

[Figure: Indicative performance of rendering points on map, based on Leaflet PruneCluster plugin]

Challenges & Lessons learned

Event cache: Redis Pub/Sub makes it simple to push new events to the client but doesn’t give the option of retrieving recent previous events when a client first connects. Using a sorted set in Redis or the Time Series Module could allow this but adds complexity. For this PoC, a simple cache is maintained on the server in a JavaScript array that allows newly connected clients to load previous events, up to a maximum threshold.
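The in-memory cache idea can be sketched in a few lines. This is an illustration of the approach described above rather than the PoC's actual code: the class name `RecentEventCache`, its methods and the limit of 3 are assumptions made for the example.

```javascript
// Keeps the most recent events in memory so a newly connected client can
// catch up. maxEvents plays the role of the maximum threshold; the value
// used below is illustrative.
class RecentEventCache {
  constructor(maxEvents) {
    this.maxEvents = maxEvents;
    this.events = [];
  }

  // Store a new event, discarding the oldest once the limit is exceeded.
  add(event) {
    this.events.push(event);
    if (this.events.length > this.maxEvents) {
      this.events.shift();
    }
  }

  // Return a copy so callers cannot modify the cache by accident.
  snapshot() {
    return this.events.slice();
  }
}

const cache = new RecentEventCache(3);
for (let id = 1; id <= 5; id++) {
  cache.add({ id });
}
console.log(cache.snapshot().map((e) => e.id)); // [ 3, 4, 5 ]
```

With Socket.IO, the snapshot could be sent to just the new client from inside the server's connection handler, for example with `socket.emit`, before live events start arriving. The trade-off is that the cache is lost when the server restarts and is not shared between server instances, which is where a Redis sorted set would help.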

Historical analysis: A slider would need to be introduced to control the period of time displayed. This is reasonably easy to implement with jQuery and Leaflet.js.
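The logic behind such a slider is essentially a time-window filter. The sketch below shows one way it could work, assuming each event carries a `timestamp` in milliseconds; the function name `eventsInWindow` and the field name are hypothetical, and the wiring to a slider control and the map layer is left out.

```javascript
// Return the events that fall inside the window ending at endTimeMs
// and lasting windowMs. Assumes each event has a numeric `timestamp`
// in milliseconds (an illustrative field name).
function eventsInWindow(events, endTimeMs, windowMs) {
  const startTimeMs = endTimeMs - windowMs;
  return events.filter(
    (e) => e.timestamp > startTimeMs && e.timestamp <= endTimeMs
  );
}

const MINUTE_MS = 60 * 1000;
const events = [
  { id: "a", timestamp: 0 },
  { id: "b", timestamp: 10 * MINUTE_MS },
  { id: "c", timestamp: 25 * MINUTE_MS },
  { id: "d", timestamp: 30 * MINUTE_MS },
];

// Slider set to "last 15 minutes", viewed at the 30-minute mark.
const visible = eventsInWindow(events, 30 * MINUTE_MS, 15 * MINUTE_MS);
console.log(visible.map((e) => e.id)); // [ 'c', 'd' ]
```

Each time the slider moves, the map layer would be cleared and redrawn with the filtered events.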

Street view: A useful tool for investigating potentially risky areas. Whilst the Google Maps API has the best integration with this feature, it’s possible to build this into almost any browser-based map, as shown by this PoC.

Conclusion

Visualising data geospatially can unlock valuable insights that would otherwise be missed.

With minimal effort and the help of a generous open source community, it’s possible to create powerful visualisations without spending a penny!
