Connecting and interpolating POIs to a road networkScalable interpolation based on the nearest edgeYuwen ChangBlockedUnblockFollowFollowingApr 20This article discusses the process of handling the integration of point geometries and a geospatial network.

You may find a demonstration Jupyter Notebook and the script of the function here.

Introduction: working with geospatial networkWhen we work with network analysis, our objects of analysis (represented as nodes or vertices) should be connected through a graph, with defined adjacencies and edge weights.

However, we often gather data from different sources and need to integrate them manually.

The process is actually a bit more complicated than I had imagined.

Let us look at this example.

In the scenario of analyzing Points of Interest (POIs) in a city, we may have a pedestrian network retrieved from OpenStreetMap and a set of restaurants queried from Yelp.

One common approach is to snap the POIs to the nearest vertices (nodes, junctions, road bends) on the network.

While this is a very simple an efficient way to implement, some problems with this method include (1) ignoring the edge between the POI and the network, and (2) inconsistent snapping results depending on the availability of nearby vertices.

In other words, although the walking distance of a POI to the network based on the snapping result may be somehow attributed to the vertex as the access impedance (e.

g.

, distance or walk time), we still lose some geospatial information (e.

g.

, the actual location of the POI and the footway to access it).

Moreover, if the only nearby road segment is a long and straight one with few vertices, the snapping distance will not be a realistic one.

While map matching techniques, as usually adopted in GPS trajectory recovery, address a similar problem, it often returns a calibrated trajectory itself instead of modifying the actual network.

In the reference, you may find some of the questions raised by the community, and yet, the solutions I have found are not very comprehensive or ready-to-use as a geoprocessing tool.

For my use case, it will be best to create a Python function that will be handy whenever I want to merge a set of points into the network.

Just for the record, I am using the pandana package developed by UrbanSim to model urban accessibility to amenities.

You can see how this is done in this post.

However, for at least two reasons, I would like to manually craft my own network.

First, I would like to have my outputs calculated based on buildings instead of junctions; therefore, attaching buildings (or whatever I wanted to later on) to the network is a must.

Second, pandana acts exactly the same as what we have discussed previously when handling the POI snapping:The pois are connected to the closest node in the Pandana network which assumes no impedance between the location of the variable and the location of the closest network node.

The Idea: update nodes and edges in orderThe idea is simply locating the nearest projected point of the POI on the network and connect them.

So, to work things out, let us break down the process into several steps:Prepare two tables (geopandas.

GeoDataFrame) representing the network: one for nodes and one for edges, referred to as "internal" nodes and edges below.

In my case, this is downloaded through the osmnx package.

Prepare another table with Point geometries that we want to merge into the network in step 1.

They will be referred to as “external nodes”.

Update the internal nodes table with all the external nodes.

This should be as easy as a table concatenation, as the difficult part comes with the edges.

Find projected access points of all external nodes and update them to the table.

Now, unlike snapping, we need to generate a “projected access point (PAP)” for each external node.

These PAPs are the nearest point from the network to the external point.

In reality, this would be where you “hit the road” after leaving the building and walk straight to the road with the shortest route.

Update internal edges when necessary.

When we generate PAPs, chances are they may fall on an edge instead of on a vertex (which is basically common snapping).

To make sure our graph recognizes this connection correctly, we need to break the original edge into segments.

E.

g.

, from A — B to A — P — B.

In the table, this would mean replacing the original record with multiple new records while inheriting the same road attributes (e.

g.

, road name, number of lanes, etc.

) except for lengths.

Create and update external connections to external nodes.

In reality, this means establishing the shortest walking path from the road to the building.

The Details: R-tree, nearest neighbor, and splitting the roadsAs easy as these may sound, let us discuss the key issues within step 4 to step 6.

Do keep in mind that the most important goal for me is the scalability in terms of processing time.

Singapore, for instance, has 185k buildings 120k pedestrian road segments, and while this process may be a one-time effort, we really do not want it to take hours or days to finish.

The first naive proof-of-concept version of the function I wrote was still running (after a day and a night) by the time I have the optimized version crafted and debugged and tested and visualized and presented and… well, you know what I mean.

Be noted that the osmnx package has trimmed redundant nodes (see section 3 on this page) in the network retrieved from OpenStreetMap, meaning nodes that are just a bend in the road instead of a junction linking more than two edges will be removed.

This is important because it makes graph calculation and our processing more efficient while saving space.

However, with commonly adopted snapping method, we might want to keep as many of these nodes as possible as they may make the snapping result better as compared to sparse nodes.

In the extreme case where there are numerous nodes, the result will be the same as our proposed approach.

The osmnx package removes redundant nodes in the network downloaded from OpenStreetMap.

(Figures retrieved from osmnx introduction article.

)Step 4: This step can be further broken down into two parts: find the nearest edge and get the projected point.

a.

Find the nearest edge:In order to make a projection and the interpolate a PAP, we need to determine which edge we are actually making the projection.

While shapely.

project and shapely.

intersection can be applied to MultiLineString, it does not return the target LineString segment and the results are not stable with some unreasonable projections present.

Therefore, we need to locate the nearest edge by ourselves.

A better way to do this is to have our edges stored in a proper data structure.

Since we are storing LineString instead of Point, I choose to go with R-tree and not k-d tree (see this post if you need a quick refresher).

This stores each edge as a rectangle bounding box, making it very easy to query efficiently.

While one may use rtree.

nearest, this may yield undesirable results.

My understanding is that the distance being measured is actually that between the Point and the "rectangle", not the LineString.

A more comprehensive way (and scalable later on for variations of nearest edge) is to query k-nearest neighbor (kNN) based on R-tree and then calculate the actual distances to each LineString inside the box.

b.

Get the projected point:With the nearest edge, we can easily get PAP with line.

interpolate(line.

project(point)).

Step 5: This step is also broken down as the following:a.

Determine edges and nodes to update:Since there can be more than one PAP on each edge, we want to process them all together instead of repeating the process.

We group all PAPs by the lines they are projected onto (i.

e.

, the nearest edge).

b.

Split the edges with careful processingNext, we may easily get all the line segments with the shapely.

ops.

split function (split A—B with P1, P2 returns A—P1, P1—P2, P2—B).

But, whenever one thinks that is the easiest part, that is often where bugs may bug you.

So I ended up coming across a precision issue as discussed here.

Simply put, when I project and interpolate a Point on a LineString, the returned PAP does NOT intersect with the LineString, making it unable to be used to split the line.

In order to solve this, we need to snap the LineString to the PAP.

This results in a very slight change that is not recognizable to us but ensures the intersection is valid and, therefore, enabling the split.

Step 6: This step is simply generating new nodes by specifying the “from” and “to” nodes and update them to the edges table.

All updates including this one are just mere table manipulation and will not be discussed here.

Note that we used GeoDataFrame to store shapefiles and converted the CRS into epsg=3857 to calculate the road length in meters.

DiscussionAs you may have already known, one biggest difference between a geospatial network and an abstract graph structure (e.

g.

, social network) is that the edges in the former are also concrete objects and have shapes.

This is why we have to navigate our way through these edges and manipulate them.

In a social network, it will not make sense to implement this node-to-edge interpolation.

I believe there is definitely room for improvement for this function.

For example, using cython to optimize the code may achieve better performance.

One may also try to carry out these geoprocessing steps with PostGIS, which I haven't tried yet since I prefer having it done in Python currently.

However, given the current runtime of 7 minutes for the Singapore data (the aforementioned buildings to the pedestrian network scenario), it should be acceptable at the moment.

For hundreds of POIs, this process should be done within seconds (if not a second).

In the real world, there can be various footways connecting a building (red star) to the road network.

Our method only finds the nearest one (cyan node), which is unrealistic in some cases.

Finding a way to generate a set of reasonable routes merely based on the network (w/o computer vision) can be challenging.

I am also considering a multi-directional kNN connection extension.

To illustrate the use case, imagine that the ideal bus stop is at your back door, but the nearest footway yielded by this method connects your house to the road at the front door.

This gives you an unrealistic walk time and route calculation when you are to take a bus at your back door.

Simply establish kNN connections may not fully solve this problem, as they may all be segments at the front door.

Thus, fetching reasonable kNN edges from different directions of the Point will be desirable.

Ideally, we want to connect to one nearest neighbor (road segment) in different directions.

The output on the right shows an example of how simply locating the 5 nearest road segments is not sufficient.

If you have any ideas for improvement or perhaps know an even easier way to complete this task, do leave a comment to share with us!ReferencesThese are the posts where I start off with this problem:How to make a routable graph from a Point and a Line shapefile, using Networkx?Nearest neighbor between point layer and line layer?There are many other posts that I have gone through, below are some of them:Create closest point on multilinestring to use in shortest_path()Understanding use of spatial indexes with RTreeSplit line by nearest points using geopandasFind closest line to each point on big dataset, possibly using shapely and rtree.