Activity Grouping: The Heart of a Social Network for Athletes

Activities that are group matches are shown, as well as any activity that the main activity has crossed paths with.

To compute the similarity of the query activity and a match candidate, we load the full raw (latitude, longitude, time) data of both activities (at Strava, we call this data streams). We then iterate over time in the main activity, measuring the distance from the main activity to the position of the candidate activity at the same time. The final similarity score is the percentage of time that the candidate activity was within a certain distance of the main activity. To be a group activity, the candidate must have been near your activity more than 30% of the time.

By summing over time, we could correctly represent that the social nature of activities often does not involve moving. For example, visiting a coffee shop or bakery might be the entire social part of a ride. If you were recording during a stop, the v2 grouping algorithm would notice, whereas v1 would only take into account efforts you made on segments. This version of grouping also allowed stationary activities and activities in an area without any segments to group correctly.

From 2016 onward the worst case for v2 grouping began to cause real problems: in very large group activities such as Ride London, we might see 10,000 activities all intersecting a single spatiotemporal tile, generating as many as 100,000,000 match candidate pairs to process. In these events we typically had to delay computation of group activities, because the latency of grouping would rise by more than 10x and would stress other internal services that grouping depended on, such as our stream storage service.

At very large events like the Chicago Marathon (above) we can see upwards of 10,000 uploaded races. This was very taxing on our previous activity grouping system, which needed to check correlations between all pairs of these activities.

By late 2017, this worst-case scenario became an everyday occurrence with the rise of the Zwift virtual cycling platform. Zwift users ride stationary bike trainers anywhere in the world, but generate synthetic GPS data in a single area that has been chosen by Zwift as a fake location of the virtual cycling course. Activities from Zwift are processed by Strava just like any other activity, so location-based features will use this synthetic data. These virtual courses can have thousands of concurrent users and are always active, a feat which only the largest real events can match.
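
To make the v2 similarity score described earlier concrete, here is a minimal sketch in Python. It is not Strava's implementation. The stream representation (a dict from timestamp to position), the function names, the 50 m distance threshold, and the choice to compare only exactly matching timestamps are all assumptions made for illustration. Only the "more than 30% of the time" rule comes from the description above.

```python
import math
from typing import Dict, Tuple

# Hypothetical stream layout: timestamp in seconds -> (latitude, longitude) in degrees.
Stream = Dict[int, Tuple[float, float]]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def similarity(main: Stream, candidate: Stream, max_distance_m: float = 50.0) -> float:
    """Fraction of the main activity's samples where the candidate was nearby.

    Assumption: both streams are sampled on the same timestamps; a timestamp
    missing from the candidate counts as "not nearby".
    """
    if not main:
        return 0.0
    near = 0
    for t, position in main.items():
        other = candidate.get(t)
        if other is not None and haversine_m(position, other) <= max_distance_m:
            near += 1
    return near / len(main)


def is_group_match(main: Stream, candidate: Stream, threshold: float = 0.30) -> bool:
    """Apply the rule that the candidate must be near more than 30% of the time."""
    return similarity(main, candidate) > threshold
```

Because the score counts time samples rather than distance covered, two athletes standing at the same coffee shop contribute to the score just as much as two athletes riding side by side, which is the property the text attributes to summing over time.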
