Graphing Brexit: Clustering Edition

Graphing Brexit: Clustering EditionMark NeedhamBlockedUnblockFollowFollowingMar 31This is the 2nd in a series of posts showing how to analyse Brexit with graphs.

In this post we cluster MPs based on voting records.

Brexit — EUOn Friday I wrote a blog post showing how to do graph analysis of Brexit data using Neo4j, and towards the end of the first post I showed how to compute the similarities of MPs to Boris Johnson, based on the position that they took on different motions.

While doing this analysis I noticed that there were a lot of people who voted in an identical way on every motion, so I wanted to explore that further.

Robin Bramley also pointed me towards the CommonsVotes website, which has the voting records available in CSV format.

This will be helpful for future Brexit analysis, so let’s quickly learn how to import data from there into Neo4j.

Importing CommonsVotes dataI’ve downloaded the data for all the indicative votes and put the files into the data/commonsvotes directory of the graphing-brexit GitHub repository.

We can import the data for all the indicative motions using the Cypher query below:UNWIND [655,656,657,658,659,660,661,662] AS divisionLOAD CSV FROM "https://github.

com/mneedham/graphing-brexit/raw/master/data/commonsvotes/Division" + division + ".

csv" AS row// Create motion nodesWITH division, collect(row) AS rowsMERGE (motion:Motion {division: trim(split(rows[0][0], ":")[1]) })SET motion.

name = rows[2][0]// Skip the first 6 rows as they have metadata we don't needWITH motion, rowsUNWIND rows[7.

] AS row// Create person, party, constituency, and corresponding relsMERGE (person:Person {name: row[0]})MERGE (party:Party {name: row[1]})MERGE (constituency:Constituency {name: row[2]})MERGE (person)-[:MEMBER_OF]->(party)MERGE (person)-[:REPRESENTS]->(constituency)WITH person, motion, CASE WHEN row[3] = "Aye" THEN "FOR" WHEN row[3] = "No" THEN "AGAINST" ELSE "DID_NOT_VOTE" END AS voteCALL apoc.

merge.

relationship(person, vote, {}, {}, motion)YIELD relRETURN count(*)Identical VotersNow that we’ve got the data loaded we’re going to explore those identical voters.

I tried a few different approaches, and eventually settled on the following:Create a similarity graph based on people who voted in an identical wayRun the connected components algorithm over that similarity graph to find clusters of peopleAfter we’ve done these two steps we want to have a node representing each community or cluster that we find, and a relationship from each person to their community.

Let’s get started!Similarity GraphWe’re going to create a similarity graph of MPs, which is a fancy way of saying that we’re going to create relationships between pairs of MPs that voted in an identical way.

Cosine SimilarityAs in the first post, we’re going to use the Cosine Similarity algorithm from the Neo4j Graph Algorithms Library to do this.

Before we create any relationships in our database, we’re going to run the non streaming version of the procedure with write: false and similarityCutoff: 1.

0 (i.

e.

they behaved identically) to see how many pairs of identical voters we have without writing to the database.

The following Cypher query does this:MATCH (p:Person), (c:Motion)OPTIONAL MATCH (p)-[vote]->(c)WITH p, c, CASE WHEN type(vote) = "FOR" THEN 1 WHEN type(vote) = "DID_NOT_VOTE" THEN 0.

5 ELSE 0 END AS scoreWITH {item:id(p), weights: collect(score)} as userDataWITH collect(userData) as dataCALL algo.

similarity.

cosine(data, { similarityCutoff: 1.

0, write: false})YIELD nodes, similarityPairs, writeRelationshipType, writePropertyRETURN nodes, similarityPairs, writeRelationshipType, writePropertyHow many identical votes do we have?There are more than 10,000 similarity pairs for our 649 MPs!.This should be a fun graph to work with.

We’ll now run this procedure with write:true so that SIMILAR relationships are created between nodes that have a similarity of 1.

0.

Below is a sample of the graph that is created:Similarity GraphFinding clusters with Connected ComponentsNow we’re going to run the connected components algorithm over this similarity graph.

But what does this algorithm do?From the documentation page:The Connected Components, or Union Find, algorithm finds sets of connected nodes in an undirected graph where each node is reachable from any other node in the same set.

We can run the streaming version of this algorithm over our similarity graph by executing the following query:// Run the algorithm for:// nodes with 'Person' label, based on the 'SIMILAR' rel typeCALL algo.

unionFind.

stream('Person', 'SIMILAR', {direction: "BOTH"})YIELD nodeId,setId// Grouping by setId, create a 'Community' node for each communityWITH setId, collect(algo.

getNodeById(nodeId)) AS nodesMERGE (community:Community { id: setId, type: "Connected Components"})// Create 'IN_COMMUNITY' rel from each node to its communityWITH community, nodesUNWIND nodes AS nodeMERGE (node)-[:IN_COMMUNITY]->(community);This query returns a stream of setId, nodeId pairs.

We then create a node with the label Community for each setId, and create a relationship from each node to its corresponding community.

We can run the following query to visualise these communities:MATCH (c:Community)WHERE size((c)<-[:IN_COMMUNITY]-()) > 10MATCH path = (c)<-[comm:IN_COMMUNITY]-(person)RETURN pathLIMIT 200Communities of MPs based on how they votedSo…how did the people in each of these communities vote?.We can create a visualisation of how the communities voted by running the following Cypher query:// Find communities with more than 10 peopleMATCH (c:Community)WITH c, size((c)<-[:IN_COMMUNITY]-()) AS sizeWHERE size > 10// Take one person to act as a representative of each communityMATCH (c)<-[rel:IN_COMMUNITY]-(person)WITH c, collect(person)[0] AS person// Check which motions that person voted forWITH c, [(person)-[:FOR]->(motion:Motion) | motion] AS votes// Create a virtual relationship from each community to the motionUNWIND votes AS vote CALL apoc.

create.

vRelationship(c,"FOR",{},vote) yield relRETURN *In this query we pick one person to act as a representative for their community.

We then check how they voted, and finally create a virtual relationship (using the APOC library) from each community to the motions voted for.

This results in the following visualisation:How did each community vote?The yellow nodes represent communities, and the number indicates the number of MPs in that community.

We can see that there’s a large community over to the right containing 106 people who voted for No Deal and to get Contingent preferential arrangements.

We would assume that the people in those clusters are mostly in the Conservative Party.

Over the other side we have smaller clusters voting in favour of the other motions, including Jeremy Corbyn’s alternative plan, the revocation of article 50, and a Confirmatory public vote.

Finding influential members in each clusterWhen I showed this visualisation to Michael, he suggested that we should identify the most influential people in each cluster, and I thought a simple way to do that would be to count the views on their Wikipedia page.

I wrote a script to extract these counts into a CSV file, and then imported those into Neo4j:load csv with headers from "https://github.

com/mneedham/graphing-brexit/raw/master/data/pageviews.

csv" AS rowMATCH (p:Person {name: row.

person})SET p.

pageviews = toInteger(row.

pageviews);Michael also suggested adding party labels to each MP so that we can colour them with their party colours on the visualisation.

The following query does this:match (p:Person)-[:MEMBER_OF]->(pa:Party)CALL apoc.

create.

addLabels(p, [apoc.

text.

replace(pa.

name, " ", "")]) YIELD nodeRETURN count(*)Now that we’ve done this we can update our query to add the top 3 most influential people per cluster into our visualisation:// Find communities with more than 10 peopleMATCH (c:Community)WHERE size((c)<-[:IN_COMMUNITY]-()) > 10// Find the top 3 most influential people per communityMATCH (c)<-[rel:IN_COMMUNITY]-(person)WITH c, rel, size, personORDER BY person.

pageviews DESCWITH c, collect({person: person, rel:rel })[.

3] AS topPeople// Select the first of those people to represent the clusterWITH c, topPeople, topPeople[0].

person AS person// Check how that person votedWITH c, topPeople, [(person)-[:FOR]->(motion:Motion) | motion] AS votes// Create a virtual relationship from each community to the motion UNWIND votes AS vote CALL apoc.

create.

vRelationship(c,"FOR",{},vote) yield relRETURN *Communities, their most famous MPs, and how they votedNow we can clearly see the Conservative clusters over to the right hand side.

The biggest one contains Boris Johnson, Jacob Rees-Mogg, and Priti Patel, while the other one contains some lesser known members of the party.

Over to the left the SNP are in yellow and are in favour of having a public vote and revoke article 50.

The Independent Party have also voted in favour of both of those motions.

The reason those two communities are separate is because we took AGAINST and DID_NOT_VOTE relationships into account when building our similarity graph, and they must have voted differently thereFinally the Labour members are split out into several clusters, although all of them voted for Jeremy Corbyn’s motion.

As with the SNP and Independent clusters, we would end up with a smaller number of clusters if we built a similarity graph based only on FOR relationships.

We’ve done most of the ground work, so let’s see what happens if we only take the motions that people voted for into consideration.

Clusters based on motions voted forWe’ll use a similar approach to build our similarity graph, but this time since we only care about one relationship type, we can use the Jaccard Similarity algorithm to compute similarity between nodes.

We want to create a SIMILAR_FOR relationship between people who voted for motions in an identical way.

The following Cypher query will do this:MATCH (p:Person)-[:FOR]->(motion:Motion)WITH {item:id(p), categories: collect(id(motion))} as userDataWITH collect(userData) as dataCALL algo.

similarity.

jaccard(data, { similarityCutoff: 1.

0, write:true, writeRelationshipType: "SIMILAR_FOR"})YIELD nodes, similarityPairs, writeRelationshipType, writePropertyRETURN nodes, similarityPairs, writeRelationshipType, writeProperty;And now we’ll run the connected components algorithm over this graph and, as before, we’ll connect people to community nodes:// Run the algorithm for:// nodes with 'Person' label, based on the 'SIMILAR_FOR' rel typeCALL algo.

unionFind.

stream('Person', 'SIMILAR_FOR', { direction: "BOTH"})YIELD nodeId,setId// Grouping by setId, create a 'Community' node for each communityWITH setId, collect(algo.

getNodeById(nodeId)) AS nodesMERGE (community:Community { id: setId, type: "Connected Components FOR"})WITH community, nodes// Create 'IN_COMMUNITY' rel from each node to its communityUNWIND nodes AS nodeMERGE (node)-[:IN_COMMUNITY]->(community);Once we’ve done this we can re-run the query that finds clusters, motions, and the most famous people per cluster.

We determine the most famous people based on the number of views their Wikipedia pages received.

MATCH (c:Community {type: "Connected Components FOR"})WITH c, size((c)<-[:IN_COMMUNITY]-()) AS sizeWHERE size > 10MATCH (c)<-[rel:IN_COMMUNITY]-(person)WITH c, rel, size, personORDER BY person.

pageviews DESCWITH c, collect({person: person, rel:rel })[.

3] AS topPeople, sizeWITH c, topPeople, topPeople[0].

person AS person, sizeWITH c, topPeople, size, [(person)-[:FOR]->(motion:Motion) | motion] AS votes UNWIND votes AS vote CALL apoc.

create.

vRelationship(c,"FOR",{},vote) yield relRETURN *;Communities, their most famous MPs, and how they votedWe’ve now got three clusters over on the right.

The conservative clusters remain, but we now also have a cluster containing Democratic Unionist Party members who only voted in favour of the Contingent Preferential Arrangement.

Over on the left we now have a cluster containing members of the Conservative, Independent, and Liberal Democrat parties.

This cluster votes in favour of Revocation to avoid no deal and the Confirmatory public vote.

There are then a bunch of different Labour clusters voting for some subset of the other 6 motions.

After the first post, Chris Eyre asked the following:It certainly seems like there could be some political groupings based on these clusters:there’s a hard Brexit side of the Conservative party which is much bigger than I realised.

some members of the independent party seem to have similar views to some Conservatives and Liberal DemocratsWhat next?I’ve created a list of the other types of analysis that people suggested on twitter after I wrote the first post.

Jo Stichbury suggested that I explore whether MPs are voting in line with what their constituency wants:As part of this post I’ve loaded the constituencies and the percentage of people that voted leave, and am now in a position to try and answer this question.

And if you have a general interest in graph analysis of data, you might enjoy the O’Reilly Graph Algorithms Book that Amy Hodler and I have been working on over the last 9 months.

We’re in the final review stage and it should be available in the next few weeks.

You can register to get your free digital copy from the Neo4j website, at: neo4j.

com/graph-algorithms-book.

O’Reilly Graph Algorithms Book.

. More details

Leave a Reply