How to Find Shortest Dependency Path with spaCy and StanfordNLP

Shortest dependency path is a commonly used method in relation extraction

BrambleXu, Jun 26

TL;DR

Considering the documentation and dependency parsing accuracy, I recommend using spaCy rather than StanfordNLP.

The content is structured as follows:

What is Shortest Dependency Path (SDP)?
Find Shortest Dependency Path with spaCy
Find Shortest Dependency Path with StanfordNLP

What is Shortest Dependency Path (SDP)?

Semantic dependency parsing has frequently been used to dissect sentences and to capture semantic information between words that are close in context but far apart in the sentence.

To extract the relationship between two entities, the most direct approach is to use SDP.

The motivation for using SDP is the observation that the SDP between entities usually contains the information needed to identify their relationship [1].

Figure: A dependency tree example for the sentence "Convulsions that occur after DTaP are caused by a fever."

In the above figure, words in square brackets are marked entities.

The red dashed-line arrows indicate the SDP between two entities.
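Because a dependency parse is a tree, the SDP between two words can be read off by walking from each word up to their lowest common ancestor. The sketch below illustrates that idea in plain Python. The `HEADS` table is hand-copied from the spaCy parse shown in the next section, the function names are my own, and it assumes every word occurs only once in the sentence.

```python
# Hand-copied (child -> head) pairs from the spaCy parse of
# "Convulsions that occur after DTaP are caused by a fever."
# The root ("caused") has no entry. Assumes each word occurs only once.
HEADS = {
    "convulsions": "caused",
    "that": "occur",
    "occur": "convulsions",
    "after": "occur",
    "dtap": "caused",
    "are": "caused",
    "by": "caused",
    "a": "fever",
    "fever": "by",
    ".": "caused",
}


def path_to_root(word, heads):
    """Return [word, head, head-of-head, ..., root]."""
    path = [word]
    while path[-1] in heads:
        path.append(heads[path[-1]])
    return path


def shortest_dependency_path(source, target, heads):
    """Join the two root-ward paths at their lowest common ancestor."""
    up = path_to_root(source, heads)
    down = path_to_root(target, heads)
    ancestors = set(up)
    # First node on the target's root-ward path that is also on the source's.
    for depth, node in enumerate(down):
        if node in ancestors:
            return up[: up.index(node) + 1] + down[:depth][::-1]
    raise ValueError("words are not in the same tree")


sdp = shortest_dependency_path("convulsions", "fever", HEADS)
print(sdp)           # ['convulsions', 'caused', 'by', 'fever']
print(len(sdp) - 1)  # 3 edges
```

The path length is counted in edges, so a path of four words has length 3.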

Find Shortest Dependency Path with spaCy

First, install the necessary libraries in the terminal.

I pin the version numbers for clarity.

    pip install spacy==2.1.4
    python -m spacy download en_core_web_sm
    pip install stanfordnlp==0.2.0
    pip install networkx==2.3

Next, we print out all the dependency labels, following the official tutorial.

    import spacy
    import networkx as nx

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(u'Convulsions that occur after DTaP are caused by a fever.')

    for token in doc:
        print((token.head.text, token.text, token.dep_))

    # output: (head, current_token, dep_relation)
    ('caused', 'Convulsions', 'nsubjpass')
    ('occur', 'that', 'nsubj')
    ('Convulsions', 'occur', 'relcl')
    ('occur', 'after', 'prep')
    ('caused', 'DTaP', 'nsubjpass')
    ('caused', 'are', 'auxpass')
    ('caused', 'caused', 'ROOT')
    ('caused', 'by', 'agent')
    ('fever', 'a', 'det')
    ('by', 'fever', 'pobj')
    ('caused', '.', 'punct')

We can plot the whole dependency tree with displaCy, spaCy's convenient visualization tool.

The code below gives the SDP.

    import spacy
    import networkx as nx

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(u'Convulsions that occur after DTaP are caused by a fever.')
    print('sentence: {0}'.format(doc))

    # Load spacy's dependency tree into a networkx graph
    edges = []
    for token in doc:
        for child in token.children:
            # One undirected edge per (head, child) pair, keyed by lowercased text
            edges.append(('{0}'.format(token.lower_),
                          '{0}'.format(child.lower_)))
    graph = nx.Graph(edges)

    # Get the length and path
    entity1 = 'Convulsions'.lower()
    entity2 = 'fever'
    print(nx.shortest_path_length(graph, source=entity1, target=entity2))
    print(nx.shortest_path(graph, source=entity1, target=entity2))

The edges look like this.

    In [6]: edges
    Out[6]:
    [('convulsions', 'occur'),
     ('occur', 'that'),
     ('occur', 'after'),
     ('caused', 'convulsions'),
     ('caused', 'dtap'),
     ('caused', 'are'),
     ('caused', 'by'),
     ('caused', '.'),
     ('by', 'fever'),
     ('fever', 'a')]

The output is below.

    3
    ['convulsions', 'caused', 'by', 'fever']

This means the SDP length from ‘convulsions’ to ‘fever’ is 3.
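One caveat with keying graph nodes by lowercased word text: if a word occurs twice in a sentence, both occurrences collapse into a single node, which can create a shortcut that does not exist in the tree. A possible workaround is to key nodes by token position instead (in spaCy that would be `token.i`). The sketch below illustrates the difference in plain Python with a breadth-first search. The sentence, its parse, and the helper names are hypothetical and hand-written for illustration; they are not spaCy output.

```python
from collections import deque

# Hypothetical hand-written parse of "The dog near the park chased the ball".
# Each entry is (token_index, text, head_index); the root points to itself,
# mirroring how spaCy reports the head of the root token.
TOKENS = [
    (0, "The", 1), (1, "dog", 5), (2, "near", 1), (3, "the", 4),
    (4, "park", 2), (5, "chased", 5), (6, "the", 7), (7, "ball", 5),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Nodes keyed by lowercased text: all three "the" tokens become one node.
by_text = build_graph(
    (TOKENS[head][1].lower(), text.lower())
    for i, text, head in TOKENS if head != i
)
# Nodes keyed by token index: every token stays distinct.
by_index = build_graph((head, i) for i, text, head in TOKENS if head != i)

text_path = bfs_path(by_text, "park", "ball")
index_path = [TOKENS[i][1] for i in bfs_path(by_index, 4, 7)]
print(text_path)   # ['park', 'the', 'ball'] (shortcut through the merged node)
print(index_path)  # ['park', 'near', 'dog', 'chased', 'ball']
```

The example sentence in this article has no repeated words, so the text-keyed graph above gives the same result there.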

Find Shortest Dependency Path with StanfordNLP

First, we print out all the dependency labels, following the official tutorial.

    import stanfordnlp

    stanfordnlp.download('en')
    nlp = stanfordnlp.Pipeline()
    doc = nlp('Convulsions that occur after DTaP are caused by a fever.')
    doc.sentences[0].print_dependencies()

    # output: (current_token, head_index, dep_relation)
    ('Convulsions', '0', 'root')
    ('that', '3', 'nsubj')
    ('occur', '1', 'acl:relcl')
    ('after', '7', 'mark')
    ('DTaP', '7', 'nsubj:pass')
    ('are', '7', 'aux:pass')
    ('caused', '3', 'advcl')
    ('by', '10', 'case')
    ('a', '10', 'det')
    ('fever', '7', 'obl')

To get the [(token, child), (token, child), ...] edge format that the networkx graph expects, we need to adapt the code according to the StanfordNLP source code.

    import networkx as nx
    import stanfordnlp

    stanfordnlp.download('en')
    nlp = stanfordnlp.Pipeline()
    doc = nlp('Convulsions that occur after DTaP are caused by a fever.')

    # Load stanfordnlp's dependency tree into a networkx graph
    edges = []
    for token in doc.sentences[0].dependencies:
        # token[0] is the governor and token[2] the dependent;
        # skip the artificial ROOT governor
        if token[0].text.lower() != 'root':
            # Only the governor is lowercased here, so a dependent
            # such as 'DTaP' keeps its original case in the edge list
            edges.append((token[0].text.lower(), token[2].text))
    graph = nx.Graph(edges)

    # Get the length and path
    entity1 = 'Convulsions'.lower()
    entity2 = 'fever'
    print(nx.shortest_path_length(graph, source=entity1, target=entity2))
    print(nx.shortest_path(graph, source=entity1, target=entity2))

The edges look like this.

    In [19]: edges
    Out[19]:
    [('occur', 'that'),
     ('convulsions', 'occur'),
     ('caused', 'after'),
     ('caused', 'DTaP'),
     ('caused', 'are'),
     ('occur', 'caused'),
     ('fever', 'by'),
     ('fever', 'a'),
     ('caused', 'fever'),
     ('convulsions', '.')]

The output is below.

    3
    ['convulsions', 'occur', 'caused', 'fever']

The SDP length calculated with StanfordNLP is the same as with spaCy. However, the words on the SDP between the two entities should be 'caused' and 'by'.

So, on this example, the dependency parse from spaCy is more accurate than the one from StanfordNLP.
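The (governor, dependent) edge list used in this section can also be rebuilt from the printed triples alone, which makes the role of the head index explicit: it is a 1-based position in the sentence, and 0 marks the root. The sketch below does this in plain Python. The `TRIPLES` list is hand-copied from the print_dependencies() output earlier in this section, and `triples_to_edges` is a hypothetical helper name, not part of StanfordNLP.

```python
# Triples hand-copied from the print_dependencies() listing in this section:
# (token_text, head_index, dep_relation). Head indices are 1-based and
# '0' marks the root. The final '.' token is not in that listing,
# so it is absent here as well.
TRIPLES = [
    ("Convulsions", "0", "root"), ("that", "3", "nsubj"),
    ("occur", "1", "acl:relcl"), ("after", "7", "mark"),
    ("DTaP", "7", "nsubj:pass"), ("are", "7", "aux:pass"),
    ("caused", "3", "advcl"), ("by", "10", "case"),
    ("a", "10", "det"), ("fever", "7", "obl"),
]


def triples_to_edges(triples):
    """Turn (text, head_index, relation) triples into (governor, dependent) pairs."""
    words = [text.lower() for text, _, _ in triples]
    edges = []
    for text, head, _ in triples:
        head = int(head)
        if head == 0:  # the root has no governor, so it contributes no edge
            continue
        edges.append((words[head - 1], text.lower()))
    return edges


edges = triples_to_edges(TRIPLES)
for edge in edges:
    print(edge)
```

Unlike the StanfordNLP snippet above, this version lowercases both ends of each edge, so 'DTaP' appears as 'dtap'.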

Reference

How to find the shortest dependency path between two words in Python? (stackoverflow.com)

A Shortest Dependency Path Based Convolutional Neural Network for Protein-Protein Relation… (www.ncbi.nlm.nih.gov)
