<p>This answer relies on Stanford CoreNLP to obtain the dependency tree of a sentence. For the networkx part, it borrows quite a bit of code from HugoMailhot's <a href="https://stackoverflow.com/a/32895132/395857">answer</a>.</p>
<p>Before running the code, you need to:</p>
<ol>
<li><code>sudo pip install pycorenlp</code> (a Python wrapper for Stanford CoreNLP)</li>
<li>Download <a href="http://stanfordnlp.github.io/CoreNLP" rel="nofollow noreferrer">Stanford CoreNLP</a></li>
<li><p>Start the Stanford CoreNLP server as follows:</p>
<pre><code>java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 50000
</code></pre></li>
</ol>
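<p>Before launching the full pipeline, it can be handy to confirm that the server is actually listening. The helper below is not part of the original answer; the function name and the two-second timeout are my own choices, and it only checks that <em>something</em> answers HTTP on the given URL:</p>

<pre><code>import urllib.request
import urllib.error

def corenlp_server_is_up(url='http://localhost:9000', timeout=2):
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status: it is up.
        return True
    except (urllib.error.URLError, OSError):
        return False

print('CoreNLP server reachable: {0}'.format(corenlp_server_is_up()))
</code></pre>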
<p>You can then run the following code to find the shortest dependency path between two words:</p>
<pre><code>import networkx as nx
from pycorenlp import StanfordCoreNLP
from pprint import pprint

nlp = StanfordCoreNLP('http://localhost:{0}'.format(9000))

def get_stanford_annotations(text, port=9000,
                             annotators='tokenize,ssplit,pos,lemma,depparse,parse'):
    output = nlp.annotate(text, properties={
        "timeout": "10000",
        "ssplit.newlineIsSentenceBreak": "two",
        'annotators': annotators,
        'outputFormat': 'json'
    })
    return output

# The code expects the document to contain exactly one sentence.
document = 'Robots in popular culture are there to remind us of the awesomeness of '\
           'unbound human agency.'
print('document: {0}'.format(document))

# Parse the text
annotations = get_stanford_annotations(document, port=9000,
                                       annotators='tokenize,ssplit,pos,lemma,depparse')
tokens = annotations['sentences'][0]['tokens']

# Load Stanford CoreNLP's dependency tree into a networkx graph
edges = []
dependencies = {}
for edge in annotations['sentences'][0]['basic-dependencies']:
    edges.append((edge['governor'], edge['dependent']))
    dependencies[(min(edge['governor'], edge['dependent']),
                  max(edge['governor'], edge['dependent']))] = edge
graph = nx.Graph(edges)
#pprint(dependencies)
#print('edges: {0}'.format(edges))

# Find the shortest path
token1 = 'Robots'
token2 = 'awesomeness'
for token in tokens:
    if token1 == token['originalText']:
        token1_index = token['index']
    if token2 == token['originalText']:
        token2_index = token['index']

path = nx.shortest_path(graph, source=token1_index, target=token2_index)
print('path: {0}'.format(path))
for token_id in path:
    token = tokens[token_id - 1]
    token_text = token['originalText']
    print('Node {0}\ttoken_text: {1}'.format(token_id, token_text))
</code></pre>
<p>The output is:</p>
<pre><code>document: Robots in popular culture are there to remind us of the awesomeness of unbound human agency.
path: [1, 5, 8, 12]
Node 1 token_text: Robots
Node 5 token_text: are
Node 8 token_text: remind
Node 12 token_text: awesomeness
</code></pre>
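<p>The <code>dependencies</code> dictionary built in the code above (keyed on the sorted <code>(governor, dependent)</code> pair so a lookup works in either direction) can also recover the relation label on each hop of the path. Here is a sketch using hand-made edges in CoreNLP's JSON shape; the indices mirror the example sentence, but the <code>dep</code> labels are illustrative assumptions, not real parser output:</p>

<pre><code>import networkx as nx

# Hand-made edges in CoreNLP's JSON shape; the 'dep' labels here are
# illustrative assumptions, not real parser output.
mock_edges = [
    {'governor': 5, 'dependent': 1, 'dep': 'nsubj'},
    {'governor': 5, 'dependent': 8, 'dep': 'advcl'},
    {'governor': 8, 'dependent': 12, 'dep': 'nmod'},
]

edges = []
dependencies = {}
for edge in mock_edges:
    edges.append((edge['governor'], edge['dependent']))
    # Key on the sorted pair so the lookup works regardless of edge direction.
    dependencies[(min(edge['governor'], edge['dependent']),
                  max(edge['governor'], edge['dependent']))] = edge

graph = nx.Graph(edges)
path = nx.shortest_path(graph, source=1, target=12)

# Walk consecutive node pairs and print the relation on each hop.
for a, b in zip(path, path[1:]):
    hop = dependencies[(min(a, b), max(a, b))]
    print('{0} -[{1}]-> {2}'.format(hop['governor'], hop['dep'], hop['dependent']))
</code></pre>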
<p>Note that Stanford CoreNLP can also be tried online: <a href="http://nlp.stanford.edu:8080/parser/index.jsp" rel="nofollow noreferrer">http://nlp.stanford.edu:8080/parser/index.jsp</a></p>
<p>This answer was tested with Stanford CoreNLP 3.6.0, pycorenlp 0.3.0, and Python 3.5 x64 on Windows 7 SP1 x64 Ultimate.</p>
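<p>One design note: the code builds an undirected <code>nx.Graph</code> rather than a <code>DiGraph</code> on purpose, because the shortest dependency path between two tokens typically climbs from one token up to a common ancestor and back down, which no directed path allows. A minimal sketch with the same token indices as the example output (the head assignments here are assumed for illustration):</p>

<pre><code>import networkx as nx

# (governor, dependent) pairs: assume 5 heads 1 and 8, and 8 heads 12.
edges = [(5, 1), (5, 8), (8, 12)]

directed = nx.DiGraph(edges)
undirected = nx.Graph(edges)

# Token 1 has no outgoing edges, so no directed path reaches token 12,
# but the undirected graph finds the path through the common ancestor.
print(nx.has_path(directed, 1, 12))          # False
print(nx.shortest_path(undirected, 1, 12))   # [1, 5, 8, 12]
</code></pre>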