How do I get the leaves of a tree generated by the Stanford Parser?
I'm using the Stanford Parser from Python like this:
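import os

sentence = "Did Matt win the men slalom?"
# Write the sentence to the temp file from Python instead of via `echo`,
# so quotes in the sentence (e.g. "didn't") cannot break the shell command
tmp_path = os.path.expanduser("~/stanfordtemp.txt")
with open(tmp_path, "w") as f:
    f.write(sentence + "\n")
parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh " + tmp_path).readlines()
for tree in parser_out:
    print(tree.rstrip())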
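But I don't know how to access the leaves of the tree the parser returns. Can you help? I also need to write code that generates SQL queries from English sentences. Any suggestions for that? Any help is greatly appreciated. I'm also using nltk for all of this.
2 Answers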
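A note on the code above: lexparser.sh prints each tree across several indented lines (like the trees shown in this thread), so each element of `parser_out` is one line of a tree rather than a whole tree. A minimal sketch for regrouping those lines into one string per tree, assuming every tree starts on a line beginning with `(` and any other output (blank lines, typed dependencies such as `aux(win-3, Did-1)`) starts with something else. `split_trees` is a hypothetical helper, not part of the parser or of nltk, and the sample lines are illustrative.

```python
def split_trees(lines):
    """Group pretty-printed parser output lines into one string per tree."""
    trees, buf, depth = [], [], 0
    for line in lines:
        # Outside a tree, skip anything that does not open one
        # (blank lines, dependency lines like "aux(win-3, Did-1)")
        if depth == 0 and not line.lstrip().startswith("("):
            continue
        buf.append(line.strip())
        depth += line.count("(") - line.count(")")
        if depth == 0:  # parentheses balanced again: the tree is complete
            trees.append(" ".join(buf))
            buf = []
    return trees


# Illustrative lines shaped like lexparser.sh output
sample = [
    "(ROOT\n",
    "  (SQ (VBD Did)\n",
    "    (NP (NNP Matt))\n",
    "    (VP (VB win)\n",
    "      (NP (DT the) (NNS men) (NN slalom)))\n",
    "    (. ?)))\n",
    "\n",
    "aux(win-3, Did-1)\n",
]
for tree in split_trees(sample):
    print(tree)
```

Each string returned can then be handed to whatever builds the tree, for example the `make_tree` function in the answer below.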
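How can I extract each clause in a sentence as a subtree? That is, whenever a clause begins (e.g. S, SBAR, SBARQ), extract it as a subtree up to the point where the next clause begins; for the last clause, extract up to the end of the sentence.
Here is an example: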
(ROOT
  (S
    (S
      (NP (NNP John))
      (VP (VBZ lives)
        (PP (IN in)
          (NP (NNP New) (NNP York) (NN city)))))
    (, ,)
    (CC but)
    (S
      (SBAR
        (WHADVP (WRB whenever))
        (S
          (NP (PRP he))
          (VP (VBZ travels)
            (S
              (VP (TO to)
                (VP (VB work)))))))
      (, ,)
      (NP (PRP he))
      (VP (VBZ travels)
        (ADVP (RB very) (RB far))
        (PP (TO to)
          (NP (PRP$ his) (NN work) (NN place)))))
    (. .)))
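One literal reading of the procedure described above, as a minimal sketch: walk the tree left to right, start a new segment whenever a clause node begins, and put every word into the current segment, so the last segment runs to the end of the sentence. Which labels count as clauses (here S, SBAR, SBARQ, SINV and SQ) is an assumption, and `parse_tree` and `clause_segments` are hypothetical helpers written for this example; the tree is held as nested `(label, children)` tuples rather than nltk `Tree` objects.

```python
import re

# Assumption: which Penn Treebank labels count as the start of a clause
CLAUSE_TAGS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def parse_tree(s):
    """Parse a bracketed tree into nested (label, children) tuples; words are str."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return parse()


def clause_segments(tree, clause_tags=CLAUSE_TAGS):
    """Split the words into segments, opening a new one at each clause node."""
    segments = []

    def walk(node):
        if isinstance(node, str):     # a word: add it to the current segment
            if not segments:
                segments.append([])
            segments[-1].append(node)
            return
        label, children = node
        # Open a new segment unless the current one is still empty
        # (e.g. a clause nested directly at the start of another clause)
        if label in clause_tags and (not segments or segments[-1]):
            segments.append([])
        for child in children:
            walk(child)

    walk(tree)
    return [" ".join(seg) for seg in segments]


example = """(ROOT (S (S (NP (NNP John)) (VP (VBZ lives) (PP (IN in)
  (NP (NNP New) (NNP York) (NN city))))) (, ,) (CC but)
  (S (SBAR (WHADVP (WRB whenever)) (S (NP (PRP he)) (VP (VBZ travels)
  (S (VP (TO to) (VP (VB work))))))) (, ,) (NP (PRP he))
  (VP (VBZ travels) (ADVP (RB very) (RB far))
  (PP (TO to) (NP (PRP$ his) (NN work) (NN place))))) (. .)))"""

for seg in clause_segments(parse_tree(example)):
    print(seg)
# John lives in New York city , but
# whenever
# he travels
# to work , he travels very far to his work place .
```

The output shows the cost of the literal reading: the small S under "travels" opens a segment that swallows the rest of the sentence. If each clause's own words are wanted instead, with embedded clauses cut out, the walk would have to keep a segment per clause node and return to the enclosing one when a nested clause ends.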
Here is an example that builds a tree and then recursively collects the list of its leaf nodes. The sample text comes from the Stanford online parser.
# Class for tree nodes: `start` is the index of the node's "(" in the
# bracketed string, `text` is everything between that "(" and its matching ")"
class Node:
    def __init__(self, start):
        self.start = start
        self.children = []
        self.text = ''

# Build a tree of Nodes from a bracketed parse string
def make_tree(s):
    stack = []    # nodes whose closing ")" has not been seen yet
    root = None
    for i, c in enumerate(s):
        if c == '(':
            cur = Node(i)
            if stack:
                stack[-1].children.append(cur)  # nest under the open parent
            stack.append(cur)
            if root is None:
                root = cur
        elif c == ')' and stack:
            topnode = stack.pop()
            topnode.text = s[topnode.start + 1:i]
    return root

# List of leaves: nodes without children, i.e. the "(TAG word)" brackets,
# collected left to right
def list_of_leaves(node):
    result = []
    for child in node.children:
        result.extend(list_of_leaves(child))
    if not result:
        return [node]
    return result
s = """(ROOT
  (SQ (VBD Did)
    (NP (NNP Matt))
    (VP (VB win)
      (NP (DT the) (NNS men) (NN slalom)))
    (. ?)))"""
root = make_tree(s)
for node in list_of_leaves(root):
    print(node.text)  # "VBD Did", "NNP Matt", ..., ". ?"
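If only the words (or tag/word pairs) are needed, rather than Node objects whose text is "TAG word", a shortcut is to match the innermost "(TAG word)" brackets directly. A minimal sketch using Python's `re` module in place of the tree-building code above; it assumes, as in Stanford's Penn-style output, that tags and words never contain whitespace or parentheses (the parser normally writes literal brackets as tokens like -LRB-). `tagged_leaves` is a hypothetical helper name.

```python
import re

# A leaf in bracketed output is an innermost "(TAG word)" pair
LEAF_RE = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tagged_leaves(tree_string):
    """Return the (tag, word) pairs of a bracketed parse, left to right."""
    return LEAF_RE.findall(tree_string)


s = """(ROOT
  (SQ (VBD Did)
    (NP (NNP Matt))
    (VP (VB win)
      (NP (DT the) (NNS men) (NN slalom)))
    (. ?)))"""

pairs = tagged_leaves(s)
words = [word for tag, word in pairs]
print(words)  # ['Did', 'Matt', 'win', 'the', 'men', 'slalom', '?']
```

This discards the tree structure, so it is no substitute for `make_tree` when subtrees (such as clauses) are needed.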