NLTK: parsing a string into a Tree whose leaves are word/POS "slash tokens"?

Posted 2024-06-01 03:42:01


The pretty-printed form of an nltk.Tree whose leaves are (word, POS) tuples looks like this:

print spacy2tree(nlp(u'Williams is a defensive coach') )
(S
  (SUBJ Williams/NNP)
  (PRED is/VBZ test/VBN)
  a/DT
  defensive/JJ
  coach/NN)

The same tree, shown as a Tree object:

 spacy2tree(nlp(u'Williams is a defensive coach') )
 Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]), 
    Tree('PRED', [(u'is', u'VBZ'), ('test', 'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])

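But Tree.fromstring does not read that string back in correctly; each leaf stays a single 'word/POS' string instead of becoming a (word, POS) tuple: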

tfs =  spacy2tree(nlp(u'Williams is a defensive coach') ).pformat()

Tree.fromstring(tfs)
Tree('S', [Tree('SUBJ', ['Williams/NNP']), 
   Tree('PRED', ['is/VBZ', 'test/VBN']), 'a/DT', 'defensive/JJ', 'coach/NN'])

For example:

correct                                              incorrect
('SUBJ', [(u'Williams', u'NNP')])             =vs=>  ('SUBJ', ['Williams/NNP'])
('PRED', [(u'is', u'VBZ'), ('test', 'VBN')])  =vs=>  ('PRED', ['is/VBZ', 'test/VBN'])

Is there a tool that correctly reads the tree back in from the string?


1 answer
User
Answer 1 · posted 2024-06-01 03:42:01

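It looks like I figured it out. Tree.fromstring accepts a read_leaf callback, which can turn each 'word/POS' string back into a tuple: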

Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/')))
Tree('S', [Tree('SUBJ', [(u'Williams', u'NNP')]),
    Tree('PRED', [(u'is', u'VBZ'), (u'test', u'VBN')]), (u'a', u'DT'), (u'defensive', u'JJ'), (u'coach', u'NN')])

So now this works correctly as well:

tree2conlltags(Tree.fromstring(tfs, read_leaf=lambda s: tuple(s.split('/'))))
 [(u'Williams', u'NNP', u'B-SUBJ'),
  (u'is', u'VBZ', u'B-PRED'),
  (u'test', u'VBN', u'I-PRED'),
  (u'a', u'DT', u'O'),
  (u'defensive', u'JJ', u'O'),
  (u'coach', u'NN', u'O')]
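To make the B-/I-/O labels in that output easier to read, here is a small pure-Python sketch that mimics the labelling for a flat, one-level chunk structure. It is not NLTK's implementation of tree2conlltags; the function name iob_tags and the plain list/tuple input format are assumptions made for illustration.

```python
def iob_tags(children):
    """Assign IOB chunk tags to a one-level chunk structure.

    `children` is a list whose items are either a (word, pos) leaf or a
    (label, [(word, pos), ...]) chunk. Illustrative only, not NLTK code.
    """
    rows = []
    for child in children:
        if isinstance(child[1], list):
            # A chunk: first token gets B-<label>, the rest get I-<label>.
            label, leaves = child
            for i, (word, pos) in enumerate(leaves):
                prefix = "B-" if i == 0 else "I-"
                rows.append((word, pos, prefix + label))
        else:
            # A leaf directly under the root lies outside any chunk.
            word, pos = child
            rows.append((word, pos, "O"))
    return rows


sentence = [
    ("SUBJ", [("Williams", "NNP")]),
    ("PRED", [("is", "VBZ"), ("test", "VBN")]),
    ("a", "DT"),
    ("defensive", "JJ"),
    ("coach", "NN"),
]
tagged = iob_tags(sentence)
```

Here tagged holds the same six (word, POS, chunk-tag) triples as the tree2conlltags output above, without the u'' prefixes.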
