How do I tag new vocabulary in spaCy?

import spacy
from spacy.tokenizer import Tokenizer

nlp = spacy.load("en_core_web_md")
nlp.vocab['bone morphogenetic protein (BMP)-2']
nlp.tokenizer = Tokenizer(nlp.vocab)
text = 'This study describes the distributions of bone morphogenetic protein (BMP)-2 as well as mRNAs for BMP receptor type IB (BMPRIB).'
doc = nlp(text)
print([(token.text,token.tag_) for token in doc])

Current output:

[('This', 'DT'), ('study', 'NN'), ('describes', 'VBZ'), ('the', 'DT'), ('distributions', 'NNS'), ('of', 'IN'), ('bone', 'NN'), ('morphogenetic', 'JJ'), ('protein', 'NN'), ('(BMP)-2', 'NNP'), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('mRNAs', 'NNP'), ('for', 'IN'), ('BMP', 'NNP'), ('receptor', 'NN'), ('type', 'NN'), ('IB', 'NNP'), ('(BMPRIB).', 'NN')]

Desired output:

[('This', 'DT'), ('study', 'NN'), ('describes', 'VBZ'), ('the', 'DT'), ('distributions', 'NNS'), ('of', 'IN'), ('bone morphogenetic protein (BMP)-2', 'NN'), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('mRNAs', 'NN'), ('for', 'IN'), ('BMP receptor type IB', 'NNP'), ('(', '('), ('BMPRIB', 'NNP'), (')', ')'), ('.', '.')]

2 answers

User

#1 · Edited 2024-05-16 12:02:26

I found a solution in nlp.tokenizer.tokens_from_list. I split my sentence into a list of words, and it was then tokenized as desired.

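import spacy

nlp = spacy.load("en_core_web_sm")

# Accept pre-split word lists instead of raw strings.
# Note: tokens_from_list is an older API that was deprecated in spaCy v2,
# so this line may not work on newer releases.
nlp.tokenizer = nlp.tokenizer.tokens_from_list

for doc in nlp.pipe([['This', 'study', 'describes', 'the', 'distributions', 'of', 'bone morphogenetic protein (BMP)-2', 'as', 'well', 'as', 'mRNAs', 'for', 'BMP receptor type IB', '(', 'BMPRIB', ')', '.']]):
    for token in doc:
        print(token, '//', token.dep_)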
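On newer spaCy releases, the same idea (supplying your own word list) can probably be expressed by building a Doc directly from the words. This is only a sketch under a few assumptions: it uses spacy.blank("en") so that it runs without downloading a model, which means no tags are predicted here; with a trained pipeline such as en_core_web_sm loaded instead, the loop over nlp.pipeline should run the tagger on the pre-tokenized Doc.

```python
import spacy
from spacy.tokens import Doc

# Assumption: a blank English pipeline, so no model download is needed.
# Swap in spacy.load("en_core_web_sm") to get real tags.
nlp = spacy.blank("en")

# The sentence, already split into the tokens we want to keep together.
words = ['This', 'study', 'describes', 'the', 'distributions', 'of',
         'bone morphogenetic protein (BMP)-2', 'as', 'well', 'as',
         'mRNAs', 'for', 'BMP receptor type IB', '(', 'BMPRIB', ')', '.']

# Build the Doc from the word list, bypassing the tokenizer entirely.
doc = Doc(nlp.vocab, words=words)

# Run every pipeline component (tagger, parser, ...) on the Doc.
# With a blank pipeline this loop does nothing.
for _, component in nlp.pipeline:
    doc = component(doc)

print([(token.text, token.tag_) for token in doc])
```

The multi-word terms stay single tokens because the tokenizer never sees the text; the trade-off is that you have to do the splitting yourself.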

User

#2 · Edited 2024-05-16 12:02:26

See if doc.retokenize() helps you:

import spacy
nlp = spacy.load("en_core_web_md")
text = 'This study describes the distributions of bone morphogenetic protein (BMP)-2 as well as mRNAs for BMP receptor type IB (BMPRIB).'

doc = nlp(text)

with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[6:11])

print([(token.text,token.tag_) for token in doc])

[('This', 'DT'), ('study', 'NN'), ('describes', 'VBZ'), ('the', 'DT'), ('distributions', 'NNS'), ('of', 'IN'), ('bone morphogenetic protein (BMP)-2', 'NN'), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('mRNAs', 'NNP'), ('for', 'IN'), ('BMP', 'NNP'), ('receptor', 'NN'), ('type', 'NN'), ('IB', 'NNP'), ('(', '-LRB-'), ('BMPRIB', 'NNP'), (')', '-RRB-'), ('.', '.')]
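The slice doc[6:11] above is hard-coded for this one sentence. If you have a list of known terms, one way to avoid counting token positions by hand is to locate the spans with PhraseMatcher and merge whatever it finds. The following is a sketch, not the answer's own method: the terms list and the "TERM" label are made up for illustration, and spacy.blank("en") is used only so the snippet runs without a model (so tag_ stays empty; load en_core_web_md instead to get tags).

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

# Assumption: blank pipeline for a dependency-free demo; use
# spacy.load("en_core_web_md") if you want tags as in the answer.
nlp = spacy.blank("en")

# Hypothetical term list: the multi-word names to keep as single tokens.
terms = ["bone morphogenetic protein (BMP)-2", "BMP receptor type IB"]

# Patterns are tokenized with the same tokenizer as the text,
# so they match token for token.
matcher = PhraseMatcher(nlp.vocab)
matcher.add("TERM", [nlp.make_doc(term) for term in terms])

text = ('This study describes the distributions of bone morphogenetic '
        'protein (BMP)-2 as well as mRNAs for BMP receptor type IB (BMPRIB).')
doc = nlp(text)

# Collect matched spans and drop any overlaps before merging.
spans = filter_spans([doc[start:end] for _, start, end in matcher(doc)])

with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

merged_texts = [token.text for token in doc]
print(merged_texts)
```

Because the merge happens after the pipeline has run, the merged token's tag depends on what the tagger assigned to the original pieces, as in the answer's output above.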
