I'm using spaCy because I want to benefit from its dependency parser, but I'm having trouble getting the spaCy tokenizer to treat new vocabulary I've added as single tokens. Here is my code:
import spacy
from spacy.tokenizer import Tokenizer

nlp = spacy.load("en_core_web_md")
nlp.vocab['bone morphogenetic protein (BMP)-2']
nlp.tokenizer = Tokenizer(nlp.vocab)
text = 'This study describes the distributions of bone morphogenetic protein (BMP)-2 as well as mRNAs for BMP receptor type IB (BMPRIB).'
doc = nlp(text)
print([(token.text,token.tag_) for token in doc])
Output:
[('This', 'DT'), ('study', 'NN'), ('describes', 'VBZ'), ('the', 'DT'), ('distributions', 'NNS'), ('of', 'IN'), ('bone', 'NN'), ('morphogenetic', 'JJ'), ('protein', 'NN'), ('(BMP)-2', 'NNP'), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('mRNAs', 'NNP'), ('for', 'IN'), ('BMP', 'NNP'), ('receptor', 'NN'), ('type', 'NN'), ('IB', 'NNP'), ('(BMPRIB).', 'NN')]
Expected output:
[('This', 'DT'), ('study', 'NN'), ('describes', 'VBZ'), ('the', 'DT'), ('distributions', 'NNS'), ('of', 'IN'), ('bone morphogenetic protein (BMP)-2', 'NN'), ('as', 'RB'), ('well', 'RB'), ('as', 'IN'), ('mRNAs', 'NN'), ('for', 'IN'), ('BMP receptor type IB', 'NNP'), ('(', '('), ('BMPRIB', 'NNP'), (')', ')'), ('.', '.')]
How can I make spaCy tokenize the new vocabulary I added as single tokens?
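For reference, one way to get the expected output without pre-splitting the text yourself is to tokenize normally and then merge the matched phrases into single tokens with `Doc.retokenize()`. This is a sketch, not the accepted answer's method; it assumes spaCy v2.2+ (`PhraseMatcher.add` with a list of pattern docs) and uses a blank English pipeline so no model download is needed:

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

# Blank English pipeline; swap in spacy.load("en_core_web_md")
# to also get tags and the dependency parse after merging.
nlp = spacy.blank("en")

terms = ["bone morphogenetic protein (BMP)-2", "BMP receptor type IB"]
matcher = PhraseMatcher(nlp.vocab)
matcher.add("TERMS", [nlp.make_doc(t) for t in terms])

text = ("This study describes the distributions of bone morphogenetic "
        "protein (BMP)-2 as well as mRNAs for BMP receptor type IB (BMPRIB).")
doc = nlp(text)

# Drop overlapping matches, then merge each span into one token.
spans = filter_spans([doc[start:end] for _, start, end in matcher(doc)])
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

print([token.text for token in doc])
```

Because the patterns are built with the same tokenizer as the text, the `PhraseMatcher` finds them regardless of how the tokenizer splits the parentheses and hyphens internally.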
I found the solution in nlp.tokenizer.tokens_from_list. I split my sentence into a list of words myself, and it was then tokenized as desired:
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.tokenizer = nlp.tokenizer.tokens_from_list
for doc in nlp.pipe([["This", "study", "describes", "the", "distributions",
                      "of", "bone morphogenetic protein (BMP)-2", "as", "well",
                      "as", "mRNAs", "for", "BMP receptor type IB",
                      "(", "BMPRIB", ")", "."]]):
    for token in doc:
        print(token.text, token.tag_)
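Note that `tokens_from_list` was deprecated and later removed; in current spaCy releases the equivalent is to construct a `Doc` directly from a pre-split word list (a sketch, assuming spaCy v3, where a `Doc` can also be passed back through `nlp(...)` to run the remaining pipeline components):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # any pipeline works; only its vocab is used here

words = ["This", "study", "describes", "the", "distributions", "of",
         "bone morphogenetic protein (BMP)-2", "as", "well", "as",
         "mRNAs", "for", "BMP receptor type IB", "(", "BMPRIB", ")", "."]
doc = Doc(nlp.vocab, words=words)

# The pre-split words are taken over verbatim, multi-word entries included.
print([token.text for token in doc])
```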