Stanford POS Tagger with NLTK for Arabic text

Posted 2024-05-16 22:40:04


I am trying to use the Stanford POS Tagger from NLTK 3.2.4 on Arabic text. I found some sample code, but I don't understand most of it, as I am completely new to the Stanford POS Tagger.

The code:

import os
java_path = "C:\\Program Files (x86)\\Java\\jdk1.8.0_112\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path

from nltk.tag.stanford import StanfordPOSTagger as POS_Tag
home = 'E:\\Asmaa\\TP python\\'
_path_to_model = home + 'stanford-arabic-corenlp-2017-06-09-models.jar'
_path_to_jar = home + 'stanford-postagger.jar'
POS_Tag.java_options='-mx4096m'
st = POS_Tag(model_filename=_path_to_model, path_to_jar=_path_to_jar)
sentence = '.شرب القط الحليب اللذيذ'
st.tag(sentence.split())

The error I get:

[error output missing from the source]

What is going wrong?


2 Answers

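I found the solution on this web page; the problem was in the Arabic model. The tagger has to be given the arabic.tagger model file rather than the CoreNLP models jar.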

The corrected code:

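import os

# NLTK reads the JAVAHOME environment variable to locate the Java binary.
java_path = "C:\\Program Files (x86)\\Java\\jdk1.8.0_112\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path

from nltk.tag.stanford import StanfordPOSTagger as POS_Tag

# First argument: the Arabic .tagger model file; second argument: the tagger jar.
arabic_postagger = POS_Tag('models/arabic.tagger', 'stanford-postagger.jar')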
sentence = '.شرب القط الحليب اللذيذ'
print(arabic_postagger.tag(sentence.split()))

Result:

[('', 'شرب/VBD'), ('', 'القط/DTNN'), ('', 'الحليب/DTNN'), ('', 'اللذيذ/DTJJ')]
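In the result above, each word and its tag arrive joined in the second element of the tuple, with the first element left empty. A minimal sketch of splitting them back into (word, tag) pairs follows. The helper name split_tagged_tokens is hypothetical and is not part of NLTK; it only assumes the 'word/TAG' shape shown in the output.

```python
def split_tagged_tokens(tagged):
    """Turn [('', 'word/TAG'), ...] into [('word', 'TAG'), ...].

    Hypothetical helper, not an NLTK function.
    """
    pairs = []
    for first, second in tagged:
        if first == '' and '/' in second:
            # Split on the last slash so a slash inside the word is preserved.
            word, tag = second.rsplit('/', 1)
            pairs.append((word, tag))
        else:
            # Already a normal (word, tag) pair; keep it unchanged.
            pairs.append((first, second))
    return pairs


raw = [('', 'شرب/VBD'), ('', 'القط/DTNN'), ('', 'الحليب/DTNN'), ('', 'اللذيذ/DTJJ')]
print(split_tagged_tokens(raw))
```

Pairs that are already well formed pass through untouched, so the helper is safe to apply even if the tagger output comes back in the usual shape.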

First upgrade NLTK, then run the following in a terminal or command prompt:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2016-10-31-models.jar
wget https://raw.githubusercontent.com/stanfordnlp/CoreNLP/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-serverProperties StanfordCoreNLP-arabic.properties \
-preload tokenize,ssplit,pos,parse \
-status_port 9005  -port 9005 -timeout 15000

In Python:

[Python code missing from the source]
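One way to query the server started above from Python is sketched here, using only the standard library. This is not the answer's original code. It assumes the server is reachable at http://localhost:9005 (matching the -port flag above) and that it returns CoreNLP's usual JSON layout, with a "sentences" list whose "tokens" carry "word" and "pos" fields. The names tag_with_server and extract_pos_pairs are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: matches the -port 9005 flag used when starting the server.
SERVER_URL = "http://localhost:9005"


def extract_pos_pairs(response):
    """Collect (word, tag) pairs from a decoded CoreNLP JSON response."""
    return [
        (token["word"], token["pos"])
        for sentence in response.get("sentences", [])
        for token in sentence.get("tokens", [])
    ]


def tag_with_server(text, url=SERVER_URL, timeout=15):
    """POST text to the CoreNLP server and return (word, tag) pairs."""
    properties = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    # Supplying data makes this a POST; the request body is the raw text.
    request = urllib.request.Request(url + "/?" + query, data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_pos_pairs(json.loads(reply.read().decode("utf-8")))
```

With the server running, calling tag_with_server on the Arabic sentence from the question should return a list of (word, tag) pairs. The exact tokens depend on the Arabic segmenter configured in StanfordCoreNLP-arabic.properties.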
