Answer to the question "French text"

French text

1 Answer

Anonymous · 1 day ago

Expertise: Python, MySQL, Java

<a href="http://osdir.com/ml/python.nltk.devel/2007-06/msg00018.html" rel="noreferrer">Here</a> is an old but relevant note from an NLTK developer. It looks like most of the more advanced stemmers in NLTK are English-specific:

<blockquote> The nltk.stem module currently contains 3 stemmers: the Porter stemmer, the Lancaster stemmer, and a Regular-Expression based stemmer. The Porter stemmer and Lancaster stemmer are both English-specific. The regular-expression based stemmer can be customized to use any regular expression you wish. So you should be able to write a simple stemmer for non-English languages using the regexp stemmer. For example, for French:
<pre><code>from nltk import stem
stemmer = stem.Regexp('s$|es$|era$|erez$|ions$| <etc> ')
</code></pre>
But you'd need to come up with the language-specific regular expression yourself. For a more advanced stemmer, it would probably be necessary to add a new module. (This might be a good student project.) For more information on the regexp stemmer: <a href="http://nltk.org/doc/api/nltk.stem.regexp.Regexp-class.html" rel="noreferrer">http://nltk.org/doc/api/nltk.stem.regexp.Regexp-class.html</a> -Edward </blockquote>

Note: the link he gave is dead; see <a href="http://www.nltk.org/api/nltk.stem.html#module-nltk.stem.regexp" rel="noreferrer">here</a> for the current RegexpStemmer documentation.

However, the more recently added <a href="http://www.nltk.org/api/nltk.stem.html#module-nltk.stem.snowball" rel="noreferrer">snowball stemmer</a> appears to be able to stem French. Let's test it:

<pre><code>>>> from nltk.stem.snowball import FrenchStemmer
>>> stemmer = FrenchStemmer()
>>> stemmer.stem('voudrais')
u'voudr'
>>> stemmer.stem('animaux')
u'animal'
>>> stemmer.stem('yeux')
u'yeux'
>>> stemmer.stem('dors')
u'dor'
>>> stemmer.stem('couvre')
u'couvr'
</code></pre>

As you can see, some of the results are a bit questionable. It's not exactly what you were hoping for, but I suppose it's a start.
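The regexp approach Edward describes can be sketched in plain standard-library Python. This is a minimal stand-in that mirrors how NLTK's current `nltk.stem.RegexpStemmer(regexp, min=0)` behaves (compile the pattern, delete the match, skip words shorter than a minimum length); the class name `SimpleRegexpStemmer` and the `min_len=4` threshold are illustrative assumptions, and the suffix pattern is only the partial list from the quote, without its elided remainder.

```python
import re

# Partial suffix pattern copied from the quoted email; the email elides the
# rest of the list, so a usable French pattern would need many more suffixes.
FRENCH_SUFFIXES = r"s$|es$|era$|erez$|ions$"


class SimpleRegexpStemmer:
    """Hypothetical stand-in for nltk.stem.RegexpStemmer: delete the match
    of `pattern` from words that are at least `min_len` characters long."""

    def __init__(self, pattern: str, min_len: int = 0) -> None:
        self._regexp = re.compile(pattern)
        self._min_len = min_len

    def stem(self, word: str) -> str:
        # Leave short words alone to limit over-stemming (e.g. "mes").
        if len(word) < self._min_len:
            return word
        # Every alternative is anchored with $, so the leftmost match is
        # also the longest matching suffix; that suffix is removed.
        return self._regexp.sub("", word)


stemmer = SimpleRegexpStemmer(FRENCH_SUFFIXES, min_len=4)
for word in ["maisons", "parlions", "chanterez", "chantera", "mes"]:
    print(word, "->", stemmer.stem(word))
# maisons -> maison, parlions -> parl, chanterez -> chant,
# chantera -> chant, mes -> mes
```

Blind suffix deletion like this is why the quote says you must build the language-specific pattern yourself; "mes" only survives because of the length threshold. For real use, the Snowball `FrenchStemmer` tested above is likely the better starting point.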
