Determine if text is in English?

User

Answer 1 · Edited 2024-06-06 22:58:19

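You might be interested in my paper. I also benchmarked a couple of tools.

TL;DR:

CLD-2 is pretty good and extremely fast
lang-detect is slightly better, but much slower
langid is good, but CLD-2 and lang-detect are better
NLTK's Textcat is neither efficient nor effective.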

You can install lidtk and classify languages:

$ lidtk cld2 predict --text "this is some text written in English"
eng
$ lidtk cld2 predict --text "this is some more text written in English"
eng
$ lidtk cld2 predict --text "Ce n'est pas en anglais"
fra
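If you want the same check from inside a Python script, one option is to shell out to the CLI. This is a minimal sketch under stated assumptions: it uses only the `lidtk cld2 predict --text` invocation shown above, and it assumes the language code (e.g. `eng`) is the last non-empty line of stdout. The helper names `lidtk_predict` and `is_english_code` are hypothetical.

```python
import shutil
import subprocess


def lidtk_predict(text: str) -> str:
    """Hypothetical wrapper: run `lidtk cld2 predict --text TEXT` and return its output code."""
    if shutil.which("lidtk") is None:
        raise RuntimeError("lidtk is not installed or not on PATH")
    result = subprocess.run(
        ["lidtk", "cld2", "predict", "--text", text],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    # Assumption: the predicted code is the last non-empty line printed to stdout.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def is_english_code(code: str) -> bool:
    """The CLI output shown above uses ISO 639-3 codes such as "eng" and "fra"."""
    return code.strip().lower() == "eng"
```

With lidtk installed and behaving as in the transcript, `is_english_code(lidtk_predict("this is some text written in English"))` should return `True`. Starting a process per call is slow for large batches; this wrapper is only meant for occasional checks.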

User

Answer 2 · Edited 2024-06-06 22:58:19

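Use the enchant library (PyEnchant). It checks individual words against a dictionary:

import enchant

dictionary = enchant.Dict("en_US")  # en_GB, fr_FR, etc. also work if those dictionaries are installed

dictionary.check("Hello")  # returns True
dictionary.check("Helo")  # returns False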

This example is taken directly from their website.
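Since `check` only answers for a single word, deciding whether a whole text is English needs one more step. A common approach is to tokenize the text and measure the share of words the dictionary accepts. The sketch below takes any `check` callable, so `enchant.Dict("en_US").check` can be passed in directly. To keep it runnable without PyEnchant, the demo uses a tiny hypothetical word list, and the 0.5 threshold is an assumption you would tune on your own data.

```python
import re
from typing import Callable

# ASCII-only tokenizer (assumption): accented letters split words, which is
# acceptable for a rough "English or not" decision.
WORD_RE = re.compile(r"[A-Za-z']+")


def english_word_ratio(text: str, check: Callable[[str], bool]) -> float:
    """Fraction of tokens in `text` that `check` accepts (0.0 for no tokens)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    known = sum(1 for word in words if check(word))
    return known / len(words)


def looks_english(text: str, check: Callable[[str], bool], threshold: float = 0.5) -> bool:
    """Treat the text as English if enough of its words pass the dictionary check."""
    return english_word_ratio(text, check) >= threshold


# HYPOTHETICAL stand-in for enchant.Dict("en_US").check so the demo runs anywhere.
_DEMO_WORDS = {"this", "is", "some", "text", "written", "in", "english", "hello"}


def demo_check(word: str) -> bool:
    return word.lower() in _DEMO_WORDS


if __name__ == "__main__":
    print(looks_english("This is some text written in English", demo_check))  # True
    print(looks_english("Ce n'est pas en anglais", demo_check))  # False
```

With PyEnchant installed, replace `demo_check` with `enchant.Dict("en_US").check`. Proper nouns, typos, and code snippets will lower the ratio, so a threshold below 1.0 is deliberate.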

User

Answer 3 · Edited 2024-06-06 22:58:19

There is a library called langdetect. It is a port of Google's language-detection library and is available here:

https://pypi.python.org/pypi/langdetect

It supports 55 languages out of the box.
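Detectors like langdetect score a text using character n-gram statistics learned per language. The sketch below is a much-simplified, self-written illustration of that general idea (a multinomial naive Bayes over character trigrams with add-one smoothing and a uniform prior), not langdetect's actual code or model. The two training sentences are hypothetical toy data; a usable model needs far more text per language.

```python
from __future__ import annotations

import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Lowercase, pad with spaces, and return overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramNaiveBayes:
    """Toy multinomial naive Bayes over character n-grams (uniform prior)."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}  # language -> n-gram counts
        self.totals: dict[str, int] = {}      # language -> total n-grams seen
        self.vocab: set[str] = set()          # n-grams seen in any language

    def train(self, lang: str, text: str) -> None:
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(lang, Counter()).update(grams)
        self.totals[lang] = self.totals.get(lang, 0) + len(grams)
        self.vocab.update(grams)

    def log_scores(self, text: str) -> dict[str, float]:
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # one extra slot for n-grams never seen in training
        scores = {}
        for lang, counter in self.counts.items():
            denom = self.totals[lang] + v
            # Add-one (Laplace) smoothing so an unseen n-gram doesn't zero out a language.
            scores[lang] = sum(math.log((counter[g] + 1) / denom) for g in grams)
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# HYPOTHETICAL toy training data.
EN_SAMPLE = "This is some text written in English. The quick brown fox jumps over the lazy dog."
FR_SAMPLE = "Ce n'est pas en anglais. Le renard brun rapide saute par-dessus le chien paresseux."

model = NGramNaiveBayes(n=3)
model.train("en", EN_SAMPLE)
model.train("fr", FR_SAMPLE)

print(model.predict("This is some text written in English"))  # en
print(model.predict("Ce n'est pas en anglais"))  # fr
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer texts. To answer "is this English?", compare `model.predict(text)` against `"en"`.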
