Python: Unicode encoding of the string returned by parse (MeCab)

#!/usr/bin/python
# -*- coding:utf-8 -*-
import MeCab
tagger = MeCab.Tagger("-Owakati")
text = 'MeCabで遊んでみよう！'
print text
result = tagger.parse(text)
print result
result = unicode(result, 'utf-8')
print result

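2 Answers

User

#1 · edited 2024-05-12 22:03:12

It looks like your assumption that MeCab returns a UTF-8 string is wrong. When you convert the result to unicode, you therefore have to use some other encoding (for example iso2022_jp; the exact choice depends on MeCab's internals).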
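If you are not sure which encoding your MeCab build uses, one rough way to find out is to try a few candidates in turn. The snippet below is only a sketch: `decode_with_fallback` is a hypothetical helper, the candidate list is an assumption, and `raw` is a stand-in for the bytes that `tagger.parse` would return from an EUC-JP build.

```python
# -*- coding: utf-8 -*-

# Stand-in for the byte string an EUC-JP build of MeCab might return.
# (Assumption: in real code this would be the result of tagger.parse(...).)
raw = u'MeCab で 遊ん で みよ う ！ \n'.encode('euc_jp')


def decode_with_fallback(data, encodings=('utf-8', 'euc_jp', 'shift_jis', 'iso2022_jp')):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')


text, used = decode_with_fallback(raw)
print(used)
```

Trial decoding is a heuristic: some byte sequences are valid in more than one Japanese encoding, so the order of the candidates matters. Once you know the real charset of your dictionary, it is safer to hard-code it.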

User

#2 · edited 2024-05-12 22:03:12

By default, MeCab does not return UTF-8. Here is a quote from the following link (via Google Translate):

http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html#charset

Unless otherwise specified, euc is used. If you would like to use the utf8 or shift-jis, change the charset with configure options dictionary, please rebuild the dictionary. Now, and shift-jis, dictionary of utf8 is created.

Try result = tagger.parse(text).decode('euc-jp').
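A related point: if the dictionary really is EUC-JP, the input should probably be encoded to the same charset before parsing, not only decoded afterwards. Here is a minimal sketch of that round trip. `FakeTagger` and `parse_to_unicode` are hypothetical names used for illustration; `FakeTagger` only mimics a tagger that takes and returns byte strings, so swap in your real `MeCab.Tagger` and the charset your dictionary was built with.

```python
# -*- coding: utf-8 -*-


class FakeTagger(object):
    """Hypothetical stand-in for MeCab.Tagger("-Owakati") with an EUC-JP dictionary.

    It accepts and returns byte strings, like the Python 2 binding in the question.
    """

    def parse(self, data):
        # A real tagger would insert spaces between words; this stub only
        # appends the trailing space and newline.
        return data + b' \n'


def parse_to_unicode(tagger, text, charset='euc_jp'):
    """Encode text to the dictionary charset, parse it, and decode the result."""
    raw = tagger.parse(text.encode(charset))
    return raw.decode(charset)


result = parse_to_unicode(FakeTagger(), u'MeCabで遊んでみよう！')
print(result)
```

Keeping the charset in one parameter means you only have to change a single value if you later rebuild the dictionary as UTF-8.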
