TypeError: must be unicode, not str in NLTK - Q&A - Python中文网

Posted 2024-06-10 04:29:52

I'm using Python 2.7, NLTK 3.2.1 and python-crfsuite 0.8.4. I'm following this page for the nltk.tag.crf module: http://www.nltk.org/api/nltk.tag.html?highlight=stanford#nltk.tag.stanford.NERTagger

The first thing I do is:

from nltk.tag import CRFTagger
ct = CRFTagger()
train_data = [[('dfd','dfd')]]
ct.train(train_data,"abc")

I also tried:

f = open("abc","wb")
ct.train(train_data,f)

but I get the following error:

  File "C:\Python27\lib\site-packages\nltk\tag\crf.py", line 129, in <genexpr>
    if all (unicodedata.category(x) in punc_cat for x in token):
TypeError: must be unicode, not str

1 Answer

User

Answer #1 · Posted 2024-06-10 04:29:52

In Python 2, ordinary quotes, '...' or "...", create byte strings. To get a Unicode string, put a u prefix in front of the literal, as in u'dfd'.
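A minimal sketch of the u-prefix fix applied to the question's training data, assuming Python 2.7 (it also runs on Python 3.3+, where the u prefix is accepted and has no effect). is_punctuation is a hypothetical stand-in that mirrors the check shown in the traceback; the exact set of punctuation categories NLTK uses internally is an assumption here. train_crf is likewise a hypothetical helper; it assumes, per the NLTK CRFTagger documentation, that the second argument of CRFTagger.train() is a model file path (a string such as u"abc"), not an open file object.

```python
import unicodedata

# Tokens and tags as Unicode strings: the u prefix makes them `unicode`
# objects on Python 2 (on Python 3 every str literal is already Unicode).
train_data = [[(u'dfd', u'dfd')]]

# Assumption: Unicode punctuation categories similar to those checked in
# nltk/tag/crf.py (the line shown in the traceback); illustrative only.
PUNC_CAT = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}


def is_punctuation(token):
    """Return True if every character of `token` is Unicode punctuation.

    On Python 2, unicodedata.category() raises
    "TypeError: must be unicode, not str" for byte-string characters,
    so `token` must be a unicode object there.
    """
    return all(unicodedata.category(ch) in PUNC_CAT for ch in token)


def train_crf(data, model_path):
    """Hypothetical helper: train NLTK's CRFTagger and save the model.

    `model_path` is a file name; the tagger writes the model there.
    """
    from nltk.tag import CRFTagger  # needs nltk and python-crfsuite installed
    tagger = CRFTagger()
    tagger.train(data, model_path)
    return tagger
```

With the u prefixes in place, is_punctuation(u'dfd') returns False on Python 2 instead of raising; calling is_punctuation('dfd') with a plain byte string reproduces the TypeError from the question there.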

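To read from a file, you need to specify the encoding. For the available options, see the Stack Overflow question on backporting Python 3's open(encoding=...) to Python 2; the most direct approach is to replace open() with io.open(), e.g. io.open(path, encoding='utf-8'), which returns unicode strings when reading in text mode.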
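A sketch of loading training data with io.open() and an explicit encoding. The file format (one word<TAB>tag pair per line, a blank line between sentences) and the name load_tagged_sentences are hypothetical; adapt both to your own data. Because io.open() decodes while reading, each word and tag comes back as unicode on Python 2 (str on Python 3).

```python
import io
import os
import shutil
import tempfile


def load_tagged_sentences(path, encoding="utf-8"):
    """Read sentences of (word, tag) pairs as Unicode strings.

    Assumed format: "word<TAB>tag" per line, blank line between sentences.
    """
    sentences, current = [], []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.rstrip(u"\n")
            if not line.strip():
                # A blank line closes the current sentence, if any.
                if current:
                    sentences.append(current)
                    current = []
                continue
            word, tag = line.split(u"\t")
            current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences


# Usage: write a small UTF-8 sample file to a temp directory and read it back.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "train.txt")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"dfd\tdfd\n\ncaf\u00e9\tNN\n")
sample = load_tagged_sentences(sample_path)
shutil.rmtree(tmp_dir)
```

The resulting list has the same shape as the question's train_data, so it can be passed to CRFTagger.train() directly.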

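To convert an existing string, you can use the built-in unicode() function, although in general you will also need to call decode() and supply an encoding, e.g. s.decode('utf-8').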
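A small sketch for converting data you already hold as byte strings. to_unicode and train_data_to_unicode are hypothetical helper names, and UTF-8 is an assumed default; use whatever encoding your data was actually written in.

```python
def to_unicode(value, encoding="utf-8"):
    """Return `value` as a Unicode string.

    Byte strings (str on Python 2, bytes on Python 3) are decoded with an
    explicit encoding; values that are already Unicode pass through.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def train_data_to_unicode(data, encoding="utf-8"):
    """Decode every word and tag in a list of tagged sentences."""
    return [[(to_unicode(word, encoding), to_unicode(tag, encoding))
             for (word, tag) in sentence]
            for sentence in data]
```

Decoding with an explicit encoding is safer than a bare unicode(s) on Python 2, which falls back to the ASCII default codec and raises UnicodeDecodeError on any non-ASCII byte.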

For (much) more detail, Ned Batchelder's "Pragmatic Unicode" slides are highly recommended, if not outright mandatory reading: http://nedbatchelder.com/text/unipain.html
