UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I have this code:
# -*- coding: utf-8 -*-
forbiddenWords=['for', 'and', 'nor', 'but', 'or', 'yet', 'so', 'not', 'a', 'the', 'an', 'of', 'in', 'to', 'for', 'with', 'on', 'at', 'from', 'by', 'about', 'as']
def IntoSentences(paragraph):
    paragraph = paragraph.replace("–", "-")
    import nltk.data
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    sentenceList = sent_detector.tokenize(paragraph.strip())
    return sentenceList
from Tkinter import *
root = Tk()
var = StringVar()
label = Label( root, textvariable=var)
var.set("Fill in the caps: ")
label.pack()
text = Text(root)
text.pack()
button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END)))
button.pack()
root.mainloop()
When I run this code, everything starts up fine. I then enter some text and press the button, and I get this error:
C:\Users\Indrek>caps_main.py
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1470, in __call__
    return self.func(*args)
  File "C:\Python27\Myprojects\caps_main.py", line 25, in <lambda>
    button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END)))
  File "C:\Python27\Myprojects\caps_main.py", line 7, in IntoSentences
    paragraph = paragraph.replace("ŌĆō", "-")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
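How can I fix this? At first I got the same error message as soon as I ran the code. Then I added lambda:, and now the error appears when I click the button in the app.
1 Answer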
3
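You need to decode the string from UTF-8 (or whichever encoding it actually uses) and then replace the Unicode string inside it with something else. The following code should do what you want:
paragraph = paragraph.decode('utf-8').replace(u'\u014c\u0106\u014d','-')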
# '\u014c\u0106\u014d' is the unicode representation of characters ŌĆō
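Another way to read the traceback, offered as a sketch and not a confirmed diagnosis: byte 0xe2 at position 0 is what you would expect if the "–" literal in the source is a UTF-8 byte string (the en dash U+2013 encodes as 0xE2 0x80 0x93) and Python 2 is trying to decode it as ASCII because the text coming from the widget is already unicode. Under that assumption, the three characters in the console output are just those bytes shown in the console's code page, and the replacement should target the en dash itself. The helper name normalize_dashes and the choice of UTF-8 for byte input are my own assumptions, not part of the original code.

```python
# -*- coding: utf-8 -*-
# Sketch only: normalize_dashes is a hypothetical helper, and UTF-8 is an
# assumed encoding for any byte-string input.

EN_DASH = u"\u2013"  # en dash; its UTF-8 encoding is the bytes 0xE2 0x80 0x93


def normalize_dashes(paragraph):
    """Return the text with every en dash replaced by an ASCII hyphen."""
    # Decode byte strings explicitly so Python never falls back to an
    # implicit ASCII decode. In Python 2, `bytes` is the same type as `str`.
    if isinstance(paragraph, bytes):
        paragraph = paragraph.decode("utf-8")
    # Both arguments are unicode literals, so no implicit conversion occurs.
    return paragraph.replace(EN_DASH, u"-")


if __name__ == "__main__":
    print(normalize_dashes(u"pages 3\u20135"))
```

Inside IntoSentences this would amount to writing paragraph.replace(u"\u2013", u"-") in place of the byte-string version, so that both sides of the replace are unicode.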