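UTF-8 and ASCII problems when using Python on Google App Engine
For the past few days I have been trying to learn Python on App Engine. However, I keep running into problems with ASCII and UTF encodings. The latest one goes like this:
I have a simple chat room script from the book "Code in the Cloud":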
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()

    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)

Messages = []

class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
           <html>
             <head>
               <title>MarkCC's AppEngine Chat Room</title>
             </head>
             <body>
               <h1>Welcome to MarkCC's AppEngine Chat Room</h1>
               <p>(Current time is %s)</p>
           """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
           <form action="" method="post">
           <div><b>Name:</b>
           <textarea name="name" rows="1" cols="20"></textarea></div>
           <p><b>Message</b></p>
           <div><textarea name="message" rows="5" cols="60"></textarea></div>
           <div><input type="submit" value="Send ChatMessage"></input></div>
           </form>
         </body>
       </html>
       """)
    # END: MainPage
    # START: PostHandler
    def post(self):
        chatter = self.request.get("name")
        msg = self.request.get("message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
    # END: PostHandler

# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])

def main():
    run_wsgi_app(chatapp)

if __name__ == "__main__":
    main()
# END: Frame
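This code works fine in English. But as soon as I add some non-standard characters, all sorts of problems start.
First, to make the page display the characters correctly, I added a meta tag (charset=UTF-8 and so on).
Interestingly, if you type non-standard letters into the chat, the program handles them well and displays them without problems. However, if I put any non-ASCII letter directly in the script, the page fails to load. I found that adding the UTF-8 coding line fixes this, so I added it (# -*- coding: utf-8 -*-). That was not enough, though: of course, I had forgotten to save the file as UTF-8. After that, the program started running.
That would have been a happy ending, but...
It did not work properly.
In short, this code: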
# -*- coding: utf-8 -*-
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()

    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)

Messages = []

class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
           <html>
             <head>
               <title>Witaj w pokoju czatu MarkCC w App Engine</title>
               <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
             </head>
             <body>
               <h1>Witaj w pokoju czatu MarkCC w App Engine</h1>
               <p>(Dokladny czas Twojego logowania to: %s)</p>
           """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
           <form action="" method="post">
           <div><b>Twój Nick:</b>
           <textarea name="name" rows="1" cols="20"></textarea></div>
           <p><b>Twoja Wiadomość</b></p>
           <div><textarea name="message" rows="5" cols="60"></textarea></div>
           <div><input type="submit" value="Send ChatMessage"></input></div>
           </form>
         </body>
       </html>
       """)
    # END: MainPage
    # START: PostHandler
    def post(self):
        chatter = self.request.get(u"name")
        msg = self.request.get(u"message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
    # END: PostHandler

# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])

def main():
    run_wsgi_app(chatapp)

if __name__ == "__main__":
    main()
# END: Frame
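cannot handle anything I type into the chat at runtime. The page loads, but as soon as I submit a message (even one that uses only standard characters), I get this error:
  File "D:\Python25\lib\StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 64: ordinal not in range(128)
In other words, if I want to be able to type any characters in the app, I cannot put non-English characters in the interface. Conversely, I can only use non-English characters in the app if I do not encode the file as UTF-8. How can I make all of this work together?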
2 Answers
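@Thomas K. Thanks for your guidance. Thanks to you I came up with a solution, perhaps a slightly roundabout one, so the credit for this answer should go to you. The following line:
Messages.append(ChatMessage(chatter, msg))
should look like this:
Messages.append(ChatMessage(chatter.encode("utf-8"), msg.encode("utf-8")))
Basically, I needed to encode the Unicode strings coming from the form as UTF-8 byte strings (not ASCII, strictly speaking), so that they match the byte strings used in the rest of the page.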
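For what it's worth, here is a minimal sketch of what that .encode("utf-8") call does. It is written to run on a current Python 3 rather than the Python 2.5 from the question, and the helper name to_utf8 is made up for illustration. The point is that the result is a UTF-8 byte string (ó becomes the bytes 0xc3 0xb3), so it can be joined with the other byte strings on the page without any implicit conversion.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_utf8 is a hypothetical helper, not part of webapp.

def to_utf8(value):
    """Return value as UTF-8 bytes; byte strings pass through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")

# Stand-ins for a page fragment and for what self.request.get() returns.
page_fragment = to_utf8(u"<b>Twój Nick:</b> ")
chatter = u"Józef"

# Everything is bytes now, so joining needs no implicit ASCII decoding.
combined = b"".join([page_fragment, to_utf8(chatter)])
print(combined.decode("utf-8"))
```

The trade-off is that every value has to be encoded by hand before it reaches the response, which is easy to forget.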
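Your strings contain Unicode characters, but they are not actually Unicode strings; they are byte strings. You need to put a u in front of each one, as in u"foo", to turn it into a Unicode string. If you make sure all your strings are Unicode strings, that error should go away.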
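To illustrate the point about mixing the two string types, here is a rough sketch. It uses Python 3 syntax so that it runs on a current interpreter; there the ASCII decode has to be spelled out, whereas Python 2 attempts it implicitly when a byte string meets a Unicode string, which is presumably what happens inside StringIO in the traceback. The variable names are invented.

```python
# -*- coding: utf-8 -*-
# What a plain "Twój Nick" literal holds on Python 2 in a UTF-8 source file.
label_bytes = u"Twój Nick".encode("utf-8")

# Python 2 performs this decode implicitly when joining str with unicode.
error = None
try:
    label_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
print(error)

# With Unicode strings throughout, nothing needs decoding.
# Encode once, at the very end, when producing the response body.
parts = [u"<b>Twój Nick:</b>", u"<p>%s</p>" % u"cześć"]
page = u"".join(parts)
body = page.encode("utf-8")
```

The first half fails on the 0xc3 byte of ó, the same byte reported in the question; the second half never hits that path.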
You should also specify the encoding in the Content-Type header rather than in a meta tag, like this:
self.response.headers['Content-Type'] = 'text/html; charset=UTF-8'
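As a rough illustration of how the header and the body fit together, here is a bare WSGI sketch (Python 3 syntax, standard library only). It is a stand-in, not the webapp handler from the question: app, PAGE and fake_start_response are invented names. The charset declared in the header and the encoding used for the body have to agree.

```python
# -*- coding: utf-8 -*-
PAGE = u"<p><b>Twoja Wiadomość</b></p>"

def app(environ, start_response):
    """Minimal WSGI app: encode the body once and declare the same charset."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app directly, without a server, to inspect what it would send.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

chunks = app({}, fake_start_response)
print(captured["headers"]["Content-Type"])
```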
Your life will also be a lot easier if you use a template system instead of writing the HTML directly in your Python code.
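Something along these lines, as a sketch only. It uses string.Template and html.escape from the Python 3 standard library as a stand-in for a real template system, so it does not show any App Engine API. PAGE and render_page are made-up names.

```python
# -*- coding: utf-8 -*-
from html import escape
from string import Template

# The markup lives in one Unicode template instead of scattered write() calls.
PAGE = Template(u"""<html>
  <head><title>$title</title></head>
  <body>
    <h1>$title</h1>
    $messages
  </body>
</html>
""")

def render_page(title, messages):
    """Fill the template with Unicode text and return UTF-8 bytes."""
    rows = u"\n    ".join(u"<p>%s</p>" % escape(m) for m in messages)
    page = PAGE.substitute(title=escape(title), messages=rows)
    return page.encode("utf-8")

html_bytes = render_page(u"Pokój czatu", [u"Józef: cześć", u"<b>Ala</b>: hej"])
print(html_bytes.decode("utf-8"))
```

Keeping the page as Unicode until the single encode at the end avoids mixing string types, and escaping the messages keeps user input from being interpreted as markup.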