UTF-8 and ASCII problems when using Python on Google App Engine

0 votes
2 answers
4415 views
Asked 2025-04-17 00:03

For the last few days I have been trying to learn Python on App Engine. However, I keep running into problems with ASCII and UTF encoding. The latest one is this:

I have a simple chat room program taken from the book "Code in the Cloud":

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime


# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()

    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)

Messages = []

class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
           <html>
             <head>
               <title>MarkCC's AppEngine Chat Room</title>
             </head>
             <body>
               <h1>Welcome to MarkCC's AppEngine Chat Room</h1>
               <p>(Current time is %s)</p>
           """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
           <form action="" method="post">
           <div><b>Name:</b>
           <textarea name="name" rows="1" cols="20"></textarea></div>
           <p><b>Message</b></p>
           <div><textarea name="message" rows="5" cols="60"></textarea></div>
           <div><input type="submit" value="Send ChatMessage"></input></div>
           </form>
         </body>
       </html>
       """)
    # END: MainPage
    # START: PostHandler
    def post(self):
        chatter = self.request.get("name")
        msg = self.request.get("message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
# END: PostHandler




# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])


def main():
    run_wsgi_app(chatapp)

if __name__ == "__main__":
    main()
# END: Frame

This code works fine in English. However, as soon as I add some non-standard characters, all sorts of problems start.

First, to get the page to display the characters correctly, I added a meta tag (charset=UTF-8 and so on).

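Interestingly, if you type non-standard letters into the chat, the program handles them well and displays them without problems. However, if I put any non-ASCII letter directly in the script, the page fails to load. I found out that adding the utf-8 encoding line solves this, so I added it (# -*- coding: utf-8 -*-). That was not enough, though: of course, I had forgotten to save the file as UTF-8. After that, the program started running.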

That would have been a happy ending, but alas...

It does not work properly.

In short, this code:

# -*- coding: utf-8 -*-
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime


# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()

    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)

Messages = []
class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
           <html>
             <head>
               <title>Witaj w pokoju czatu MarkCC w App Engine</title>
               <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
             </head>
             <body>
               <h1>Witaj w pokoju czatu MarkCC w App Engine</h1>
               <p>(Dokladny czas Twojego logowania to: %s)</p>
           """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
           <form action="" method="post">
           <div><b>Twój Nick:</b>
           <textarea name="name" rows="1" cols="20"></textarea></div>
           <p><b>Twoja Wiadomość</b></p>
           <div><textarea name="message" rows="5" cols="60"></textarea></div>
           <div><input type="submit" value="Send ChatMessage"></input></div>
           </form>
         </body>
       </html>
       """)
    # END: MainPage
    # START: PostHandler
    def post(self):
        chatter = self.request.get(u"name")
        msg = self.request.get(u"message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
# END: PostHandler




# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])


def main():
    run_wsgi_app(chatapp)

if __name__ == "__main__":
    main()
# END: Frame

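At runtime, it fails on anything I type into the chat application. The page loads, but as soon as I submit a message (even one using only standard characters), I get this error:

  File "D:\Python25\lib\StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 64: ordinal not in range(128)

In other words, if I want to be able to type any characters into the application, I cannot put non-English characters in my interface. Or, the other way round, I can use non-English characters within the app only if I do not save the file as UTF-8. How can I make it all work together?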
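One plausible reading of the traceback, sketched here for Python 3 (the question itself runs Python 2.5): once the file is saved as UTF-8, the HTML literals are byte strings containing non-ASCII bytes such as 0xC3, while the text coming from the request is likely a unicode string. When Python 2 joins the two kinds in `''.join(...)`, it decodes the byte strings as ASCII and fails. The helper `join_like_python2` and the sample values are hypothetical and exist only to make that implicit decode visible.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3. Python 2 performs the ASCII decode below implicitly
# whenever a byte string and a unicode string meet in ''.join().

# A UTF-8 encoded HTML fragment, as a literal in a UTF-8 source file would be.
page_fragment = u"<div><b>Twój Nick:</b>".encode("utf-8")

# Hypothetical form value; assumed to arrive as a unicode string.
posted_name = u"anna"


def join_like_python2(parts):
    """Join mixed parts the way Python 2 would: bytes are decoded as ASCII."""
    return u"".join(
        p.decode("ascii") if isinstance(p, bytes) else p
        for p in parts
    )


try:
    join_like_python2([page_fragment, posted_name])
    error = None
except UnicodeDecodeError as exc:
    # The first byte of the UTF-8 sequence for "ó" is 0xC3, as in the traceback.
    error = exc
```

This would also explain why the page loads while the message list is empty: with only byte strings in the buffer, nothing forces a decode.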

2 Answers

1

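@Thomas K. Thank you for your guidance. Thanks to you I came up with a solution, possibly a slightly roundabout one, so the credit for this answer should go to you. The following line: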

Messages.append(ChatMessage(chatter, msg))

should look like this:

Messages.append(ChatMessage(chatter.encode( "utf-8" ), msg.encode( "utf-8" )))

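Basically, I needed to encode the unicode strings coming from the request into UTF-8 byte strings (not ASCII), so that they are the same type as the byte strings in the rest of the page.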
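A minimal sketch of this "encode at the boundary" idea, written for Python 3 so the byte/text split is explicit. The function `render_message` and the sample values are hypothetical; the point is only that every piece is a UTF-8 byte string before anything is joined.

```python
# -*- coding: utf-8 -*-

def render_message(user, msg):
    """Encode unicode input as UTF-8 bytes before mixing it with byte fragments."""
    return b"<p>" + user.encode("utf-8") + b": " + msg.encode("utf-8") + b"</p>"


# A UTF-8 encoded page fragment, standing in for the HTML literal in the script.
template = u"<b>Twój Nick:</b>".encode("utf-8")

# All parts are bytes, so joining them needs no implicit decoding.
body = b"".join([template, render_message(u"Paweł", u"Cześć")])

# encode("utf-8") yields UTF-8 bytes, not ASCII: "ó" becomes two bytes.
o_acute = u"ó".encode("utf-8")
```

The trade-off is that the stored messages are now bytes, so any later text processing has to decode them again.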

2

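Your strings contain Unicode characters, but they are not Unicode strings; they are byte strings. You need to prefix each one with u, as in u"foo", to turn it into a Unicode string. If you make sure all your strings are Unicode strings, that error should go away.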
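The approach described here can be sketched as follows, again for Python 3 (where the `u` prefix is accepted but optional). The names `build_page` and `out` are hypothetical, and `io.BytesIO` merely stands in for the response buffer: everything stays unicode inside the program and is encoded once, at output.

```python
# -*- coding: utf-8 -*-
import io


def build_page(messages):
    """Assemble the page from unicode strings only."""
    parts = [u"<b>Twój Nick:</b>"]
    for user, msg in messages:
        parts.append(u"<p>%s: %s</p>" % (user, msg))
    return u"".join(parts)


page = build_page([(u"Paweł", u"Cześć")])

# Stand-in for the response stream: encode once, at the output boundary.
out = io.BytesIO()
out.write(page.encode("utf-8"))
```

Compared with encoding each input separately, this keeps a single conversion point, which tends to be easier to reason about as the application grows.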

You should also specify the encoding in the Content-Type header rather than in a meta tag, like this:

self.response.headers['Content-Type'] = 'text/html; charset=UTF-8'
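To show the same header outside of webapp, here is a bare WSGI callable as a sketch. It is not the App Engine API; `app` and the sample body are hypothetical. It declares the charset in the header and sends a body encoded to match.

```python
# -*- coding: utf-8 -*-

def app(environ, start_response):
    """Minimal WSGI app: the header charset and the body encoding agree."""
    body = u"<p>Cześć</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be served locally with `wsgiref.simple_server.make_server` from the standard library to check what the browser receives.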

Your life will be much easier if you use a template system instead of writing HTML directly in your Python code.
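The answer does not name a template engine, so the sketch below uses `string.Template` from the standard library purely as a stand-in. `PAGE` and `render` are hypothetical names; the idea is that the markup lives in one unicode template and the values are filled in before a single encode.

```python
# -*- coding: utf-8 -*-
from string import Template

# The page markup kept apart from the handler logic, as unicode text.
PAGE = Template(u"""<html>
  <body>
    <div><b>Twój Nick:</b> $name</div>
    <p><b>Twoja Wiadomość</b></p>
    <div>$message</div>
  </body>
</html>
""")


def render(name, message):
    """Fill in the template and encode the result once.

    Note: string.Template does no HTML escaping; a real template
    engine would normally handle that as well.
    """
    return PAGE.substitute(name=name, message=message).encode("utf-8")


rendered = render(u"Paweł", u"Cześć")
```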
