BlobStore and Unicode characters in App Engine
Is there a way to store Unicode data in the App Engine BlobStore (in Python)?
I'm saving the data like this:
file_name = files.blobstore.create(mime_type='application/octet-stream')
with files.open(file_name, 'a') as f:
    f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
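But on the production server (not on the dev server) I get the error below. It looks like my Unicode is being converted to ASCII, and I don't know why.
Why is it trying to convert back to ASCII? Can I avoid this?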
Traceback (most recent call last):
File "/base/data/home/apps/myapp/1.349473606437967000/myfile.py", line 137, in get
f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 364, in write
self._make_rpc_call_with_retry('Append', request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 472, in _make_rpc_call_with_retry
_make_call(method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file.py", line 226, in _make_call
rpc.make_call(method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 509, in make_call
self.__rpc.MakeCall(self.__service, method, request, response)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 115, in MakeCall
self._MakeCallImpl()
File "/base/python_runtime/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 161, in _MakeCallImpl
self.request.Output(e)
File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 204, in Output
self.OutputUnchecked(e)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/files/file_service_pb.py", line 2390, in OutputUnchecked
out.putPrefixedString(self.data_)
File "/base/python_runtime/python_lib/versions/1/google/net/proto/ProtocolBuffer.py", line 432, in putPrefixedString
v = str(v)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 313: ordinal not in range(128)
1 Answer
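A blob store holds binary data: bytes, not characters. The last frame of your traceback, v = str(v), is where the unicode object gets turned into bytes, and in Python 2 that conversion implicitly uses the ASCII codec, so it fails on the first non-ASCII character (u'\xe9' here). You therefore need to do an explicit encoding step yourself before writing. UTF-8 is a reasonable choice.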
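As a minimal sketch of that encoding step, the snippet below builds the same `<as><a>...</a></as>` text as the question and encodes it to UTF-8 bytes before writing. The helper name `build_payload` and the sample strings are made up for illustration, and `io.BytesIO` is only a stand-in for the blob file object so that the example runs without App Engine. It does not escape XML metacharacters.

```python
import io


def build_payload(items):
    """Join text items into <as><a>...</a></as> and return UTF-8 bytes."""
    text = u'<as><a>' + u'</a><a>'.join(items) + u'</a></as>'
    return text.encode('utf-8')


# The implicit conversion in the traceback behaves like an explicit
# ASCII encode, which cannot represent u'\xe9'.
try:
    u'caf\xe9'.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds for any Unicode text.
payload = build_payload([u'caf\xe9', u'na\xefve'])

# io.BytesIO is an assumption standing in for the file opened with
# files.open(file_name, 'a') in the question.
sink = io.BytesIO()
sink.write(payload)
```

With the question's code, the equivalent change would be to pass the encoded bytes to `f.write` instead of the unicode object.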
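Also, this line from the question:
f.write('<as><a>' + '</a><a>'.join(stringInUnicode) + '</a></as>')
will go wrong if an item in stringInUnicode contains a <, & or ]]> sequence. You need to do some escaping, either by serialising the data with a proper XML library or by hand: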
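with files.open(file_name, 'a') as f:
    f.write('<as>')
    for line in stringInUnicode:
        # Escape XML metacharacters; '&' must be replaced first.
        line = line.replace(u'&', u'&amp;').replace(u'<', u'&lt;').replace(u'>', u'&gt;')
        # Encode to UTF-8 bytes before writing to the blob.
        f.write('<a>%s</a>' % line.encode('utf-8'))
    f.write('</as>')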
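For the "proper XML library" route, one possible sketch uses the standard library's `xml.etree.ElementTree`, which escapes `&`, `<` and `>` in element text and can emit UTF-8 bytes directly. The function name `serialize_items` and the sample data are hypothetical. Writing the result to the blob is left out so the example runs on its own.

```python
import xml.etree.ElementTree as ET


def serialize_items(items):
    """Build <as><a>...</a>...</as> and return it as UTF-8 encoded bytes."""
    root = ET.Element('as')
    for item in items:
        # Assigning to .text leaves escaping to the serializer.
        ET.SubElement(root, 'a').text = item
    return ET.tostring(root, encoding='utf-8')


samples = [u'caf\xe9', u'a < b & c', u'x ]]> y']
data = serialize_items(samples)

# Parsing the bytes back should recover the original strings.
restored = [node.text for node in ET.fromstring(data)]
```

The returned bytes could then be passed to `f.write` in a single call, in place of the manual loop.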
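(This still produces malformed XML if the strings contain control characters, but there is not much you can do about that. If you need to store arbitrary binary data in XML, you have to layer an ad hoc encoding such as base-64 on top of it.)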
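A rough sketch of the base-64 idea, assuming each item is a byte string that may contain control characters: every item is base-64 encoded before being wrapped in `<a>` tags, so the XML only ever contains characters from the base-64 alphabet. The names `wrap_binary` and `unwrap_binary` are invented for this example.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(chunks):
    """Base-64 encode each byte string and wrap the results in XML."""
    parts = [base64.b64encode(chunk).decode('ascii') for chunk in chunks]
    xml_text = u'<as><a>' + u'</a><a>'.join(parts) + u'</a></as>'
    return xml_text.encode('utf-8')


def unwrap_binary(payload):
    """Reverse wrap_binary: parse the XML and decode each item."""
    root = ET.fromstring(payload)
    # An empty element has text None, so fall back to an empty string.
    return [base64.b64decode(node.text or '') for node in root]


chunks = [b'\x00\x01\x02', b'<&>]]>', u'caf\xe9'.encode('utf-8'), b'']
payload = wrap_binary(chunks)
roundtrip = unwrap_binary(payload)
```

The trade-off is that the stored text is no longer human-readable and is about a third larger than the raw bytes.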