Django and Suds: UnicodeEncodeError when using QuerySets

3 votes
2 answers
1107 views
Asked 2025-04-17 12:48

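As a Python developer I have been wrestling with Python's Unicode handling for years. Now I have run into a case that is driving me crazy and that I cannot solve on my own. I have already spent a whole day on it...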

My setup is a small Django application that connects to a remote system over SOAP (using the Suds library), pulls some data, and looks it up in the Django database:

import suds.client
from myapp.models import Customer

client = suds.client.Client(...)
customer = client.service.getCustomerByEmail('foo@bar.com')

type(customer.email): <class 'suds.sax.text.Text'>

customer_exists = Customer.objects.filter(email=customer.email)

The customer's email address contains a German umlaut, ü, and that makes Django raise the following exception:

Traceback (most recent call last):
  File "run_anatomy_client.py", line 19, in <module>
    print client.main()
  File "/Users/user/Documents/workspace/Wawi/application/myapp/client.py", line 282, in main
    if not Customer.objects.filter(email=customer.email.encode('latin1')):
  File "/Users/user/Documents/workspace/Wawi/application/myapp/client.py", line 76, in sync_customer
    if not customer_exists:
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 113, in __nonzero__
    iter(self).next()
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 107, in _result_iter
    self._fill_cache()
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 772, in _fill_cache
    self._result_cache.append(self._iter.next())
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 273, in iterator
    for row in compiler.results_iter():
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 680, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 735, in execute_sql
    cursor.execute(sql, params)
  File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/backends/util.py", line 43, in execute
    logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 28: ordinal not in range(128)

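I have already tried encode() and decode(), changed the encoding of the source files, and changed the database's character set configuration, which currently looks like this: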

mysql> show variables like '%character%';
+--------------------------+-----------------------------------------+
| Variable_name            | Value                                   |
+--------------------------+-----------------------------------------+
| character_set_client     | latin1                                  |
| character_set_connection | latin1                                  |
| character_set_database   | utf8                                    |
| character_set_filesystem | binary                                  |
| character_set_results    | latin1                                  |
| character_set_server     | latin1                                  |
| character_set_system     | utf8                                    |
| character_sets_dir       | /opt/local/share/mysql5/mysql/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.00 sec)

Strangely, if I set a breakpoint and run the same line at the prompt, it works as soon as I use encode():

(Pdb) Customer.objects.filter(email=customer.email)
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 28:     ordinal not in range(128)
(Pdb) Customer.objects.filter(email=customer.email.encode('utf-8'))
[]
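
The error message on its own can be reproduced without Django or Suds: it is what an attempt to encode non-ASCII text with the ascii codec produces. A minimal sketch, assuming a made-up address (`müller@example.com` is hypothetical, not the real customer data):

```python
# Hypothetical address; only the u-umlaut (U+00FC) matters here.
email = u'm\xfcller@example.com'

try:
    # The ascii codec only covers code points below 128, so this raises.
    email.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character the codec rejected.
    ascii_error = exc

if ascii_error is not None:
    rejected = ascii_error.object[ascii_error.start]
    print('ascii codec rejected %r at index %d' % (rejected, ascii_error.start))
```

This only shows where the message comes from; it does not show which part of the query path triggers the implicit encode.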

I would appreciate any hints...

2 Answers

0

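I spent more than two hours trying to figure out what was going on, and why I could not save Django objects after assigning values from Suds data structures to their fields.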

As @guillaumevincent mentioned, Suds' Text class inherits from unicode, but its implementation is not 100% correct, so Django fails on some operations that work fine on the plain unicode type.

So for the example in the question, I would do this:

customer_exists = Customer.objects.filter(email=unicode(customer.email))

In my own case, I handled it the same way:

django_obj.field_name = unicode(suds_obj.field_name)

Hope this saves someone some time :)
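
The conversion described in this answer can be wrapped in a small helper so it is applied in one place. This is only a sketch: `FakeText` is a hypothetical stand-in for `suds.sax.text.Text`, `to_plain_text` is a name made up for illustration, and the `text_type` shim is there so the snippet also runs on Python 3, where `unicode` does not exist.

```python
try:
    text_type = unicode  # Python 2, as in the question
except NameError:
    text_type = str      # Python 3 has a single text type


class FakeText(text_type):
    """Hypothetical stand-in for suds.sax.text.Text (a text subclass)."""


def to_plain_text(value):
    """Return value as the exact built-in text type; leave other values alone."""
    if isinstance(value, text_type) and type(value) is not text_type:
        # Calling the base type on a subclass instance yields a plain copy.
        return text_type(value)
    return value


soap_email = FakeText(u'm\xfcller@example.com')  # hypothetical value
plain_email = to_plain_text(soap_email)

print(type(soap_email).__name__, type(plain_email).__name__)
```

With a helper like this, the lookup from the question would read `Customer.objects.filter(email=to_plain_text(customer.email))`, assuming the field values really are `Text` instances.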

2

suds.sax.text.Text is derived from unicode:

class Text(unicode):
    """
    An XML text object used to represent text content.
    @ivar lang: The (optional) language flag.
    @type lang: bool
    @ivar escaped: The (optional) XML special character escaped flag.
    @type escaped: bool
    """

If you want to use it, you can simply encode it as UTF-8:

email = customer.email.encode("utf-8")
customer_exists = Customer.objects.filter(email=email)
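
Note that encode() returns a byte string rather than text, so the value passed to filter() is no longer a unicode object at all. A small sketch of what the call produces, assuming a made-up address (the value is hypothetical):

```python
email = u'm\xfcller@example.com'  # hypothetical value containing U+00FC

# encode() turns text into bytes (str on Python 2, bytes on Python 3).
encoded = email.encode('utf-8')

# In UTF-8 the single character U+00FC becomes the two bytes 0xC3 0xBC,
# so the byte string is one element longer than the text.
extra_bytes = len(encoded) - len(email)

# Decoding with the same codec gives the original text back.
decoded = encoded.decode('utf-8')
```

Whether a byte string or a plain unicode value is the better thing to hand to the ORM depends on the Python and Django versions in use; this sketch only shows what the encode step does.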
