Using Python's urllib.quote_plus with the 'safe' argument on a utf-8 string

Asked 2025-04-17 22:22

I have a unicode string in my Python code:

name = u'Mayte_Martín'

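I want to use it in a SPARQL query, which means I need to encode the string as 'utf-8' and pass it through urllib.quote_plus or requests.quote. However, both quote functions behave strangely: the result differs sharply depending on whether I pass the 'safe' argument.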

from urllib import quote_plus

Without the 'safe' argument:

quote_plus(name.encode('utf-8'))
Output: 'Mayte_Mart%C3%ADn'

With the 'safe' argument:

quote_plus(name.encode('utf-8'), safe=':/')
Output: 
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-164-556248391ee1> in <module>()
----> 1 quote_plus(v, safe=':/')

/usr/lib/python2.7/urllib.pyc in quote_plus(s, safe)
   1273         s = quote(s, safe + ' ')
   1274         return s.replace(' ', '+')
-> 1275     return quote(s, safe)
   1276 
   1277 def urlencode(query, doseq=0):

/usr/lib/python2.7/urllib.pyc in quote(s, safe)
   1264         safe = always_safe + safe
   1265         _safe_quoters[cachekey] = (quoter, safe)
-> 1266     if not s.rstrip(safe):
   1267         return s
   1268     return ''.join(map(quoter, s))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

The problem seems to lie in the rstrip call. I tried a modification and called...

quote_plus(name.encode('utf-8'), safe=u':/'.encode('utf-8'))

But that did not solve the problem. What could be the cause here?

utf-8 urllib unicode string-encoding quote_plus safe-argument SPARQL rstrip

3 Answers

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import urllib
name = u'Mayte_Martín'
print urllib.quote_plus(name.encode('utf-8'), safe=':/')

This runs fine for me (Python 2.7.9, Debian).

(I don't know the answer, but I can't post a comment because of my reputation.)

Answered 2025-04-17 by Python大师


According to this bug report, here is a workaround:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from urllib import quote_plus
name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/'.encode('utf-8'))

You have to encode both arguments passed to quote or quote_plus as utf-8.
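A minimal sketch of that workaround wrapped in a helper, assuming the goal is code that behaves the same with or without `unicode_literals`. The name `quote_utf8` is hypothetical, and the `try/except` import is an addition so that the snippet also runs on Python 3, where `quote_plus` lives in `urllib.parse`.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3


def quote_utf8(text, safe=':/'):
    """Hypothetical helper: percent-encode `text` as UTF-8.

    Both arguments are converted to byte strings before the call, so
    quote_plus never has to mix a byte string with a unicode string.
    """
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    if not isinstance(safe, bytes):
        safe = safe.encode('utf-8')
    return quote_plus(text, safe=safe)


print(quote_utf8('Mayte_Martín'))  # Mayte_Mart%C3%ADn
```

Encoding inside the helper keeps the call sites free of `.encode('utf-8')` calls, whichever kind of string literal the module uses.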

Answered 2025-04-17 by Python大师


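I am answering my own question in the hope that it helps others who run into the same problem.

The problem appears when the following import has been made in the current session before anything else is executed: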

from __future__ import unicode_literals

For some reason this import is incompatible with the following code:

from urllib import quote_plus

name = u'Mayte_Martín'
quote_plus(name.encode('utf-8'), safe=':/')

Without the unicode_literals import, the code above runs fine.
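A likely explanation, consistent with the traceback in the question: with `unicode_literals` in effect, the literal `':/'` is a unicode string, while `name.encode('utf-8')` is a byte string. When Python 2 strips a byte string with a unicode argument, it first decodes the byte string with the default ASCII codec, and that decode fails on the first byte of `í`. The sketch below performs the decode explicitly so the failure can be seen in isolation. It runs on both Python 2 and 3, and it illustrates the mechanism rather than reproducing the library's own code.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

name = 'Mayte_Martín'
encoded = name.encode('utf-8')   # byte string; 'í' becomes the two bytes C3 AD

try:
    # Python 2 performs this decode implicitly when a byte string meets a
    # unicode string, e.g. in s.rstrip(safe) inside urllib.quote.
    encoded.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start        # index of the first non-ASCII byte

print(failed_at)  # 10, the same position reported in the traceback
```

If the `unicode_literals` import has to stay, writing the argument as a bytes literal, `safe=b':/'`, keeps both operands as byte strings and should avoid the implicit decode.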

Answered 2025-04-17 by Python大师
