How do I unescape quotes and other characters in Python?

8 votes

3 answers

8947 views

Asked 2025-04-15 11:22

I have a string that contains some symbols, like this:

&#39;

It looks like this is an apostrophe.

I tried saxutils.unescape() without success, and I also tried urllib.unquote().

How can I decode this? Thanks!

Tags: string handling, text parsing, character escaping, decoding

3 Answers

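The most reliable solution appears to be this function, written by Python veteran Fredrik Lundh. It is not the shortest approach, but it handles named entities as well as hexadecimal and decimal codes.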
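As a rough illustration of the three entity forms mentioned above (named, decimal, hexadecimal), and of why saxutils.unescape() from the question leaves &#39; alone: by default it only replaces &amp;, &lt; and &gt;, plus whatever extra mapping is passed in. The sample strings and the `extra` mapping are made up for illustration, and the sketch assumes Python 3.

```python
from xml.sax.saxutils import unescape

# Three spellings of an entity: named, decimal, hexadecimal.
named, decimal, hexadecimal = "&lt;", "&#39;", "&#x27;"

# By default only &amp;, &lt; and &gt; are replaced.
print(unescape(named))      # <
print(unescape(decimal))    # &#39; (unchanged)

# Extra replacements have to be supplied explicitly (hypothetical mapping).
extra = {"&#39;": "'", "&#x27;": "'"}
print(unescape(decimal, extra))
print(unescape(hexadecimal, extra))
```

Listing every numeric code by hand does not scale, which is why a general entity decoder is the better fit here.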

Answered 2025-04-15 by Python大师


Try this (found here):

# Python 3 version: htmlentitydefs is now html.entities, unichr is now chr.
from html.entities import name2codepoint as n2cp
import re

def decode_htmlentities(string):
    """
    Decode HTML entities (hex, decimal, or named) in a string.
    @see http://snippets.dzone.com/posts/show/4569

    >>> print(decode_htmlentities("E tu vivrai nel terrore - L&#x27;aldil&#xE0; (1981)"))
    E tu vivrai nel terrore - L'aldilà (1981)
    >>> print(decode_htmlentities("l&#39;eau"))
    l'eau
    >>> print(decode_htmlentities("foo &lt; bar"))
    foo < bar
    """
    def substitute_entity(match):
        ent = match.group(3)
        if match.group(1) == "#":
            # decoding by number
            try:
                if match.group(2) == '':
                    # number is in decimal
                    return chr(int(ent))
                # number is in hex
                return chr(int(ent, 16))
            except (ValueError, OverflowError):
                # not a valid code point: leave the text untouched
                return match.group()
        else:
            # they were using a name; group 2 may have captured a
            # leading "x" that belongs to the name (e.g. &xi;)
            cp = n2cp.get(match.group(2) + ent)
            if cp is not None:
                return chr(cp)
            return match.group()

    entity_re = re.compile(r'&(#?)(x?)(\w+);')
    return entity_re.sub(substitute_entity, string)

Answered 2025-04-15 by Python大师


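Take a look at this question. What you are looking for is called "HTML entity decoding". You will usually find a function named something like "htmldecode" that does what you need. Django, Cheetah, and BeautifulSoup all have such a function.

If you would rather not use a library and all of your entities are numeric, the other answer also solves the problem well.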
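If Python 3.4 or later is available, the standard library may already be enough: html.unescape() decodes named, decimal, and hexadecimal references. A minimal sketch, with sample strings chosen purely for illustration:

```python
import html

# Sample inputs (illustrative): decimal, hexadecimal, and named entities.
samples = ["l&#39;eau", "L&#x27;aldil&#xE0;", "foo &lt; bar"]

# html.unescape converts every character reference it recognizes.
decoded = [html.unescape(s) for s in samples]

for text in decoded:
    print(text)
```

This avoids both a third-party dependency and a hand-written regular expression.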

Answered 2025-04-15 by Python大师

