Escaping special HTML characters in Python

28 votes

4 answers

55258 views

Asked 2025-04-15 18:05

I have a string that may contain special characters such as ', " or &. For example:

string = """ Hello "XYZ" this 'is' a test & so on """

How can I automatically escape these special characters so that the string becomes:

string = " Hello &quot;XYZ&quot; this &#39;is&#39; a test &amp; so on "

Tags: string handling, special characters, HTML escaping

4 Answers

A simple string function will do the job:

def escape(t):
    """HTML-escape the text in `t`."""
    return (t
        .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        .replace("'", "&#39;").replace('"', "&quot;")
        )

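The other answers in this thread have minor problems. The cgi.escape method ignores single quotes for some reason, and you have to ask it explicitly to handle double quotes. The linked wiki page handles all five characters but uses the XML entity &apos;, which is not a defined entity in HTML 4 (it was only added in HTML5).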

This function always handles all five characters and uses standard HTML entities.
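
The order of the replacements matters: & has to be handled first, otherwise the ampersands introduced by the other entities would themselves be escaped. A minimal sketch of this, where escape_wrong_order is a hypothetical counter-example that is not part of the answer:

```python
def escape(t):
    """HTML-escape the text in `t` (same function as in the answer above)."""
    return (t
        .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        .replace("'", "&#39;").replace('"', "&quot;")
        )


def escape_wrong_order(t):
    """Hypothetical counter-example: '&' is replaced last, so '&lt;' gets escaped again."""
    return t.replace("<", "&lt;").replace("&", "&amp;")


string = """ Hello "XYZ" this 'is' a test & so on """
print(escape(string))
# The '<' becomes '&lt;' first, and then its '&' is escaped a second time.
print(escape_wrong_order("a < b"))
```

With the correct order, the string from the question comes out exactly as requested, while the wrong order turns "a < b" into a double-escaped result.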

Answered 2025-04-15 by Python大师


The cgi.escape function converts some special characters into their HTML entities.

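import cgi  # cgi.escape was removed in Python 3.8; this needs an older interpreter

original_string = 'Hello "XYZ" this \'is\' a test & so on '
# Passing True as the second argument (quote) also escapes double quotes.
escaped_string = cgi.escape(original_string, True)
print(original_string)
print(escaped_string)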

This produces:

Hello "XYZ" this 'is' a test & so on 
Hello &quot;XYZ&quot; this 'is' a test &amp; so on

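Pass True as the optional second argument of cgi.escape to escape double quotes; by default they are left alone. Single quotes are never escaped.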
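
cgi.escape was removed in Python 3.8, so on newer versions the same behavior has to be approximated another way. The following is a sketch built on html.escape, not a function from the standard library; the name cgi_style_escape is hypothetical:

```python
import html


def cgi_style_escape(s, quote=False):
    """Mimic the old cgi.escape: always escape &, < and >; escape " only if quote is true.

    Hypothetical helper; single quotes are left untouched, as cgi.escape did.
    """
    escaped = html.escape(s, quote=False)  # handles &, < and > only
    if quote:
        escaped = escaped.replace('"', "&quot;")
    return escaped


original_string = 'Hello "XYZ" this \'is\' a test & so on '
print(cgi_style_escape(original_string, True))
```

For new code, calling html.escape directly is usually simpler, since it escapes both kinds of quotes by default.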

Answered 2025-04-15 by Python大师


In Python 3.2 and later, you can use the html.escape function, for example:

>>> string = """ Hello "XYZ" this 'is' a test & so on """
>>> import html
>>> html.escape(string)
' Hello &quot;XYZ&quot; this &#x27;is&#x27; a test &amp; so on '
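
html.escape also accepts a quote argument, and html.unescape (available from Python 3.4) reverses the conversion. A short sketch of both, reusing the string from the question:

```python
import html

string = """ Hello "XYZ" this 'is' a test & so on """

# quote=True is the default: both " and ' are escaped.
escaped = html.escape(string)

# quote=False leaves quotation marks alone and only escapes &, < and >.
escaped_no_quotes = html.escape(string, quote=False)

# html.unescape converts the entities back to the original characters.
assert html.unescape(escaped) == string

print(escaped)
print(escaped_no_quotes)
```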

If you are on an earlier version of Python, see http://wiki.python.org/moin/EscapingHtml:

Python's built-in cgi module has an escape() function:
import cgi

s = cgi.escape( """& < >""" )   # s = "&amp; &lt; &gt;"
However, this function only handles &, < and >. If you call it as cgi.escape(string_to_escape, quote=True), it also escapes the " character.

Here is a small snippet that handles both quotation marks and apostrophes:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&apos;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape(text):
    """Produce entities within text."""
    return "".join(html_escape_table.get(c, c) for c in text)
You can also use escape() from xml.sax.saxutils to escape HTML. This approach should run faster. The unescape() function in the same module can be passed the same arguments to decode a string.
from xml.sax.saxutils import escape, unescape
# escape() and unescape() take care of &, < and >.
html_escape_table = {
    '"': "&quot;",
    "'": "&apos;"
}
html_unescape_table = {v:k for k, v in html_escape_table.items()}

def html_escape(text):
    return escape(text, html_escape_table)

def html_unescape(text):
    return unescape(text, html_unescape_table)
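
As a quick check of the helpers above, a round trip through html_escape and html_unescape should return the original text. The sample string here is only an illustration and does not come from the wiki page:

```python
from xml.sax.saxutils import escape, unescape

# Extra entities on top of &, < and >, which saxutils handles itself.
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}


def html_escape(text):
    return escape(text, html_escape_table)


def html_unescape(text):
    return unescape(text, html_unescape_table)


sample = """Hello "XYZ" this 'is' a test & <so> on"""
escaped = html_escape(sample)
print(escaped)

# Decoding the escaped text gives back the original string.
assert html_unescape(escaped) == sample
```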

Answered 2025-04-15 by Python大师
