Encoding text as HTML entities (but not the tags)

Asked on 2025-04-17 15:35

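I've been looking for a solution to this for a while without success, so I suspect I'm either missing a concept or don't really understand what I need. Here is my problem:

I'm using pisa to generate a PDF, and this is the code I use: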

def write_to_pdf(template_data, context_dict, filename):
    template = Template(template_data)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO()
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

    if not pdf.err:
        response = http.HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename
        response.write(result.getvalue())
        return response

    return http.HttpResponse('Problem creating PDF: %s' % cgi.escape(html))

If I try to turn this string into a PDF:

template_data = 'tésting á'

the result looks like this (imagine each # is a black dot rather than a character):

t##sting á

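I tried cgi.escape, but it didn't help: the black dots are still there, and the HTML tags end up printed literally. I'm on Python 2.7, so I can't use html.escape and solve all my problems.

So I need a way to convert plain text to HTML entities without affecting the HTML tags that are already there. Any clues?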

Oh, and if I change this line:

pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result, link_callback=fetch_resources)

to

pdf = pisa.pisaDocument(html, result, link_callback=fetch_resources)

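it works, but it doesn't produce the HTML entities. I need those because I don't know exactly which characters will end up in the text, and pisa may not support some of them.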


1 Answer

Encoding named HTML entities with Python:

http://beckism.com/2009/03/named_entities_python/

There is also a Django app for both decoding and encoding:

https://github.com/cobrateam/python-htmlentities

For Python 2.x (in Python 3.x, import codepoint2name from html.entities instead):

'''
Registers a special handler for named HTML entities

Usage:
import named_entities
text = u'Some string with Unicode characters'
text = text.encode('ascii', 'named_entities')
'''

import codecs
from htmlentitydefs import codepoint2name

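def named_entities(error):
    # Codec error handler: receives the encoding error and returns
    # (replacement text, position at which encoding should resume)
    if isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in error.object[error.start:error.end]:
            if ord(c) in codepoint2name:
                # The character has a named entity, e.g. &eacute;
                s.append(u'&%s;' % codepoint2name[ord(c)])
            else:
                # Otherwise fall back to a numeric character reference
                s.append(u'&#%s;' % ord(c))
        return u''.join(s), error.end
    else:
        # error is an instance here, so take the class name from its type
        raise TypeError("Can't handle %s" % type(error).__name__)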

codecs.register_error('named_entities', named_entities)
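To show why this leaves existing tags alone, here is a minimal sketch that should run on both Python 2.7 and Python 3. The handler is only called for characters that cannot be encoded as ASCII, and <, > and & are all ASCII, so the markup passes through unchanged. The html value is a made-up stand-in for the rendered template from the question, and I have not tested the result against pisa itself. The last line uses the built-in 'xmlcharrefreplace' handler, which gives numeric references without registering anything.

```python
import codecs

try:
    from htmlentitydefs import codepoint2name   # Python 2
except ImportError:
    from html.entities import codepoint2name    # Python 3


def named_entities(error):
    """Codec error handler: swap unencodable characters for HTML entities."""
    if not isinstance(error, (UnicodeEncodeError, UnicodeTranslateError)):
        raise TypeError("Can't handle %s" % type(error).__name__)
    parts = []
    for ch in error.object[error.start:error.end]:
        code = ord(ch)
        if code in codepoint2name:
            parts.append(u'&%s;' % codepoint2name[code])   # named entity
        else:
            parts.append(u'&#%d;' % code)                  # numeric fallback
    return u''.join(parts), error.end


codecs.register_error('named_entities', named_entities)

# Hypothetical rendered output: markup plus the accented text from the question.
html = u'<p>t\u00e9sting \u00e1</p>'

# Only the non-ASCII characters reach the handler; the tags are kept verbatim.
named = html.encode('ascii', 'named_entities')

# Built-in alternative: numeric character references, no custom handler needed.
numeric = html.encode('ascii', 'xmlcharrefreplace')

print(named)
print(numeric)
```

Either result is a pure-ASCII byte string, so in the question's code it could presumably replace html.encode("UTF-8") inside the StringIO.StringIO(...) call.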

Answered on 2025-04-17 by Python大师
