Optimizing PDF conversion in Django/Python

Asked 2025-04-16 02:15

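I have a web app that exports reports as PDF. When the query returns fewer than 100 records, everything works fine. But as soon as there are more than 100 records, the server responds with a 502 Proxy Error. The same report renders fine as HTML; the step that hangs the server is the HTML-to-PDF conversion.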

I'm using xhtml2pdf (also known as pisa 3.0) to generate the PDF. The process looks roughly like this:

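from django.shortcuts import render

def view1(request, **someargs):
    # filter() returns a QuerySet; get() would return a single object
    queryset = someModel.objects.filter(**someargs)
    context = {'queryset': queryset}  # hypothetical key the template iterates over
    # .get() avoids a KeyError when the "pdf" parameter is missing
    if request.GET.get('pdf'):
        return pdfWrapper('template.html', context, 'filename')
    else:
        return render(request, 'template.html', context)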

def pdfWrapper(template_src, context_dict, filename):
    ################################################
    #
    # The commented-out code below is an older version.
    # I updated the code according to the comment I received.
    # The function still works for short HTML documents
    # but produces the 502 for larger ones.
    #
    ################################################

    ##import cStringIO as StringIO
    from xhtml2pdf import pisa  # older releases: import ho.pisa as pisa
    from django.template.loader import get_template
    from django.http import HttpResponse
    ##from cgi import escape

    template = get_template(template_src)
    # get_template() returns a backend template whose render() takes a plain dict
    html = template.render(context_dict)

    response = HttpResponse()
    response['Content-Type'] = 'application/pdf'
    response['Content-Disposition'] = 'attachment; filename="%s.pdf"' % filename

    # HttpResponse is file-like, so pisa writes the PDF bytes into it directly
    pisa.CreatePDF(
        src=html,
        dest=response,
        show_error_as_pdf=True)

    return response

    ##result = StringIO.StringIO()
    ##pdf = pisa.pisaDocument(
    ##            StringIO.StringIO(html.encode("ISO-8859-1")),
    ##            result)
    ##if not pdf.err:
    ##    response = HttpResponse(
    ##                   result.getvalue(), 
    ##                   mimetype='application/pdf')
    ##    response['Content-Disposition']='attachement; filename=%s.pdf'%(filename)
    ##    return response
    ##return HttpResponse('Hubo un error<pre>%s</pre>' % escape(html))

I've thought about creating a buffer so the server can free some memory, but so far I haven't found a workable solution.

Can anyone help? Thanks!

django pdf-conversion memory-optimization xhtml2pdf server-performance 502-error report-export html-to-pdf

1 Answer

I can't tell you exactly what is causing your problem; it may be a buffering issue related to your use of StringIO.

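However, if you assume this code actually streams the generated PDF data, you are mistaken: StringIO.getvalue() returns the contents of the string buffer at the moment it is called, not an output stream (see http://docs.python.org/library/stringio.html#StringIO.StringIO.getvalue).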
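To make the point concrete, here is a small sketch using Python 3's io.BytesIO, which I'm assuming as a stand-in for the Python 2 StringIO module linked above (getvalue() behaves the same way). The value it returns is a finished snapshot; anything written afterwards does not appear in it, so nothing is being streamed.

```python
import io

buf = io.BytesIO()
buf.write(b"first chunk; ")
snapshot = buf.getvalue()  # a finished bytes value, not a live view of the buffer
buf.write(b"second chunk")

print(snapshot)        # b'first chunk; '
print(buf.getvalue())  # b'first chunk; second chunk'
```

In the old version of the code, the complete PDF therefore has to exist in the buffer before getvalue() can hand it to HttpResponse.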

If you want to stream the output, you can use the HttpResponse instance as a file-like object (see http://docs.djangoproject.com/en/1.2/ref/request-response/#usage).

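Second, I don't think StringIO is needed here at all. According to the Pisa documentation I could find (which, by the way, calls the function CreatePDF), the source can be a string or a unicode object.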

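Personally, I would try the following steps:

Render the HTML as a unicode string
Create and configure the HttpResponse object
Call the PDF generator with that string as the input and the response as the output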

Roughly, it might look like this:

html = template.render(context)

response = HttpResponse()
response['Content-Type'] ='application/pdf'
response['Content-Disposition']='attachment; filename=%s.pdf'%(filename)

pisa.CreatePDF(
    src=html,
    dest=response,
    show_error_as_pdf=True)

#response.flush()
return response

I haven't actually tested whether this works, though. (So far I have only done this kind of PDF streaming in Java.)

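Update: I just looked at the HttpResponse implementation. It implements the file interface by collecting the written string chunks in a list. Calling response.flush() is pointless, because it does nothing. Also, you can still set response parameters such as Content-Type after the response has been used as a file object.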

Your original problem may also have been caused by never closing the StringIO object. A StringIO object's underlying buffer is not freed until close() is called.
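One way to make sure close() always runs is to use the buffer as a context manager. A sketch with Python 3's io.BytesIO (assumed here in place of the old StringIO module); pdf_bytes_from_chunks is a hypothetical helper standing in for whatever fills the buffer with PDF output.

```python
import io


def pdf_bytes_from_chunks(chunks):
    """Hypothetical helper: collect output in a buffer, then release the buffer."""
    with io.BytesIO() as buf:
        for chunk in chunks:
            buf.write(chunk)
        data = buf.getvalue()  # copy the bytes out before the buffer is closed
    # leaving the with-block called buf.close(), which discards the buffer
    return data, buf


data, buf = pdf_bytes_from_chunks([b"%PDF-1.4\n", b"%%EOF\n"])
print(data)        # b'%PDF-1.4\n%%EOF\n'
print(buf.closed)  # True
```

After the block, any further call such as buf.getvalue() raises ValueError, because the buffer has been discarded; only the returned bytes remain.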

Answered 2025-04-16 by Python大师
