How can I start a download and render a response without hitting disk?

Question

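I have a scientific-data Excel file validation form in Django that works well. It is used iteratively: users upload files as they accumulate new data for their study. Each time, the DataValidationView checks the files and gives the user an error report listing the problems in their data that need to be fixed.

We recently realized that some (but not all) of the errors can be fixed automatically, so I have been working on a way to generate a copy of the file with a number of fixes applied. As a result, we rebranded the "validation" form page as the "build a submission" page. Each time the user uploads a new set of files, the intention is that they still get the error report, but also automatically receive a download of a file with a number of fixes in it.

I learned just today that there is no way to both render a template and start a download in the same response, which makes sense. However, I had been planning not to save the generated file with fixes to disk.

Is there a way to present the template with the errors and automatically trigger the download without first saving the file to disk?

This is my current form_valid method (without the triggered download, although I had already started on the file creation before I realized that downloading and rendering a template cannot happen together):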

    def form_valid(self, form):
        """
        Upon valid file submission, adds validation messages to the context of
        the validation page.
        """

        # This buffers errors associated with the study data
        self.validate_study()

        # This generates a dict representation of the study data with fixes and
        # removes the errors it fixed
        self.perform_fixes()

        # This sets self.results (i.e. the error report)
        self.format_validation_results_for_template()

        # HERE IS WHERE I REALIZED MY PROBLEM.  I WANTED TO CREATE A STREAM HERE
        # TO START A DOWNLOAD, BUT REALIZED I CANNOT BOTH PRESENT THE ERROR REPORT
        # AND START THE DOWNLOAD FOR THE USER

        return self.render_to_response(
            self.get_context_data(
                results=self.results,
                form=form,
                submission_url=self.submission_url,
            )
        )

Before I ran into this problem, I was writing some pseudocode to stream the file. This is completely untested:

import pandas as pd
from django.http import HttpResponse
from io import BytesIO

def download_fixes(self):
    excel_file = BytesIO()
    xlwriter = pd.ExcelWriter(excel_file, engine='xlsxwriter')

    df_output = {}
    for sheet in self.fixed_study_data.keys():
        df_output[sheet] = pd.DataFrame.from_dict(self.fixed_study_data[sheet])
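        df_output[sheet].to_excel(xlwriter, sheet_name=sheet)

    # close() writes the workbook into the buffer (ExcelWriter.save() was
    # removed in pandas 2.0, so only close() is called here)
    xlwriter.close()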

    # important step, rewind the buffer or when it is read() you'll get nothing
    # but an error message when you try to open your zero length file in Excel
    excel_file.seek(0)

    # set the mime type so that the browser knows what to do with the file
    response = HttpResponse(excel_file.read(), content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')

    # set the file name in the Content-Disposition header
    response['Content-Disposition'] = 'attachment; filename=myfile.xlsx'

    return response
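
The rewind step in the pseudocode above can be checked in isolation. This is a minimal sketch using only the standard library; the bytes written are a stand-in, not a real workbook.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"PK\x03\x04 pretend workbook bytes")

# After writing, the position sits at the end, so read() returns nothing.
at_end = buffer.read()

# Rewinding to the start makes read() return the full content.
buffer.seek(0)
after_rewind = buffer.read()

# getvalue() returns the whole buffer regardless of the current position.
whole = buffer.getvalue()
```

Using getvalue() instead of seek(0) followed by read() would avoid the zero-length file problem entirely.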

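So I'm thinking I need to do one of the following:

Save the file to disk and then figure out a way to make the results page start its download
Somehow embed the data in the results template and send it back via JavaScript to be turned into a file download stream
Somehow keep the file in memory and trigger its download from the results template?

What is the best way to accomplish this?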
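
To make the third option concrete, here is a minimal sketch of an in-memory holding area keyed by a one-time token. The class name `PendingDownloads` and its methods are hypothetical, not part of Django. The idea would be that form_valid stores the workbook bytes, passes the token to the results template, and a second view hands the bytes back in an HttpResponse. It assumes a single server process; with several workers, Django's cache framework would probably be a better place to hold the bytes.

```python
import secrets
import time


class PendingDownloads:
    """Hypothetical in-process store mapping one-time tokens to file bytes."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._items = {}  # token -> (expiry timestamp, file bytes)

    def put(self, data, now=None):
        """Store the bytes and return a random token that identifies them."""
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(16)
        self._items[token] = (now + self.ttl_seconds, data)
        return token

    def pop(self, token, now=None):
        """Return the bytes once, or None if the token is unknown or expired."""
        now = time.monotonic() if now is None else now
        expiry, data = self._items.pop(token, (0.0, None))
        if data is None or now > expiry:
            return None
        return data
```

Each token can be redeemed only once, so a reload of the results page would not start a second download.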

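Updated thoughts:

I recently did a simple trick with a tsv file where I embedded the file content in the results template, with a download button that uses JavaScript to grab the innerHTML of the tags around the data and start a "download".

I figure that if I encode the data, I could probably do something similar with the Excel file content. I could base64-encode it.

I reviewed past study submissions, and the largest one was 115 kB. That size may grow a lot, but for now 115 kB is the ceiling.

I searched online for a way to embed the data in a template and found this: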

import base64
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')
ctx["image"] = image_data
return render(request, 'index.html', ctx)

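I have recently been playing with base64 encoding in JavaScript for some unrelated work, which leads me to believe that embedding is doable. I could even trigger the download automatically. Does anyone have any caveats about doing it this way?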
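
For a sense of scale, here is a minimal sketch of the embedding step applied to an in-memory buffer rather than a file on disk. The helper name `to_data_uri` is hypothetical, and the zero-filled buffer is only a stand-in for a 115 kB workbook.

```python
import base64
from io import BytesIO

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def to_data_uri(buffer, mime_type):
    """Return a data: URI holding the buffer's full content as base64."""
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in for the largest submission seen so far (115 kB).
fake_workbook = BytesIO(b"\x00" * 115_000)
uri = to_data_uri(fake_workbook, XLSX_MIME)

# base64 turns every 3 bytes into 4 characters, so the payload grows by a third.
payload = uri.split(",", 1)[1]
encoded_size = len(payload)
```

With these numbers the embedded payload comes to roughly 153 kB, so the results page would carry about a third more than the file itself.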

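Update

I spent all day trying to implement @Chukwujiobi_Canon's suggestion, but after working through a lot of errors and things I'm unfamiliar with, I'm stuck. A new tab opens (but it is empty), and the file downloads but cannot be opened (the browser console shows a "Frame load interrupted" error).

I implemented the Django code first, and I think it is working correctly. When I submit the form without the JavaScript, the browser downloads the multipart stream, and it looks as expected: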

--3d6b6a416f9b5
Content-Type: application/octet-stream
Content-Range: bytes 0-9560/9561

PK?N˝Ö€]'[Content_Types].xm...

...

--3d6b6a416f9b5
Content-Type: text/html
Content-Range: bytes 0-16493/16494


<!--use Bootstrap CSS and JS 5.0.2-->
...

</html>

--3d6b6a416f9b5--

This is the JavaScript:

validation_form = document.getElementById("submission-validation");

// Take over form submission
validation_form.addEventListener("submit", (event) => {
    event.preventDefault();
    submit_validation_form();
});
async function submit_validation_form() {
    // Put all of the form data into a variable (formdata)
    const formdata = new FormData(validation_form);
    try {
        // Submit the form and get a response (which can only be done inside an async function)
        let response;
        response = await fetch("{% url 'validate' %}", {
            method: "post",
            body: formdata,
        })
        let result;
        result = await response.text();
        const parsed = parseMultipartBody(result, "{{ boundary }}");
        parsed.forEach(part => {
            if (part["headers"]["content-type"] === "text/html") {
                const url = URL.createObjectURL(
                    new Blob(
                        [part["body"]],
                        {type: "text/html"}
                    )
                );
                window.open(url, "_blank");
            }
            else if (part["headers"]["content-type"] === "application/octet-stream") {
                console.log(part)
                const url = URL.createObjectURL(
                    new Blob(
                        [part["body"]],
                        {type: "application/octet-stream"}
                    )
                );
                window.location = url;
            }
        });
    } catch (e) {
        console.error(e);
    }
}
function parseMultipartBody (body, boundary) {
    return body.split(`--${boundary}`).reduce((parts, part) => {
        if (part && part !== '--') {
            const [ head, body ] = part.trim().split(/\r\n\r\n/g)
            parts.push({
                body: body,
                headers: head.split(/\r\n/g).reduce((headers, header) => {
                    const [ key, value ] = header.split(/:\s+/)
                    headers[key.toLowerCase()] = value
                    return headers
                }, {})
            })
        }
        return parts
    }, [])
}

The server console output looks fine, but so far none of the outputs work.
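
One way to check the server side independently of the browser is to parse the stream as bytes. This is a minimal sketch, assuming the parts are separated by CRLF line breaks as in the sample above; the function name `parse_multipart` and the shortened sample are hypothetical. It splits headers from body only at the first blank line, because a binary body can itself contain blank lines.

```python
def parse_multipart(raw, boundary):
    """Split a multipart body (bytes) into (headers, body) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for chunk in raw.split(delimiter):
        # Skip the empty preamble and the closing "--" marker.
        if chunk.strip() in (b"", b"--"):
            continue
        # Drop the line break after the delimiter and the one before the next.
        chunk = chunk.removeprefix(b"\r\n").removesuffix(b"\r\n")
        # Split only at the FIRST blank line; the body may contain more.
        head, _, body = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            name, _, value = line.decode("ascii").partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, body))
    return parts


sample = (
    b"--3d6b6a416f9b5\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Range: bytes 0-9/10\r\n"
    b"\r\n"
    b"PK\x03\x04\r\n\r\n\xff\xfe"
    b"\r\n--3d6b6a416f9b5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html></html>"
    b"\r\n--3d6b6a416f9b5--\r\n"
)
parts = parse_multipart(sample, "3d6b6a416f9b5")

# Decoding binary data as UTF-8 text and re-encoding it does not round-trip.
binary_body = parts[0][1]
round_tripped = binary_body.decode("utf-8", errors="replace").encode("utf-8")
```

The last two lines show that binary content does not survive being decoded as text. Reading the whole response with response.text() does the same kind of decoding in the browser, so it may be related to the downloaded file not opening, though that is not confirmed here.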

