Extracting different content types from an MHT file into multiple MHT files

0 votes

1 answer

3285 views

Asked 2025-04-18 18:56

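I am writing a script that parses an MHT file, extracts some of the parts from the parent file, and writes them into a new MHT file.

I wrote the function below. It opens the MHT file at a given location, searches for a specific Content-ID, and writes the matching part to a new MHT file.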

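import mimetypes
import os
from email import message_from_binary_file

def extract_content(self, file_location, content_id, extension):
    # Build the output name: <parent name>-<content id without suffix><extension>
    first_part = file_location.split(extension)[0]
    new_file = first_part + "-" + content_id.split('.')[0] + extension

    # Remove any output left over from a previous run
    if os.path.exists(new_file):
        os.remove(new_file)

    # The source is opened in binary mode, so it is parsed with
    # message_from_binary_file rather than message_from_file
    with open(file_location, 'rb') as mime_file, open(new_file, 'w') as output:
        # Extract the message from the MHT file
        message = message_from_binary_file(mime_file)
        # Guess the MIME type of the parent file (not used further below)
        t = mimetypes.guess_type(file_location)[0]

        # Walk through every part of the message
        for i, part in enumerate(message.walk()):

            # Check whether this part has the Content-ID we are looking for
            if part['Content-ID'] == '<' + content_id + '>':
                # Write the matching part, headers included
                output.write(part.as_string(unixfrom=False))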

However, when the content type of the part is application/pdf or application/octet-stream, I cannot open the resulting output file in IE.
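One way to narrow this down is to list what each part of the parent file declares before writing anything. The following is only a minimal sketch: `list_parts` is a hypothetical helper name, it takes the raw bytes of the MHT file, and it reports "7bit" when a part has no Content-Transfer-Encoding header, since that is the MIME default.

```python
from email import message_from_bytes


def list_parts(raw):
    """Return (content_id, content_type, transfer_encoding) for each leaf part.

    `raw` is the full content of the MHT file as bytes.
    """
    rows = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # containers only hold other parts
        rows.append((
            part["Content-ID"],
            part.get_content_type(),
            # A missing header means the MIME default, 7bit
            part.get("Content-Transfer-Encoding", "7bit"),
        ))
    return rows
```

Calling it as `list_parts(open(file_location, 'rb').read())` would show whether the PDF and octet-stream parts are stored as base64, which is useful to know when comparing them with the text parts that do open.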

How should I write parts with content types such as application/pdf and application/octet-stream into the MHT file so that I can view the image or PDF in IE?

Thanks

Tags: file format conversion, file parsing, content extraction, IE browser, writing information, application/octet-stream, mht file, application/pdf

1 Answer

Try this:

...
                # m is not defined in this excerpt; presumably it is created in the elided code
                if m['Content-type'].startswith('text/'):
                    m["Content-Transfer-Encoding"] = "quoted-printable"
                else:
                    m["Content-Transfer-Encoding"] = "base64"

                m.set_payload(part.get_payload())
                # Writing to output
                info = part.as_string(unixfrom=False)
                info = info.replace('application/octet-stream', 'text/plain')
                output.write(info)
...

Let me know if it works.
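For comparison, here is a self-contained sketch of a related approach: decode the matching part to raw bytes, re-encode it as base64, and wrap it in a fresh multipart/related message so that the output file has its own top-level MIME headers. The names `wrap_part` and `extract_wrapped` are hypothetical, base64 is used for every content type to keep the example short, and whether IE actually renders the result has not been verified here.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def wrap_part(part):
    """Return a new multipart/related message holding a copy of `part`."""
    wrapper = MIMEMultipart("related")
    maintype, subtype = part.get_content_type().split("/", 1)
    copy = MIMEBase(maintype, subtype)
    # decode=True undoes the original transfer encoding and yields raw bytes
    copy.set_payload(part.get_payload(decode=True))
    # Re-encode as base64 and set Content-Transfer-Encoding to match
    encoders.encode_base64(copy)
    # Carry over the headers that identify the part, when present
    for header in ("Content-ID", "Content-Location"):
        if part[header]:
            copy[header] = part[header]
    wrapper.attach(copy)
    return wrapper


def extract_wrapped(raw, content_id):
    """Find the leaf part with `content_id` in raw MHT bytes and wrap it.

    Returns None when no part matches.
    """
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue
        if part["Content-ID"] == "<" + content_id + ">":
            return wrap_part(part)
    return None
```

The returned message can be written with `output.write(result.as_string(unixfrom=False))`, in the same way as in the question's function.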

Answered 2025-04-18 by Python大师
