从python中的电子邮件中提取本身属于MSG类型的附件

2024-04-18 16:15:16 发布

您现在位置:Python中文网/ 问答频道 /正文

我需要从电子邮件中提取一个msg类型的附件,并将msg附件保存到python中的某个位置

我编写的脚本适用于除outlook项以外的几乎所有类型的文件

def parse_attachment(message_part):
    content_disposition = message_part.get("Content-Disposition", None)
    if content_disposition:
        dispositions = content_disposition.strip().split(";")
        if bool(content_disposition and (dispositions[0].lower() == "attachment" or dispositions[0].lower() == "inline")):

            file_data = message_part.get_payload(decode=True)
            debug(message_part)
            attachment = {}
            attachment['data'] = file_data
            attachment['content_type'] = message_part.get_content_type()
            attachment['size'] = len(file_data)

            for param in dispositions[1:]:
                name,value = param.split("=")
                name = name.lower().strip()
                value = value.strip().strip("\"")

                if name == "filename":
                    attachment['name'] = value
                elif name == "creation-date":
                    attachment['creation-date'] = value
                elif name == "modification-date":
                    attachment['modification-date'] = value
                elif name == "size":
                    attachment['size'] = value
            return attachment

    return None

Tags: namemessagedataattachmentgetdateifvalue
1条回答
网友
1楼 · 发布于 2024-04-18 16:15:16

我们必须单独处理电子邮件附件。但是,如果我们使用walk(),这是一个通用的生成器,可以用来迭代消息对象树的所有部分和子部分,按照深度优先遍历顺序,我们也会解析附件电子邮件

因此,我们必须使用get_payload()来获取电子邮件的每一部分。下面是我们如何解析电子邮件附件-

def get_subject(msgobj) :
    subject = None
    if msgobj['Subject'] is not None:
        decodefrag = decode_header(msgobj['Subject'])
        subj_fragments = []
        for s , enc in decodefrag:
            if enc:
                s = unicode(s , enc).encode('utf8','replace')
            subj_fragments.append(s)
        subject = ''.join(subj_fragments)
        subject = re.sub('\n', '', subject)
    return subject

def get_msg_file_as_attachment(message_part):
    attachment = {}
    attachment['data'] = message_part.get_payload()[0].as_string(unixfrom=True)
    attachment['content_type'] = message_part.get_content_type()
    attachment['name'] = get_subject(message_part.get_payload()[0])
    attachment['name'] +=  '.eml'
    attachment['size'] = len(attachment['data'])

    return attachment

def parse_attachment(message_part):
    content_disposition = message_part.get("Content-Disposition", None)
    content_type = message_part.get_content_type()
    if content_disposition:
        dispositions = content_disposition.strip().split(";")
        if bool(content_disposition and (dispositions[0].lower() == "attachment" or dispositions[0].lower() == "inline")):
            if (content_type.lower().strip() == 'message/rfc822'):
                return get_msg_file_as_attachment(message_part)

            else:
                file_data = message_part.get_payload(decode=True)
                attachment = {}
                attachment['data'] = file_data
                attachment['content_type'] = content_type
                attachment['size'] = len(file_data)
                attachment['name'] = message_part.get_filename()
                return attachment

    return None

相关问题 更多 >