Detect whether an email is a "Delivery Status Notification" and extract its information - Python

13 votes
3 answers
22169 views
Asked 2025-04-16 13:39

I'm using Python's email module to parse emails.

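I need to determine whether an email is a "Delivery Status Notification", find out what the status is, and extract information about the message that failed, such as its subject.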

After parsing it with .parsestr(email), I get an object that looks like this:

{'Content-Transfer-Encoding': 'quoted-printable',
 'Content-Type': 'text/plain; charset=ISO-8859-1',
 'Date': 'Mon, 14 Mar 2011 11:26:24 +0000',
 'Delivered-To': 'sender@gmail.com',
 'From': 'Mail Delivery Subsystem <mailer-daemon@googlemail.com>',
 'MIME-Version': '1.0',
 'Message-ID': '<000e08jf90sd9f00e6f943f@google.com>',
 'Received': 'by 10.142.13.8 with SMTP id 8cs63078wfm;\r\n        Mon, 14 Mar 2011 04:26:24 -0700 (PDT)',
 'Return-Path': '<>',
 'Subject': 'Delivery Status Notification (Failure)',
 'To': 'sender@gmail.com',
 'X-Failed-Recipients': 'recipient@gmail.com'}

First, how can I tell whether this email is a DSN without running a regular expression over the subject?

Second, how can I access the body of the email and the error message returned by the mail server?

Edit: I found out that I need to use .get_payload() to get the content of the message.
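One detail worth knowing here: the headers above declare quoted-printable encoding and an ISO-8859-1 charset, so plain get_payload() returns the text still encoded. A minimal sketch, assuming a non-multipart message like this one: get_payload(decode=True) undoes the transfer encoding and returns bytes, which can then be decoded with the declared charset. The helper name text_body and the body text are made up for illustration; only the two content headers come from the question.

```python
import email


def text_body(msg):
    """Return the body of a non-multipart message as str (hypothetical helper)."""
    # decode=True reverses quoted-printable/base64 and returns bytes
    raw = msg.get_payload(decode=True)
    # get_content_charset() returns the charset parameter in lower case, or None
    charset = msg.get_content_charset() or 'us-ascii'
    return raw.decode(charset, errors='replace')


# Content headers as in the question; the body text is invented.
RAW = (
    "Content-Transfer-Encoding: quoted-printable\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "Caf=E9 r=E9sum=E9 with a soft line =\n"
    "break.\n"
)

msg = email.message_from_string(RAW)
print(msg.get_payload())  # still quoted-printable encoded
print(text_body(msg))     # decoded text
```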

The email documentation says:

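The Parser class has no differences in its public interface. It does have some additional smarts to recognize message/delivery-status type messages, which it represents as a Message instance containing separate Message subparts for each header block in the delivery status notification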
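The behaviour described in that passage can be checked with a small sketch. It parses a minimal, made-up DSN and returns the header blocks of its message/delivery-status part; the helper name header_blocks and all addresses in the sample are hypothetical.

```python
import email

# A minimal, made-up DSN used only to illustrate the parser's behaviour.
RAW = """\
Content-Type: multipart/report; report-type=delivery-status; boundary="XYZ"

--XYZ
Content-Type: text/plain

Delivery failed.

--XYZ
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; nobody@example.org
Action: failed
Status: 5.1.1

--XYZ--
"""


def header_blocks(raw):
    """Return the header blocks of the message/delivery-status part as dicts."""
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() == 'message/delivery-status':
            # The parser stores each header block as a separate Message object.
            return [dict(block.items()) for block in part.get_payload()]
    return []


for block in header_blocks(RAW):
    print(block)
```

The first block holds the per-message fields (Reporting-MTA), and each following block describes one recipient.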


Update:

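Basically, I need to be able to reliably detect whether an email is a DSN, and then extract the original message so that I can parse it with email.Parser() and get the relevant information.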

3 Answers

2

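The X-Failed-Recipients header seems to be the quickest way to identify a Gmail delivery failure notification. After that, you still need to parse the text content.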
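A minimal sketch of that header check. Keep in mind that X-Failed-Recipients is a non-standard header (it is not part of RFC 3464), so a missing header does not prove the message is not a bounce. The helper name failed_recipients is hypothetical, and the sample reuses headers from the question with the body omitted.

```python
import email
from email.utils import getaddresses


def failed_recipients(raw):
    """Return addresses listed in X-Failed-Recipients, or [] if the header is absent."""
    msg = email.message_from_string(raw)
    # get_all() returns the failobj ([]) when the header is missing,
    # and a list of values when it appears one or more times.
    values = msg.get_all('X-Failed-Recipients', [])
    # getaddresses() also splits comma-separated lists of addresses.
    return [addr for _, addr in getaddresses(values)]


RAW = (
    "From: Mail Delivery Subsystem <mailer-daemon@googlemail.com>\n"
    "Subject: Delivery Status Notification (Failure)\n"
    "X-Failed-Recipients: recipient@gmail.com\n"
    "\n"
    "(body omitted)\n"
)

print(failed_recipients(RAW))  # ['recipient@gmail.com']
```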

4

I don't use Python much, but I guess Gmail's DSN support has improved, because all my tests worked:

As you can see in the example below, it is a multipart message with "Content-Type: multipart/report; report-type=delivery-status".

The way I reliably identify a DSN is:

  • The first line is "Return-path: <>"
  • The Content-Type is "multipart/report" with "report-type=delivery-status"
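The two checks in the list above can be sketched like this, assuming the message has already been parsed with the email module. looks_like_dsn is a hypothetical name, and note that Return-Path is only present if the receiving MTA added it when it delivered the message.

```python
import email


def looks_like_dsn(msg):
    """True if msg has a null Return-Path and is a delivery-status report."""
    # Bounces are sent with an empty envelope sender, recorded on delivery as "<>".
    null_sender = msg.get('Return-Path', '').strip() == '<>'
    # Content-Type: multipart/report; report-type=delivery-status
    report_type = msg.get_param('report-type') or ''
    is_report = (msg.get_content_type() == 'multipart/report'
                 and report_type.lower() == 'delivery-status')
    return null_sender and is_report


RAW = (
    "Return-path: <>\n"
    'Content-Type: multipart/report; report-type=delivery-status; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain\n"
    "\n"
    "Delivery failed.\n"
    "--B--\n"
)
print(looks_like_dsn(email.message_from_string(RAW)))  # True
```

Header lookups in the email module are case-insensitive, so "Return-path" and "Return-Path" both match.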

From there, I know that:

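  • The report content is in the part whose Content-Type is "message/delivery-status".
  • The Status and Action fields are always present in the report content.
  • Note that the Status field may be less precise than other status codes that can appear in the Diagnostic-Code field (which is optional). The example below is a good case, though: the status is the same in every field.
  • The original message is in the part whose Content-Type is "message/rfc822". Sometimes the mail transfer agent (MTA) returns only the headers of the original message without its body; in that case the Content-Type is "text/rfc822-headers".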
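Reading the report fields with Python's email module could look roughly like the sketch below. It relies on the parser behaviour quoted in the question (each header block of message/delivery-status becomes its own Message); recipient_reports and the sample DSN are made up for illustration.

```python
import email


def recipient_reports(msg):
    """Collect Action, Status and Diagnostic-Code for each recipient of a DSN."""
    reports = []
    parts = msg.get_payload() if msg.is_multipart() else []
    for part in parts:
        if part.get_content_type() != 'message/delivery-status':
            continue
        # The parser turns every header block of this part into a Message.
        for block in part.get_payload():
            # Skip the per-message block (Reporting-MTA, Arrival-Date, ...).
            if 'Final-Recipient' not in block:
                continue
            reports.append({
                'recipient': block['Final-Recipient'],
                'action': block['Action'],               # required by RFC 3464
                'status': block['Status'],               # required by RFC 3464
                'diagnostic': block['Diagnostic-Code'],  # optional, may be None
            })
    return reports


# Made-up sample modelled on the DSN shown below.
RAW = """\
Return-Path: <>
Content-Type: multipart/report; report-type=delivery-status; boundary="B"

--B
Content-Type: text/plain

Your message could not be delivered.

--B
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; nobody@example.org
Action: failed
Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 The email account that you tried to reach
    does not exist.

--B--
"""

for report in recipient_reports(email.message_from_string(RAW)):
    print(report['recipient'], report['action'], report['status'])
```

Continuation lines of Diagnostic-Code must be indented, as in the sample, for the parser to treat them as part of the same header.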

Here is an example of the DSN I received after sending a message to test-dsn-failure@gmail.com:

Return-path: <>
Received: from xxx ([xxx])
    by xxx with ESMTP; Fri, 04 May 2012 16:18:13 +0200
From: <Mailer-Daemon@xxx> (Mail Delivery System)
To: xxx
Subject: Undelivered Mail Returned to Sender
Date: Fri, 04 May 2012 15:25:09 +0200
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
 boundary="HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ=="

This is a MIME-encapsulated message.

--HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
Content-Description: Notification
Content-Type: text/plain

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to <postmaster@xxx>

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

<test-dsn-failure@gmail.com>: 550-5.1.1 The email account that you tried to reach does not exist. Please try
550-5.1.1 double-checking the recipient's email address for typos or
550-5.1.1 unnecessary spaces. Learn more at
550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 t12si10077186weq.36


--HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
Content-Description: Delivery report
Content-Type: message/delivery-status

Reporting-MTA: dns; xxx
Arrival-Date: Fri, 04 May 2012 15:25:09 +0200

Final-Recipient: rfc822; test-dsn-failure@gmail.com
Status: 5.1.1
Action: failed
Last-Attempt-Date: Fri, 04 May 2012 15:25:09 +0200
Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does not exist. Please try
550-5.1.1 double-checking the recipient's email address for typos or
550-5.1.1 unnecessary spaces. Learn more at
550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 t12si10077186weq.36

--HTB3nt3RR7vw/QMPR4kDPbKg+XWjXIKdC/rfHQ==
Content-Description: Undelivered Message
Content-Type: message/rfc822

[original message...]

22

The documentation you quoted says that a DSN is multipart:

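import email

# emailstr holds the raw text of the received message
msg = email.message_from_string(emailstr)

if (msg.is_multipart() and len(msg.get_payload()) > 1 and
        msg.get_payload(1).get_content_type() == 'message/delivery-status'):
    # email is a DSN
    print(msg.get_payload(0).get_payload())  # human-readable section

    # each header block of the delivery-status part is parsed into its own Message;
    # the first block holds per-message fields and has no Action header
    for dsn in msg.get_payload(1).get_payload():
        if dsn['action'] is not None:
            print('action: %s' % dsn['action'])  # e.g., "failed", "delivered"

    if len(msg.get_payload()) > 2:
        print(msg.get_payload(2))  # part containing the original message (or its headers)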
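Printing msg.get_payload(2) shows the whole third part, including its own Content-Type header. To get at the returned message itself (for example to read its Subject, as the question asks), a sketch along these lines should work. returned_original and the sample message are hypothetical; the function also covers the headers-only text/rfc822-headers case described in the previous answer.

```python
import email


def returned_original(dsn):
    """Return the message returned inside a DSN (or only its headers), or None."""
    parts = dsn.get_payload() if dsn.is_multipart() else []
    # RFC 3464: if anything is returned, it is the third component of the report.
    if len(parts) < 3:
        return None
    part = parts[2]
    if part.get_content_type() == 'message/rfc822':
        # The parser has already turned the attachment into a nested Message.
        return part.get_payload(0)
    if part.get_content_type() == 'text/rfc822-headers':
        # Only the headers were returned: parse them into a header-only Message.
        return email.message_from_bytes(part.get_payload(decode=True))
    return None


# Made-up DSN whose third part is the original message.
RAW = """\
Content-Type: multipart/report; report-type=delivery-status; boundary="B"

--B
Content-Type: text/plain

Delivery failed.

--B
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; nobody@example.org
Action: failed
Status: 5.1.1

--B
Content-Type: message/rfc822

From: sender@example.com
To: nobody@example.org
Subject: Quarterly report

Hello!

--B--
"""

original = returned_original(email.message_from_string(RAW))
print(original['Subject'])  # Quarterly report
```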

Format of a delivery status notification (from RFC 3464):

A DSN is a MIME message with a top-level content-type of
multipart/report (defined in [REPORT]).  When a multipart/report
content is used to transmit a DSN:

(a) The report-type parameter of the multipart/report content is
    "delivery-status".

(b) The first component of the multipart/report contains a human-
    readable explanation of the DSN, as described in [REPORT].

(c) The second component of the multipart/report is of content-type
    message/delivery-status, described in section 2.1 of this
    document.

(d) If the original message or a portion of the message is to be
    returned to the sender, it appears as the third component of the
    multipart/report.
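Rules (a) to (d) translate fairly directly into a structural check. The sketch below (split_dsn is a hypothetical helper) only verifies the layout and returns the components by position; it does not validate the contents of the delivery-status fields, and the sample DSN is made up.

```python
import email


def split_dsn(msg):
    """Return (explanation, delivery_status, returned_or_None), or None if msg is not a DSN."""
    # (a) multipart/report with report-type=delivery-status
    if msg.get_content_type() != 'multipart/report' or not msg.is_multipart():
        return None
    if (msg.get_param('report-type') or '').lower() != 'delivery-status':
        return None
    parts = msg.get_payload()
    # (b) first component: human-readable explanation (not checked further here)
    # (c) second component: message/delivery-status
    if len(parts) < 2 or parts[1].get_content_type() != 'message/delivery-status':
        return None
    # (d) optional third component: the returned message or part of it
    returned = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], returned


RAW = """\
Content-Type: multipart/report; report-type=delivery-status; boundary="B"

--B
Content-Type: text/plain

Delivery failed.

--B
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; nobody@example.org
Action: failed
Status: 5.1.1

--B--
"""

explanation, status, returned = split_dsn(email.message_from_string(RAW))
print(explanation.get_content_type(), status.get_content_type(), returned)
# text/plain message/delivery-status None
```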
