Parsing email with Python

13 votes
3 answers
21891 views
Asked 2025-04-16 00:01

I'm writing a Python script to process emails handed over by Procmail. As suggested in this question, I'm using the following Procmail config:

:0:
|$HOME/process_mail.py

My process_mail.py script receives the email on stdin, in this form:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

I'm trying to parse the message like this:

>>> import email
>>> msg = email.message_from_string(full_message)

I want to get message fields such as 'From', 'To', and 'Subject'. However, the message object does not contain any of these fields.

What am I doing wrong?

3 Answers

2

Answering my own question.

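I found the problem in the code that generates the message. It was inserting line breaks between certain lines, which prevented the parser from working correctly.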
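For anyone hitting the same symptom, here is a minimal sketch of the failure, assuming Python 3 and a made-up two-header message (the host names and addresses are placeholders, not taken from the real mail). When a continuation line starts in column 0, the parser treats it as the start of the body, so every header after it is lost.

```python
import email

# Hypothetical two-header message; the second line continues "Received"
# but starts in column 0, as in the input that failed to parse.
broken = (
    "Received: from mail.example.org (192.0.2.1)\n"
    "by mx.example.com with SMTP; 15 Jun 2010 21:43:22 -0400\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)
# Same message with the continuation line indented.
fixed = broken.replace("\nby ", "\n    by ")

broken_msg = email.message_from_string(broken)
fixed_msg = email.message_from_string(fixed)

print(broken_msg.keys())      # header parsing stopped after "Received"
print(broken_msg["Subject"])  # None: "Subject" ended up in the body
print(fixed_msg["Subject"])   # TEST 12
```

Checking `msg.keys()` is a quick way to see where header parsing stopped.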

5

It looks like the continuation lines start on a new line but without any leading whitespace. That is illegal per RFC 2822 §2.2.3:

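Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field: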

    Subject: This is a test

can be represented as:

    Subject: This
     is a test
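The folding rule quoted above can be checked with a short sketch, assuming Python 3. The one-header message below is made up for illustration. With the default `compat32` policy the parser accepts the folded header but keeps the line break in the value; passing `policy=email.policy.default` returns the unfolded text.

```python
import email
from email import policy

# The RFC's folded example, followed by a blank line and a dummy body.
folded = "Subject: This\n is a test\n\nbody\n"

legacy = email.message_from_string(folded)                         # compat32 policy
modern = email.message_from_string(folded, policy=policy.default)  # EmailPolicy

print(repr(legacy["Subject"]))  # 'This\n is a test' (fold preserved)
print(repr(modern["Subject"]))  # 'This is a test' (unfolded)
```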

Your message should look like this:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
    by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
    for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
    Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

10

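You need to make sure those lines aren't broken up unexpectedly (as they are above, although it's hard to tell whether that is just a copy-paste artifact). Keep each header intact, for example: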

Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

Then

import email
# msgtxt holds the message text shown above
msg = email.message_from_string(msgtxt)
print(msg['Subject'])

prints TEST 12, as expected.
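Since the script gets the mail on stdin, one possible shape for process_mail.py is sketched below, assuming Python 3. The helper name `extract_fields` is my own, and the sample is a shortened copy of the message from the question, fed through `io.BytesIO` so the snippet runs on its own.

```python
import email
import io


def extract_fields(stream):
    """Parse one message from a binary stream and return a few headers."""
    msg = email.message_from_binary_file(stream)
    return {name: msg[name] for name in ("From", "To", "Subject")}


# Shortened copy of the message from the question, with the folded
# "Received" continuation line indented as the RFC requires.
sample = (
    b"From hostname Tue Jun 15 21:43:30 2010\n"
    b"Received: by fxm19 with SMTP id 19so170709fxm.3\n"
    b"    for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    b"Subject: TEST 12\n"
    b"From: Full Name <username@sender.com>\n"
    b"To: username@domain.com\n"
    b"\n"
    b"ONE\nTWO\nTHREE\n"
)

fields = extract_fields(io.BytesIO(sample))
print(fields)
```

In the real script you would pass `sys.stdin.buffer` instead of the `BytesIO` object. Reading bytes rather than text avoids guessing the encoding before the Content-Type header has been parsed.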
