Handling encoded messages with Python and imaplib

3 votes
4 answers
4173 views
Asked 2025-04-18 15:08

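For the past few days I have been writing a script that parses automatically generated help-desk tickets and stores their contents in a database. During testing I ran into some emails that appear to be encoded, which breaks the script. Here is an example message in RFC822 format: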

"[(b'9255 (RFC822 {12558}', b'Delivered-To: XXXXXXXXX\r\nReceived: by 10.220.77.132 with SMTP id g4csp176213vck;\r\n Mon, 28 Jul 2014 09:37:05 -0700 (PDT)\r\nX-Received: by 10.67.30.130 with SMTP id ke2mr39896936pad.44.1406565425185;\r\n Mon, 28 Jul 2014 09:37:05 -0700 (PDT)\r\nReturn-Path: \r\nReceived: from XXXXXXXXX (XXXXXXXXX [74.125.149.112])\r\n by XXXXXXXXX with SMTP id yh3si18379315pab.170.2014.07.28.09.37.04\r\n for ;\r\n Mon, 28 Jul 2014 09:37:04 -0700 (PDT)\r\nReceived-SPF: none (XXXXXXXXX: XXXXXXXXX does not designate permitted sender hosts) client-ip=74.125.149.141;\r\nAuthentication-Results: XXXXXXXXX;\r\n spf=neutral (XXXXXXXXX: XXXXXXXXX does not designate permitted sender hosts) v\r\nReceived: from XXXXXXXXX ([74.125.149.141]) by XXXXXXXXX ([74.125.148.10]) with SMTP;\r\n\tMon, 28 Jul 2014 16:37:04 GMT\r\nReceived: from XXXXXXXXX ([209.85.213.178]) (using TLSv1) by XXXXXXXXX ([74.125.148.12]) with SXXXXXXXXX; Mon, 28 Jul 2014 09:37:04 PDT\r\nReceived: by XXXXXXXXX with SMTP id uq10sf3897971igb.11\r\n for ; Mon, 28 Jul 2014 09:37:03 -0700 (PDT)\r\nX-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r\n d=1e100.net; s=20130820;\r\n h=x-gm-message-state:mime-version:from:to:date:subject:message-id\r\n :x-original-sender:x-original-authentication-results:precedence\r\n :mailing-list:list-id:list-post:list-help:list-archive\r\n :list-unsubscribe:content-type:content-transfer-encoding;\r\n bh=H+FlcmWQAFURCHnDFK/bNHUOvofUAPB8bcDYlBceyxE=;\r\n b=LoR8D1MK8eoDG9DLkP9gkfR82+EGUIEeOTLpqymqxyx9HJl0C9BW6iwPD7OgrJFbV4\r\n xWYumML6RCinpcZc4d6VCDSw+akXLdhiol+lbWJBZWvgN4BQPgHJwCF6EaHYf3h8j4tq\r\n /KAZIkXowz4/WKW8STri4BVjlA2a4LPwV/wazP+I9Kvr1yz433ymd+iCY1V0NexTI+cb\r\n 9m3IyL8sqB0+Efyu+XQrR2y7ZdXDPwdzGS/WNHJBtKga5xPDtPga+21pozVMCbuCc/cj\r\n Cx9me6cVo19PrNKIOtSimDZ1u6ELdpVr4iprYQSaT8arYYiCPHJE34OFPlqspTxjm1eI\r\n ngyg==\r\nX-Gm-Message-State: 
ALoCoQkb908wRLWedDE+CtRzjD6VwC6Nja6duttyoVAdf+TFFn+uCxFB0Kwd5jk411YWdMD2G6HuFeRj2y3q7EzTe/vTvPLfymDIkHwZQa1r1zQ8I1B254t6v01ourR8InF/41aPGnnD\r\nX-Received: by 10.42.48.74 with SMTP id r10mr26049776icf.18.1406565423564;\r\n Mon, 28 Jul 2014 09:37:03 -0700 (PDT)\r\nX-Received: by 10.42.48.74 with SMTP id r10mr26049775icf.18.1406565423537;\r\n Mon, 28 Jul 2014 09:37:03 -0700 (PDT)\r\nX-BeenThere: XXXXXXXXX\r\nReceived: by 10.50.153.15 with SMTP id vc15ls1961411igb.42.gmail; Mon, 28 Jul\r\n 2014 09:37:03 -0700 (PDT)\r\nX-Received: by 10.66.254.37 with SMTP id af5mr39703901pad.113.1406565423331;\r\n Mon, 28 Jul 2014 09:37:03 -0700 (PDT)\r\nReceived: from XXXXXXXXX (XXXXXXXXX [74.125.149.158])\r\n by XXXXXXXXX with SMTP id da9si9190520pdb.425.2014.07.28.09.37.02\r\n for ;\r\n Mon, 28 Jul 2014 09:37:03 -0700 (PDT)\r\nReceived-SPF: none (XXXXXXXXX: XXXXXXXXX does not designate permitted sender hosts) client-ip=207.211.31.47;\r\nReceived: from XXXXXXXXX ([207.211.31.47]) by XXXXXXXXX ([74.125.148.10]) with SMTP;\r\n\tMon, 28 Jul 2014 16:37:02 GMT\r\nReceived: from XXXXXXXXX (XXXXXXXXX\r\n [129.135.112.43]) (Using TLS) by XXXXXXXXX; Mon, 28 Jul\r\n 2014 12:37:01 -0400\r\nReceived: from XXXXXXXXX (129.135.128.210) by XXXXXXXXX\r\n (129.135.112.45) with Microsoft SMTP Server id 14.3.181.6; Mon, 28 Jul 2014\r\n 11:36:58 -0500\r\nReceived: from ITSDC50 ([127.0.0.1]) by XXXXXXXXX with Microsoft\r\n SMTPSVC(6.0.3790.4675);\t Mon, 28 Jul 2014 11:36:58 -0500\r\nMIME-Version: 1.0\r\nFrom: \r\nTo: \r\nDate: Mon, 28 Jul 2014 11:36:58 -0500\r\nSubject: Dispatching IT/Cares Case: SC-118656-7031\r\nMessage-ID: \r\nX-OriginalArrivalTime: 28 Jul 2014 16:36:58.0498 (UTC) FILETIME=[26792E20:01CFAA82]\r\nX-MC-Unique: 114072812370105901\r\nX-pstn-levels: (S:85.19264/99.90000 CV:99.9000 FC:95.5390 LC:95.5390 R:95.9108 P:95.9108 M:97.0282 C:98.6951 )\r\nX-pstn-dkim: 0 skipped:not-enabled\r\nX-pstn-settings: 1 (0.1500:0.1500) cv gt6 gt5 gt4 gt3 gt2 gt1\r\nX-pstn-addresses: from 
[1094/49]\r\nX-pstn-nxpr: disp=neutral, envrcpt=XXXXXXXXX\r\nX-pstn-nxp: bodyHash=9500f76054cf97c2a0eec20f8940768958faf6c3, headerHash=eb9362a172738328a8b8a8ae406c42a63f5545f9, keyName=4, rcptHash=e0dd4695780dcb1818e78b482447ac976870bcbe, sourceip=207.211.31.47, version=1\r\nX-Original-Sender: XXXXXXXXX\r\nX-Original-Authentication-Results: XXXXXXXXX; spf=neutral\r\n (XXXXXXXXX: XXXXXXXXX does not designate permitted sender\r\n hosts) smtp.mail=XXXXXXXXX\r\nPrecedence: list\r\nMailing-list: list XXXXXXXXX contact XXXXXXXXX\r\nList-ID: \r\nX-Google-Group-Id: 511158325204\r\nList-Post: , \r\nList-Help: , \r\nList-Archive: \r\nList-Unsubscribe: ,\r\n \r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: base64\r\nX-pstn-neptune: 0/0/0.00/0\r\nX-pstn-levels: (S:65.87536/99.90000 CV:99.9000 FC:95.5390 LC:95.5390 R:95.9108 P:95.9108 M:97.0282 C:98.6951 )\r\nX-pstn-dkim: 0 skipped:not-enabled\r\nX-pstn-settings: 5 (2.0000:0.0200) s cv fc lc gt6 gt5 gt4 GT3 gt2 gt1 ft lt r p m c \r\nX-pstn-addresses: from [db-null] \r\nX-pstn-nxpr: disp=neutral, envrcpt=XXXXXXXXX\r\nX-pstn-nxp: bodyHash=45f4f2e59005199791055b3d1f937e1d3fb7d7ca, headerHash=ca981838d5783da04d9d38e3fffc3f5907100fcf, keyName=4, rcptHash=4f3dee680a09495dc5b095849a4225f49c4a45f4, sourceip=74.125.149.141, 
version=1\r\n\r\nQ2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg\r\nICAgIE5vcm1hbA0KQWNjb3VudCBOYW1lOiAgICAgICAgSENSIE1hbm9yY2FyZQ0KU2l0ZSBO\r\nYW1lOiAgICAgICAgICAgMzAxDQpDbGllbnQgTmFtZTogICAgICAgICBBbWFuZGEgUGVucm9k\r\nDQpDbGllbnQgUGhvbmU6ICAgICAgICANCkNsaWVudCBNYWlsUGF0aDogICAgIGFtYW5kYS5w\r\nZW5yb2RAaGNyLW1hbm9yY2FyZS5jb20NCkNhc2UgUHJvZHVjdDogICAgICAgIEhDUi1GaWVs\r\nZCBEZXBsb3ltZW50DQpDYXNlIEtleXdvcmQ6ICAgICAgICBGRC1BU0QNCg0KDQoNClBsZWFz\r\nZSBDbGljayBCZWxvdyB0byBVcGRhdGUgQ2FzZTogDQoNCg0KUHJvYmxlbSBEZXNjcmlwdGlv\r\nbg0KKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioq\r\nKioqKioqKioNCjw8LSBUaGlzIENhc2UgaXMgYSBTdWItQ2FzZSBvZjogRU0tMTE4NjU2LTcw\r\nMTcgIC0+Pg0KDQpQbGVhc2UgZGlzcGF0Y2ggd2lyaW5nIHRlY2ggdG8gaW5zdGFsbCB0d28g\r\nbmV3IG5ldHdvcmsgZHJvcHMuIE9uZSBpbiB0aGUgTnVyc2UgTWFuYWdlIE9mZmljZSBhbmQg\r\nb25lIGluIHRoZSBDYXNlIE1hbmFnZW1lbnQgT2ZmaWNlDQoNCkxvY2F0aW9uIG9mIGRyb3Ag\r\naXM6ICAgICAgIE51cnNlIE1hbmFnZXIgT2ZmaWNlICYgQ2FzZSBNYW5hZ2VtZW50IE9mZmlj\r\nZQ0KUGhvbmUgRXh0IChJZiBQaG9uZSBEcm9wKTogbi9hDQoNCk9ubHkgQ2F0NWUgUGxlbnVt\r\nIFJhdGVkIChDTVApIGNhYmxlIGNhbiBiZSUgdXNlZCBmb3IgbmV3IGRyb3BzLiBBZGdpbmcg\r\nUmFjZXdheS9XaXJlbW9sZCBpcyBub3QgYW4gb3B0aW9uIHdpdGhvdXQgcHJpb3IuIElmIFJh\r\nY2V3YXkvV2lyZW1vbGQgaXMgcXVpdGUgY29udGFjdGVkLCBwbGVhc2Ugbm90aWZ5IHlvdXIu\r\n

Is the body of this email encoded? If so, how do I decode it?

4 Answers

1

A base64-encoded email fetched with imaplib can be decoded with the base64 module, as follows.

# Import the base64 and re modules
import base64
import re

# Capture the encoded body: everything after the last header line, which in
# this particular message ends with "version=1\r\n\r\n".
# `email` is assumed to be a str; use bytes patterns (b'...') if it is bytes.
match = re.search('version=1\r\n\r\n(.*)', email, re.DOTALL)

# Decode the captured base64 text and print the resulting bytes
print(base64.b64decode(match.group(1)))

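To extract the encoded part that sits between two known strings, use a search expression like the one below. re.DOTALL is needed because the encoded body spans several lines.

match = re.search('string1(.*)string2', email, re.DOTALL)

Here the 'email' variable holds the entire message returned by imaplib.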
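The "version=1" marker is specific to this one message. A minimal sketch of a more general split, assuming the raw message is a bytes object (as imaplib returns under Python 3) and that it is a single-part message with a base64 body: the headers end at the first blank line. The helper name split_and_decode and the shortened sample message are made up for illustration.

```python
import base64


def split_and_decode(raw_message: bytes) -> bytes:
    """Split a raw message at the first blank line and base64-decode the body.

    Assumes a single-part message whose Content-Transfer-Encoding is base64.
    """
    _headers, _sep, body = raw_message.partition(b"\r\n\r\n")
    # With the default validate=False, b64decode discards characters outside
    # the base64 alphabet, so the CRLF line breaks in the body are ignored.
    return base64.b64decode(body)


# Shortened stand-in for the message in the question (first body line only)
sample = (
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Q2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg\r\n"
)

decoded = split_and_decode(sample)
print(decoded.decode("utf-8"))
```

This still breaks on multipart messages or other transfer encodings, which is where a real parser such as the email package becomes the safer choice.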

2

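Try Imbox; with it you don't have to deal with the decoding yourself.

imaplib is a very low-level library, and the results it returns are hard to work with.

Installation

pip install imbox

Usage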

from imbox import Imbox

with Imbox('imap.gmail.com',
        username='username',
        password='password',
        ssl=True,
        ssl_context=None,
        starttls=False) as imbox:

    all_inbox_messages = imbox.messages()
    for uid, message in all_inbox_messages:
        message.sent_from
        message.sent_to
        message.body

5

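You can use the email package for this. The fetch result is a list whose first item is a tuple, and the second element of that tuple is the entire message. Suppose you put that bytes object in a variable called msg_bytes. You can then parse the message with: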

import email.parser
msg = email.parser.BytesParser().parsebytes(msg_bytes)
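As a sketch of where msg_bytes comes from: for an RFC822 fetch, imaplib hands back a list shaped like the dump in the question, so the message is data[0][1]. The data list below is a hand-built stand-in with that shape (no server is contacted), and its size literal and shortened headers are placeholders.

```python
import email.parser

# Hand-built stand-in for the data list returned by an RFC822 fetch.
# The "{123}" size literal is arbitrary; a real server reports the true size.
data = [
    (
        b"1 (RFC822 {123}",
        b"Subject: Dispatching IT/Cares Case: SC-118656-7031\r\n"
        b"Content-Type: text/plain; charset=UTF-8\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n"
        b"Q2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg\r\n",
    ),
    b")",
]

# First list item is a tuple; its second element is the raw message bytes
msg_bytes = data[0][1]
msg = email.parser.BytesParser().parsebytes(msg_bytes)

# Headers are available by name once the message is parsed
print(msg["Subject"])
print(msg["Content-Transfer-Encoding"])
```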

Then you can access the different parts of the message (see the email.message.Message documentation for details):

# get a bytes object containing the base64-decoded message
textbytes = msg.get_payload(decode=True)

# get the content charset
content_charset = msg.get_content_charset()

# decode the text to obtain a string object
text = textbytes.decode(content_charset)

This should handle most, if not all, valid messages.
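Two cases need extra care: on a multipart message, get_payload(decode=True) returns None for the top-level container, and get_content_charset() returns None when no charset is declared. A hedged sketch that covers both; the helper name extract_text, the UTF-8 fallback, and the sample message are assumptions for illustration, not part of the answer above.

```python
import email.parser
from email.message import Message
from typing import Optional


def extract_text(msg: Message, fallback_charset: str = "utf-8") -> Optional[str]:
    """Return the first text/plain body as a str, or None if there is none."""
    # walk() yields the message itself and then every nested part
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64 here)
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or fallback_charset
        return payload.decode(charset, errors="replace")
    return None


# Made-up multipart message with a base64 text part and an HTML part
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Q2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg\r\n"
    b"--sep\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<p>Case Number: SC-118656-7031</p>\r\n"
    b"--sep--\r\n"
)

msg = email.parser.BytesParser().parsebytes(raw)
text = extract_text(msg)
print(text)
```

Returning only the first text/plain part is a simplification; a ticket parser may want to fall back to the HTML part when no plain-text part exists.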

1

The body is encoded with base64, which is not the same as encryption. Paste the first chunk of characters into an online decoder

Q2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg

and it decodes to

Case Number:         SC-118656-7031
Severity Level:  

Python has libraries for decoding base64, though I would be disappointed if imaplib had nothing built in to simplify this.
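For reference, the same decoding can be done with Python's standard base64 module. A minimal sketch using the fragment quoted above; the UTF-8 charset is taken from the Content-Type header of the message in the question.

```python
import base64

# First line of the encoded body from the question
encoded = "Q2FzZSBOdW1iZXI6ICAgICAgICAgU0MtMTE4NjU2LTcwMzENClNldmVyaXR5IExldmVsOiAg"

# b64decode returns bytes; decode them with the charset the message declares
decoded_bytes = base64.b64decode(encoded)
text = decoded_bytes.decode("utf-8")

# repr() makes the CRLF line ending and trailing spaces visible
print(repr(text))
```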
