Error when extracting Outlook email data with Python

Posted 2024-04-20 03:23:40


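I have a Python script that uses os.walk and win32com.client to extract information from Outlook email files (.msg) in a folder and its subfolders on my C:/ drive. It seems to work, but when I try to do anything with the returned dataframe (for example, emailData.head()), Python crashes. I also can't write the dataframe to a .csv file because of a permission error.

I wonder whether my code isn't closing Outlook or each email properly, and whether that is what's causing the problem? Any help would be greatly appreciated.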

import os
import win32com.client
import pandas as pd

# initialize Outlook client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

# set input directory (where the emails are) and output directory (where you
# would like the email data saved)
inputDir = 'C:/Users/.../myFolderPath'
outputDir = 'C:/Users/.../myOutputPath'


def emailDataCollection(inputDir,outputDir):
    """ This function loops through an input directory to find
    all '.msg' email files in all folders and subfolders in the
    directory, extracting information from the email into lists,
    then converting the lists to a Pandas dataframe before exporting
    to a '.csv' file in the output directory
    """
    # Initialize lists
    msg_Path = []
    msg_SenderName = []
    msg_SenderEmailAddress = []
    msg_SentOn = []
    msg_To = []
    msg_CC = []
    msg_BCC = []
    msg_Subject = []
    msg_Body = []
    msg_AttachmentCount = []

    # Loop through the directory
    for root, dirnames, filenames in os.walk(inputDir):
        for filename in filenames:
            if filename.endswith('.msg'): # check to see if the file is an email
                filepath = os.path.join(root,filename) # save the full filepath
                # Extract email data into lists
                msg = outlook.OpenSharedItem(filepath)
                msg_Path.append(filepath)
                msg_SenderName.append(msg.SenderName)
                msg_SenderEmailAddress.append(msg.SenderEmailAddress)
                msg_SentOn.append(msg.SentOn)
                msg_To.append(msg.To)
                msg_CC.append(msg.CC)
                msg_BCC.append(msg.BCC)
                msg_Subject.append(msg.Subject)
                msg_Body.append(msg.Body)
                msg_AttachmentCount.append(msg.Attachments.Count)
                del msg

    # Convert lists to Pandas dataframe
    emailData = pd.DataFrame({'Path': msg_Path,
                              'SenderName': msg_SenderName,
                              'SenderEmailAddress': msg_SenderEmailAddress,
                              'SentOn': msg_SentOn,
                              'To': msg_To,
                              'CC': msg_CC,
                              'BCC': msg_BCC,
                              'Subject': msg_Subject,
                              'Body': msg_Body,
                              'AttachmentCount': msg_AttachmentCount},
                             columns=['Path', 'SenderName', 'SenderEmailAddress',
                                      'SentOn', 'To', 'CC', 'BCC', 'Subject',
                                      'Body', 'AttachmentCount'])

    return emailData


# Call the function
emailData = emailDataCollection(inputDir,outputDir)

# Causes Python to crash
emailData.head()
# Fails due to permission error
emailData.to_csv(outputDir,header=True,index=False)
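A likely cause of the permission error is that outputDir names a folder, while DataFrame.to_csv expects a file path; on Windows, opening a folder for writing raises PermissionError. A minimal sketch, assuming you want a single CSV file inside outputDir. The helper name csv_output_path and the default file name emailData.csv are hypothetical:

```python
import os


def csv_output_path(output_dir, filename="emailData.csv"):
    """Return a full .csv file path inside output_dir (hypothetical helper).

    DataFrame.to_csv needs a file path; passing a folder such as outputDir
    raises PermissionError on Windows.
    """
    os.makedirs(output_dir, exist_ok=True)  # create the folder if it is missing
    return os.path.join(output_dir, filename)


# Usage with the question's variables:
# emailData.to_csv(csv_output_path(outputDir), header=True, index=False)
```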

2 Answers

I get AttributeError: OpenSharedItem.SenderName when processing a large number of emails. The code works fine with a small number of emails (I tried 5 to 10).
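The thread does not diagnose this error. One hedged guess is that a large folder contains some .msg files that open as item types other than MailItem, which may not expose every property the script reads. The sketch below reads each property with a default instead of raising, and closes each item explicitly with Close(1) (1 = olDiscard, close without saving), which also speaks to the question about closing each email. The helper name read_msg_fields is hypothetical; outlook is the MAPI namespace from the question:

```python
FIELDS = ("SenderName", "SenderEmailAddress", "SentOn", "To", "CC",
          "BCC", "Subject", "Body")


def read_msg_fields(outlook, filepath, fields=FIELDS, default=None):
    """Open one .msg file, copy the requested properties, then close it.

    Hypothetical helper: a property the item does not expose is recorded
    as `default` instead of raising AttributeError.
    """
    msg = outlook.OpenSharedItem(filepath)
    try:
        row = {"Path": filepath}
        for name in fields:
            row[name] = getattr(msg, name, default)
        attachments = getattr(msg, "Attachments", None)
        row["AttachmentCount"] = (attachments.Count
                                  if attachments is not None else default)
        return row
    finally:
        msg.Close(1)  # 1 = olDiscard: close the item without saving changes
```

Inside the question's os.walk loop, appending each result to a list (for example rows.append(read_msg_fields(outlook, filepath))) and calling pd.DataFrame(rows) after the loop would replace the ten parallel lists.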

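Hope this isn't too late, but I found the source of the problem:

The kernel crashes because of the datetime data in msg_SentOn. If you check the type() of the values in msg_SentOn, you will see they are pywintypes.datetime, which is incompatible with pandas.

You need to convert the elements of msg_SentOn to datetime.datetime format.

This source is helpful: http://timgolden.me.uk/python/win32_how_do_i/use-a-pytime-value.html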
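A minimal sketch of that conversion, assuming msg.SentOn exposes year, month, day, hour, minute and second attributes (pywintypes datetime values do). It rebuilds a plain datetime.datetime field by field, which is one of several possible conversions; the helper name to_naive_datetime is hypothetical:

```python
import datetime


def to_naive_datetime(value):
    """Copy the date and time fields of a pywintypes datetime (or any
    datetime-like object) into a plain, timezone-naive datetime.datetime.

    The time zone and sub-second part are dropped; the wall-clock values
    are kept exactly as they were returned.
    """
    if value is None:
        return None
    return datetime.datetime(value.year, value.month, value.day,
                             value.hour, value.minute, value.second)


# Usage in the question's loop, instead of msg_SentOn.append(msg.SentOn):
# msg_SentOn.append(to_naive_datetime(msg.SentOn))
```

Dropping the time zone keeps pandas on a simple naive datetime64 column; if your emails span several time zones, check that the wall-clock values Outlook returns are what you expect before relying on them.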
