Extracting a URL from an email in Python

1 vote

6 answers

10410 views

Asked 2025-04-15 16:21

Thank you for your submission to our directory site ourdirectory.com
Link: http://myurlok.us
Please click the link below to confirm your submission.
http://www.ourdirectory.com/confirm.aspx?id=1247778154270076

Once we receive your comfirmation, your site will be included for process!
regards,

http://www.ourdirectory.com

Thank you!

It should be obvious which link I need to extract.


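6 Answers

@OP, if your emails always have the same format, you can look for the prompt line and print the line that follows it: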

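# Python 3: the context manager closes the file for us
with open("emailfile") as f:
    for line in f:
        if "confirm your submission" in line:
            # The confirmation link is expected on the very next line
            print(next(f, "").strip())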
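The same idea can be sketched for an email body that is already in memory as a string. This is only an illustration under the assumption that the link sits on the first non-blank line after the prompt; the function name `url_after_marker` and its `marker` parameter are made up for this example.

```python
def url_after_marker(text, marker="confirm your submission"):
    """Return the first non-blank line that follows a line containing marker."""
    lines = iter(text.splitlines())
    for line in lines:
        if marker in line:
            # Skip any blank lines between the prompt and the link
            for candidate in lines:
                if candidate.strip():
                    return candidate.strip()
    return None  # marker missing, or nothing after it


sample = (
    "Please click the link below to confirm your submission.\n"
    "\n"
    "http://www.ourdirectory.com/confirm.aspx?id=1247778154270076\n"
)
print(url_after_marker(sample))
```

Returning `None` instead of raising makes it easy to skip emails that do not contain the prompt at all.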

Answered 2025-04-15 by Python大师

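If you are dealing with HTML emails that contain hyperlinks, you can use HTMLParser from the standard library, which makes this simpler and quicker.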

from html.parser import HTMLParser  # Python 3 home of the old HTMLParser module

class parseLinks(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we are after
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    print(value)
                    print(self.get_starttag_text())

someHtmlContainingLinks = ""
linkParser = parseLinks()
linkParser.feed(someHtmlContainingLinks)
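If the links are needed as data rather than printed output, a variant along these lines may help. It is a rough sketch, not part of the answer above: the names `LinkCollector` and `extract_links` are hypothetical, and it assumes the HTML body has already been decoded to a string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


html = (
    '<p>Confirm: <a href="http://www.ourdirectory.com/confirm.aspx'
    '?id=1247778154270076">click here</a></p>'
)
print(extract_links(html))
```

Collecting into a list lets the caller filter afterwards, for example keeping only the links whose path contains "confirm".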

Answered 2025-04-15 by Python大师

This solution only works when the source content is not HTML.

# Written as a method; drop `self` to use it as a plain function
def extractURL(self, fileName):

    urlList = []

    # Open the file containing the email; `with` closes it afterwards
    with open(fileName) as emailFile:
        for line in emailFile:
            # Split on any whitespace so a trailing newline never sticks to a URL
            for word in line.split():
                # Split only at the first ':' so URLs with a port still match
                tempWord = word.split(":", 1)
                # Check whether the word is a URL
                if len(tempWord) == 2 and tempWord[0] in ("http", "https"):
                    urlList.append(word)

    return urlList
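As an alternative to splitting words by hand, a regular expression can pull out the candidate URLs, and `urllib.parse.urlparse` can then pick the confirmation link. This is a minimal sketch rather than a robust URL matcher: the pattern simply grabs everything up to the next whitespace, so trailing punctuation would be included, and the helper names and the `keyword` parameter are assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Deliberately simple: scheme, then everything up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(text):
    """Return every http/https URL found in text, in order."""
    return URL_PATTERN.findall(text)


def find_confirmation_urls(text, keyword="confirm"):
    """Keep only the URLs whose path contains keyword."""
    return [u for u in find_urls(text) if keyword in urlparse(u).path.lower()]


email = """Thank you for your submission to our directory site ourdirectory.com
Link: http://myurlok.us
Please click the link below to confirm your submission.
http://www.ourdirectory.com/confirm.aspx?id=1247778154270076

regards,
http://www.ourdirectory.com
"""
print(find_confirmation_urls(email))
```

Filtering on the URL path rather than on the surrounding sentence means the result does not depend on the exact wording of the email.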

Answered 2025-04-15 by Python大师
