Python Selenium: getting email addresses from a webpage

Posted 2024-04-28 12:17:15


I want to extract names, phone numbers, and email addresses from this page.

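The code works, but the problem is that some of the name "cards" contain more than one link, so when I extract them, the order of all the links gets thrown off. For example: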

Julio (July) Anopol, , , Mobile: 416-678-2916, mailto:julio.luis.anopol@freedom55financial.com

Henry D. Arauag, Office: 905-276-1177 , Ext. 594 , Mobile: 647-649-7955 , mailto:henry.arauag@freedom55financial.com

Rick Auckbaraullee , Office: 905-276-1177 , Ext. 557 , Mobile: 416-577-2377 , mailto:rick.auckbaraullee@freedom55financial.com

Frank Basile , Office: 905-276-1177 , Ext. 469 , Mobile: 416-797-9316 , mailto:frank.basile@freedom55financial.com

Janis Bellman , Office: 905-276-1177 , Ext. 601 , Mobile: 416-258-0630 , https://www.linkedin.com/in/janisbellman

Sean Beneteau , Office: 905-363-5800 , Ext. 123 , , https://www.facebook.com/MyBellman/

Carmen Briguglio , Office: 905-824-5660 , , , https://twitter.com/BellmanJanis

Qi Jun (Steve) Cai , Office: 905-276-1177 , Ext. 591 , Mobile: 416-949-1069 , mailto:janis.bellman@freedom55financial.com

As you can see, if a name's "card" has an additional link, the sequence gets thrown off from that point on.
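To make the misalignment concrete, here is a small sketch that needs no browser. The link list is an assumption reconstructed from the output above: it supposes that Janis Bellman's card holds three social links before her mailto link, and that all footer links come back as one flat list in document order.

```python
# Names come back one per card.
names = ["Frank Basile", "Janis Bellman", "Sean Beneteau"]

# Footer links come back as ONE flat list for the whole page.
# Assumption (inferred from the output above): Janis's card has four links.
footer_links = [
    "mailto:frank.basile@freedom55financial.com",   # Frank's card
    "https://www.linkedin.com/in/janisbellman",     # Janis's card, link 1
    "https://www.facebook.com/MyBellman/",          # Janis's card, link 2
    "https://twitter.com/BellmanJanis",             # Janis's card, link 3
    "mailto:janis.bellman@freedom55financial.com",  # Janis's card, link 4
]

# Pairing by index assumes exactly one link per card, which does not hold here.
paired = [(name, footer_links[i]) for i, name in enumerate(names)]

for name, link in paired:
    print(name, ",", link)
```

Once one card contributes more than one link, every later name is paired with a link that belongs to an earlier card.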

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# example option: add 'incognito' command line arg to options
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

# create new instance of chrome in incognito mode
browser = webdriver.Chrome(executable_path='/Library/Application Support/Google/chromedriver', chrome_options=option)

# go to website
browser.get("https://www.freedom55financial.com/ff/advisor/Ontario/Mississauga")

browser.implicitly_wait(4)

# extract names from parent element
all_names = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/h2')

# extract all phone numbers
all_off_phones_numbers = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[4]/a')

#extract all exts
all_exts = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[5]')

#extract all cell numbers
all_cell_numbers = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/section/p[6]/a')

#extract all email addys
all_emails = browser.find_elements_by_xpath('//*[@id="advisor-results"]/article[*]/footer/a[*]')

# print out all info
num_page_items = len(all_names)
for i in range(num_page_items):
    print(all_names[i].text + " , " + all_off_phones_numbers[i].text + " , " + all_exts[i].text + " , " + all_cell_numbers[i].text + " , " + all_emails[i].get_attribute('href'))
    # print(all_names[i].text + " , " + all_off_phones_numbers[i].text + " , " + all_exts[i].text + " , " + all_cell_numbers[i].text + " , " + all_emails[i].text)

browser.close()

Here is a sample of the HTML showing how the information is laid out on the page:

[HTML sample not preserved in the source]

I have tried all kinds of variations (CSS selectors, XPath with contains text, and so on), to no avail.

How can I get this to work so that I get the email addresses?
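One direction that might work is to handle the links one card at a time and keep only the `mailto:` one, instead of indexing into a page-wide list. The sketch below covers just the filtering step in plain Python. The helper name `pick_email` and the sample `cards` data are hypothetical, and it assumes the hrefs have already been gathered per card (for instance by calling `find_elements` on each article element rather than on the whole page).

```python
def pick_email(hrefs):
    """Return the address from the first mailto: link in one card's hrefs, or ''."""
    prefix = "mailto:"
    for href in hrefs:
        # Skip missing hrefs and anything that is not a mailto link.
        if href and href.lower().startswith(prefix):
            return href[len(prefix):]
    return ""


# Hypothetical per-card hrefs, inferred from the output shown in the question.
cards = {
    "Frank Basile": ["mailto:frank.basile@freedom55financial.com"],
    "Janis Bellman": [
        "https://www.linkedin.com/in/janisbellman",
        "https://www.facebook.com/MyBellman/",
        "https://twitter.com/BellmanJanis",
        "mailto:janis.bellman@freedom55financial.com",
    ],
}

for name, hrefs in cards.items():
    print(name, ",", pick_email(hrefs))
```

Because each card is processed on its own, extra social links on one card cannot shift the email of the next, and a card with no email simply yields an empty string.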

Thanks in advance.

