虽然网页没有显示任何电子邮件地址,但运行我的scraper我可以在控制台上获取,但它会显示集群文档。有没有办法只保留电子邮件和电话号码的一套文件?我要做的是:
import requests
from lxml import html
def Mainpage():
url = "https://www.houzz.de/professionals/c/Deutschland"
response = requests.get(url)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="name-info"]')
for title in titles:
Name=title.xpath('.//a/@href')[0]
FindindEmail(Name)
def FindindEmail(pagelink):
response = requests.get(pagelink)
tree = html.fromstring(response.text)
titles = tree.xpath('//div[@class="professional-info-content"]/text()')
for title in titles:
print(title.strip())
Mainpage()
最终找到解决方案:
相关问题 更多 >
编程相关推荐