I am using the zip function to combine all the lists into one, and pandas to store the data in a CSV file, but I end up with an empty list and an empty CSV file. I don't see any error in the code, so maybe I'm missing something. Thanks for your help. The code is below:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

option = Options()
driver = webdriver.Chrome(chrome_options=option, executable_path='your path\\chromedriver.exe')
driver.implicitly_wait(3)

url = "https://global.remax.com/officeagentsearch.aspx#!mode=list&type=2&regionId=1000&regionRowId=&provinceId=&cityId=&localzoneId=&name=&location=&spokenLanguageCode=&page=1&countryCode=US&countryEnuName=USA&countryName=USA&selmode=residential&officeId=&TargetLng=&TargetLat="
driver.get(url)

na = "N/A"
agent_name = []
remax_level = []
agent_phone_1 = []
agent_phone_2 = []
mobile = []
street_address = []
address_locality = []
address_region = []
address_country = []
email = []
website = []

for i in range(1, 6):
    agent_details = driver.find_element_by_xpath(f'''//*[@id="list-container"]/div[1]/div/div[{i}]/div/div[1]/a''')
    agent_details.click()
    try:
        # scraping agent's name
        name = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
        agent_name.append(name.text)
    except:
        agent_name.append(na)
    try:
        # scraping remax level
        level = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/div[2]/h3/span/a/span''')
        remax_level.append(level.text)
    except:
        remax_level.append(na)
    try:
        # clicking on phone no 1
        phone_1 = driver.find_element_by_id("AgentDirectDialSpan")
        phone_1.click()
    except:
        pass
    try:
        # scraping phone no 1
        phone_1_copy = driver.find_element_by_class_name("phone-link")
        agent_phone_1.append(phone_1_copy.text)
    except:
        agent_phone_1.append(na)
    try:
        # clicking on phone no 2
        phone_2 = driver.find_element_by_id("ctl05_ShowOffice")
        phone_2.click()
    except:
        pass
    try:
        # scraping phone no 2
        phone_2_copy = driver.find_element_by_class_name("OfficePhoneSpan")
        agent_phone_2.append(phone_2_copy.text)
    except:
        agent_phone_2.append(na)
    try:
        # clicking on mobile num
        mobile_num = driver.find_element_by_id("ctl05_ShowPhone")
        mobile_num.click()
    except:
        pass
    try:
        # scraping mobile num
        mobile_n = driver.find_element_by_id("PhoneSpan")
        mobile.append(mobile_n.text)
    except:
        mobile.append(na)
    try:
        # scraping street address
        street_add = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[1]''')
        street_address.append(street_add.text)
    except:
        street_address.append(na)
    try:
        # scraping address locality
        add_locality = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[2]''')
        address_locality.append(add_locality.text)
    except:
        address_locality.append(na)
    try:
        # scraping address region
        add_region = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[3]''')
        address_region.append(add_region.text)
    except:
        address_region.append(na)
    try:
        # scraping address country
        add_country = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[4]''')
        address_country.append(add_country.text)
    except:
        address_country.append(na)
    try:
        # scraping emails and websites
        emails_or_web = driver.find_element_by_xpath('''//span[contains(@class, 'value') and contains(@class, 'url-link') and position() = 1]''')
        if emails_or_web.text[6] or emails_or_web.text[7] == "http://" or "https://":
            website.append(emails_or_web.text)
        else:
            email.append(emails_or_web.text)
    except:
        website.append(na) and email.append(na)
    driver.back()
    continue

# zipping all the lists to one variable
all_info = list(zip(agent_name, remax_level, agent_phone_1, agent_phone_2, mobile, street_address, address_locality, address_country, email, website))
print(all_info)
df = pd.DataFrame(all_info, columns=["Agent Name", "Remax Level", "Agent Phone 1", "Agent Phone 2", "Agent Mobile", "Street Address", "Address Locality", "Address Country", "Email", "Website"])
df.to_csv("data.csv", index=False, encoding='utf-8')
driver.close()
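One detail of zip worth knowing here: it truncates to the shortest input, so if even one of the zipped lists stays empty, the combined result is empty too, and so is the CSV. In the code above, each iteration appends to only one of email and website, and the except branch `website.append(na) and email.append(na)` never reaches the second append (append returns None, and `None and x` short-circuits), so those two lists fall behind the rest. A minimal sketch with made-up sample data:

```python
from itertools import zip_longest

names = ["Joe Smith", "Bob Jones"]   # made-up sample data
phones = ["555-0100", "555-0101"]
emails = []                          # a list that was never appended to

# zip() stops at the shortest input, so one empty list empties everything
print(list(zip(names, phones, emails)))   # []

# zip_longest pads the shorter inputs instead of truncating
print(list(zip_longest(names, phones, emails, fillvalue="N/A")))
```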
Well, as far as I can see you only ever call the main url, and that's it? If you haven't first collected the agent urls from the main url, and then called each of those urls to parse it, how is the parsing supposed to happen? Even though you are using selenium for a task like this, that approach will slow your work down considerably, so you should read the selenium documentation to understand how selenium is meant to be used. You also haven't included any sample of your desired output, and there are things I can't figure out, such as level. Anyway, since there's no clearer information to go on, the code below should achieve your goal:
Output:
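The collect-the-links-first flow this answer recommends might look roughly like the sketch below (this is not the answer's own code: the listing-page XPath is an assumption, and to_absolute is a small hypothetical helper for normalizing hrefs):

```python
from urllib.parse import urljoin

def to_absolute(hrefs, base):
    # hypothetical helper: drop empty hrefs and resolve relative
    # profile links against the listing-page URL
    return [urljoin(base, h) for h in hrefs if h]

def scrape_all():
    # needs selenium and a local chromedriver, so it is only sketched here
    from selenium import webdriver

    driver = webdriver.Chrome()
    base = "https://global.remax.com/officeagentsearch.aspx"
    driver.get(base)
    # step 1: collect every profile URL from the listing page first
    # (the XPath is an assumption -- adjust it to the real markup)
    links = driver.find_elements_by_xpath('//div[@id="list-container"]//a')
    urls = to_absolute([a.get_attribute("href") for a in links], base)
    # step 2: only then visit each URL once and parse the profile page
    for url in urls:
        driver.get(url)
        # ... parse one agent profile here ...
    driver.quit()
    return urls

print(to_absolute(["/agent/1", None], "https://example.com/"))  # ['https://example.com/agent/1']
```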
I'm not quite sure what your problem is, since I haven't tested your code by hand, but assuming your elements have the proper XPaths and ids, my guess is that you are trying to get the .text attribute from a list object (a list of web elements). In that case you need to take the .text attribute of each element. For example, if the xpath in

name = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
agent_name.append(name.text)

finds all the name elements on the page ("Joe Smith, Bob Jones, etc."), you can add a loop and take the .text attribute of each element, using the plural find_elements_by_xpath, which returns a list. For example:

names = driver.find_elements_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
for name in names:
    agent_name.append(name.text)

That should at least populate your lists. If it doesn't, I would double-check that what you are trying to grab really is a text attribute in the html (i.e. not an image), make sure your element identifiers are correct, and follow the advice/syntax in the python selenium documentation.
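A related way to sidestep the empty-list problem entirely is to build one dict per agent instead of eleven parallel lists, defaulting every missing field to "N/A"; then a failed lookup can never make one column shorter than the others, and pandas accepts the records directly. A minimal sketch with made-up data (make_record and the shortened column list are illustrative, not from the original code):

```python
import pandas as pd

COLUMNS = ["Agent Name", "Remax Level", "Email", "Website"]  # shortened for the sketch

def make_record(scraped):
    # every expected column is present, defaulting to "N/A",
    # so one missing field can no longer shorten a column
    return {col: scraped.get(col, "N/A") for col in COLUMNS}

rows = [
    make_record({"Agent Name": "Joe Smith", "Email": "joe@example.com"}),
    make_record({"Agent Name": "Bob Jones", "Website": "http://example.com"}),
]
df = pd.DataFrame(rows)
print(df.to_string(index=False))
```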