Returning an empty list and CSV file

Posted 2024-06-02 07:35:05

I am automating this link:

https://global.remax.com/officeagentsearch.aspx#!mode=list&type=2&regionId=1000&regionRowId=&provinceId=&cityId=&localzoneId=&name=&location=&spokenLanguageCode=&page=1&countryCode=US&countryEnuName=USA&countryName=USA&selmode=residential&officeId=&TargetLng=&TargetLat=

I am using the zip function to combine all the lists into one. I am using pandas to store the data in a CSV file, but what I get is an empty list and an empty CSV file. I don't see any error in the code, so maybe I'm missing something. Thanks for your help. The code is below:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

option = Options()

driver = webdriver.Chrome(chrome_options=option, executable_path='your path\\chromedriver.exe')

driver.implicitly_wait(3)

url = "https://global.remax.com/officeagentsearch.aspx#!mode=list&type=2&regionId=1000&regionRowId=&provinceId=&cityId=&localzoneId=&name=&location=&spokenLanguageCode=&page=1&countryCode=US&countryEnuName=USA&countryName=USA&selmode=residential&officeId=&TargetLng=&TargetLat="

driver.get(url)

na = "N/A"

agent_name = []
remax_level = []
agent_phone_1 = []
agent_phone_2 = []
mobile = []
street_address = []
address_locality = []
address_region = []
address_country = []
email = []
website = []

for i in range(1, 6):
    agent_details = driver.find_element_by_xpath(f'''//*[@id="list-container"]/div[1]/div/div[{i}]/div/div[1]/a''')
    agent_details.click()

    try:
        # scraping agent's name 
        name = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
        agent_name.append(name.text)
    except:
        agent_name.append(na)

    try:
        # scraping remax level 
        level = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/div[2]/h3/span/a/span''')
        remax_level.append(level.text)
    except:
        remax_level.append(na)

    try:
        # clicking on phone no 1
        phone_1 = driver.find_element_by_id("AgentDirectDialSpan")
        phone_1.click()
    except:
        pass

    try:
        # scraping phone no 1
        phone_1_copy = driver.find_element_by_class_name("phone-link")
        agent_phone_1.append(phone_1_copy.text)
    except:
        agent_phone_1.append(na)

    try:
        # clicking on phone no 2
        phone_2 = driver.find_element_by_id("ctl05_ShowOffice")
        phone_2.click()
    except:
        pass

    try:
        # scraping phone no 2
        phone_2_copy = driver.find_element_by_class_name("OfficePhoneSpan")
        agent_phone_2.append(phone_2_copy.text)
    except:
        agent_phone_2.append(na)

    try:   
        # clicking on mobile num
        mobile_num = driver.find_element_by_id("ctl05_ShowPhone")
        mobile_num.click()
    except:
        pass

    try:
        # scraping  mobile num
        mobile_n = driver.find_element_by_id("PhoneSpan")
        mobile.append(mobile_n.text)
    except:
        mobile.append(na)

    try:
        # scraping street address
        street_add = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[1]''')
        street_address.append(street_add.text)
    except:
        street_address.append(na)

    try:
        # scraping address locality
        add_locality = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[2]''')
        address_locality.append(add_locality.text)
    except:
        address_locality.append(na)

    try:
        # scraping address region
        add_region = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[3]''')
        address_region.append(add_region.text)
    except:
        address_region.append(na)

    try:
        # scraping address country
        add_country = driver.find_element_by_xpath('''//*[@id="ctl05_Address"]/span[4]''')
        address_country.append(add_country.text)
    except:
        address_country.append(na)

    try: 
        # scraping emails and websites
        emails_or_web = driver.find_element_by_xpath('''//span[contains(@class, 'value') and contains(@class, 'url-link') and position() = 1]''')

        if emails_or_web.text.startswith(("http://", "https://")):
            website.append(emails_or_web.text)
            email.append(na)  # keep both lists the same length for zip
        else:
            email.append(emails_or_web.text)
            website.append(na)
    except:
        website.append(na)
        email.append(na)

    driver.back()

# zipping all the lists to one variable
all_info = list(zip(agent_name, remax_level, agent_phone_1, agent_phone_2, mobile, street_address, address_locality, address_region, address_country, email, website))
print(all_info)

df = pd.DataFrame(all_info, columns=["Agent Name", "Remax Level", "Agent Phone 1", "Agent Phone 2", "Agent Mobile", "Street Address", "Address Locality", "Address Region", "Address Country", "Email", "Website"])
df.to_csv("data.csv", index=False, encoding='utf-8')
driver.close()

Tags: text, name, div, id, by, address, driver, phone
2 Answers

Well, I see that you are only requesting the main URL and nothing more. How do you expect to parse each agent when you haven't even collected the agents' URLs from the main page and then visited each one to parse it?

Also, even though Selenium can handle a task like this, it will slow you down considerably, so you should read the Selenium documentation to understand what Selenium is really meant for.

You haven't included any sample of your desired output either, and there are some things I can't figure out, such as level. In any case, since you haven't provided clearer information, the code below should achieve your goal:

import requests
from bs4 import BeautifulSoup


def First():
    r = requests.get("https://global.remax.com/handlers/officeagentsearch.ashx?mode=list&type=2&regionId=1000&regionRowId=&provinceId=&cityId=&localzoneId=&name=&location=&spokenLanguageCode=&page=1&countryCode=US&countryEnuName=USA&countryName=USA&selmode=residential&officeId=&TargetLng=&TargetLat=")
    soup = BeautifulSoup(r.text, 'html.parser')
    data = []
    for href in soup.find_all("a", class_="agent-name"):
        href = href.get("href"), href.text
        data.append(href)
    return data


def Second():
    for url, name in First():
        print(f"Extracting {name}")
        print('*' * 40)
        with requests.Session() as req:
            r = req.get(url)
            soup = BeautifulSoup(r.text, 'html.parser')
            phone = [item.get_text(strip=True) for item in soup.findAll(
                "span", {'id': ['AgentDirectDialSpan', 'OfficePhoneSpan', 'PhoneSpan']})]
            print(phone)
            addr = [item.get_text(strip=True, separator=" ") for item in soup.findAll(
                "span", id="ctl05_OfficeAddress")]
            print(addr)
            emailandurl = soup.find("a", {'class': 'url'})
            email = emailandurl.text
            url = emailandurl.get("href")
            if not "@" in email:
                email = "N/A"
            if "@" in url:
                url = "N/A"
            print(f"Email : {email}, Url: {url}")

        print('*' * 40)


Second()
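
Since the original goal was a CSV file, here is a minimal sketch of how the same parsing could feed a pandas DataFrame and be written out instead of printed. It reuses the First() helper defined above; the to_csv_file name and the column labels are my own additions, not part of the answer.

import pandas as pd
import requests
from bs4 import BeautifulSoup


def to_csv_file(path="data.csv"):
    rows = []
    with requests.Session() as req:
        for url, name in First():  # reuse the link collector defined above
            soup = BeautifulSoup(req.get(url).text, 'html.parser')
            # same span ids and address container as in Second()
            phones = [item.get_text(strip=True) for item in soup.find_all(
                "span", {'id': ['AgentDirectDialSpan', 'OfficePhoneSpan', 'PhoneSpan']})]
            addr = [item.get_text(strip=True, separator=" ") for item in soup.find_all(
                "span", id="ctl05_OfficeAddress")]
            link = soup.find("a", {'class': 'url'})
            text = link.get_text(strip=True) if link else ""
            href = (link.get("href") or "") if link else ""
            rows.append({
                "Agent Name": name,
                "Phones": "; ".join(phones),
                "Address": addr[0] if addr else "N/A",
                "Email": text if "@" in text else "N/A",
                "Website": href if href and "@" not in href else "N/A",
            })
    # one row per agent, written out the way the question intended
    pd.DataFrame(rows).to_csv(path, index=False, encoding="utf-8")


to_csv_file()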

Output:

Extracting Jim & Lisa - THE COOPERS
****************************************
['+1 816-260-1459', '+1 816-781-9080', '+1 816-260-8592']
['2 Victory Dr Liberty, Missouri, United States 64068']
Email : N/A, Url: http://soldbythecoopers.com
****************************************
Extracting Jim & Jimmie Rucker - The Rucker Group
****************************************
['+1 816-739-5289', '+1 816-781-9080', '+1 816-739-5289']
['2 Victory Dr Liberty, Missouri, United States 64068']
Email : N/A, Url: http://jimmierucker.remax-midstates.com
****************************************
Extracting The Steve & Shauna Faught Team !
****************************************
['+1 8053829441', '+1 (805) 208-1826']
['1151 S Victoria Ave Oxnard, California, United States 93035']
Email : shaunafaught@remax.net, Url: N/A
****************************************
Extracting Vickie Soupos & Georgia Colovos
****************************************
['+1 847.352.5200', '+1 847-352-5200', '+1 630.965.6000']
['1080 Nerge Rd. Suite 204 Elk Grove Village, Illinois, United States 60007']
Email : N/A, Url: http://vickiecsoupos.engagereagent.com
****************************************
Extracting Steven Roque & Jan Meyer
****************************************
['+1 (858) 451-6541', '+1 8583915800', '+1 (858) 451-6541']
['16840 Bernardo Center Dr. San Diego, California, United States 92128']
Email : stevenproque@remax.net, Url: N/A
****************************************
Extracting Gabrielle (clark) Lawson
****************************************
['+1 (937) 778-3961', '+1 9377783961', '+1 (937) 418-1718']
['1200 Park Ave Piqua, Ohio, United States 45356']
Email : brandi.clark@remax.net, Url: N/A
****************************************
Extracting The Beauchamp Team (Pam & Heather)
****************************************
['+1 3867588900', '+1 (386) 303-2505']
['4255 SW Cambridge Glen Lake City, Florida, United States 32024-3431']
Email : pamelabeauchamp@remax.net, Url: N/A
****************************************
Extracting Amanda (Ritter) Lease
****************************************
['+1 6128125732', '+1 (952) 475-8000', '+1 6128125732']
['125 Lake St West Wayzata, Minnesota, United States 55391']
Email : N/A, Url: http://amandalease.results.net
****************************************
Extracting The Manring Brothers @ REMAX
****************************************
['+1 (239) 289-6913', '+1 2397932777', '+1 (239) 289-6915']
['877 91st Ave N Suite 2 Naples, Florida, United States 34108']
Email : tylermanring@remax.net, Url: N/A
****************************************
Extracting Logan M. Aal
****************************************
['+1 (303) 456-2153', '+1 3034205352', '+1 (303) 501-0294']
['5440 Ward Rd Ste 110 Arvada, Colorado, United States 80002']
Email : loganaal@remax.net, Url: N/A
****************************************
Extracting Abe Aalami
****************************************
['+1 (425) 743-1639', '+1 2063225700', '+1 (206) 948-6283']
['2312 Eastlake Ave E Seattle, Washington, United States 98102']
Email : abe.aalami@remax.net, Url: N/A
****************************************
Extracting Lacey Aalderks
****************************************
['+1 3202311221', '+1 (320) 266-1631']
['770 N Business Hwy 71 Willmar, Minnesota, United States 56201']
Email : N/A, Url: http://www.laceyaalderks.com
****************************************
Extracting Andrea Aana
****************************************
['+1 (808) 935-8300', '+1 8089359800', '+1 (808) 937-6396']
['88 Kanoelehua Ave #A-105 Hilo, Hawaii, United States 96720']
Email : andreaaana@remax.net, Url: N/A
****************************************
Extracting Adam Aaron
****************************************
['+1 3039854555', '+1 (225) 571-5111']
['143 Union Blvd Ste 120 Lakewood, Colorado, United States 80228-1827']
Email : Adam.Aaron@remax.net, Url: N/A
****************************************
Extracting Bryan Aaron
****************************************
['+1 2812456463', '+1 (832) 526-1973']
['203 S Friendswood Dr Ste 200 Friendswood, Texas, United States 77546-3901']
Email : Bryan.Aaron@remax.net, Url: N/A
****************************************
Extracting Lindsey Aaron
****************************************
['+1 (877) 407-2676']
['7101 Vista Drive West Des Moines, Iowa, United States 50266']
Email : N/A, Url: http://lindseyaaronrealestate.com
****************************************
Extracting Mary Aaron
****************************************
['+1 (214) 802-3954', '+1 9724628181']
['500 S Denton Tap Ste 110 Coppell, Texas, United States 75019']
Email : betha@remax.net, Url: N/A
****************************************
Extracting Noelle Aasen
****************************************
['+1 (317) 863-4088', '+1 3178497653', '+1 (317) 627-2120']
['5645 Castle Creek Parkway North Dr. Indianapolis, Indiana, United States 46250']
Email : N/A, Url: http://151354589.homesconnect.com
****************************************
Extracting Kristy Aasheim
****************************************
['+1 4068962200']
['517 S 24th St W Ste A Billings, Montana, United States 59102']
Email : kristyaasheim@remax.net, Url: N/A
****************************************
Extracting Kristy L. Aasheim
****************************************
['+1 (406) 480-9383', '+1 7015808116']
['115 2nd Ave W Williston, North Dakota, United States 58801-5918']
Email : kaasheim@remax.net, Url: N/A
****************************************

I'm not entirely sure what your problem is, since I haven't tested your code by hand, but assuming your elements have the proper XPaths and IDs, my guess is that you are trying to read the .text attribute from a list object (a list of web elements). In that case you need to read the .text attribute from each element. For example, if the XPath in

name = driver.find_element_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
agent_name.append(name.text)

finds all of the name elements on the page ("Joe Smith", "Bob Jones", etc.), you can add a loop and read the .text attribute of each element. For example:

names = driver.find_elements_by_xpath('''//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a''')
for name in names:
    agent_name.append(name.text)

That should at least populate your lists. If it doesn't, I would double-check that what you are trying to scrape really is a text attribute in the HTML (i.e. not an image), make sure your element identifiers are correct, and follow the advice and syntax in the Python Selenium documentation.
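
For completeness, a minimal sketch of that pattern, reusing the XPath and the "N/A" fallback from the question. find_elements (plural) returns a list and never raises, so an empty result simply falls through to the fallback, and keeping every list the same length matters because zip() truncates to the shortest list:

names = driver.find_elements_by_xpath(
    '//*[@id="MainContent"]/div[1]/div[2]/div/div[1]/div[1]/div[1]/div[1]/h2/a')
if names:
    for name in names:
        agent_name.append(name.text)  # read .text from each element, not from the list
else:
    agent_name.append("N/A")  # fallback keeps the lists aligned for zip()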
