I am trying to web-scrape the Trulia Estimate for a given address. However, some addresses do not have a Trulia Estimate. So I want to first try to find the text "Trulia Estimate", and only if it is found, go on to find the value. Currently I don't know how to locate the "Trulia Estimate" text, as shown below:
Here is the code I have so far:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path='C:\\Users\\Downloads\\geckodriver-v0.24.0-win64\\geckodriver.exe')

def get_trulia_estimate(address):
    driver.get('https://www.trulia.com/')
    print(address)
    element = (By.ID, 'homepageSearchBoxTextInput')
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).click()
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable(element)).send_keys(address)
    search_button = (By.CSS_SELECTOR, "button[data-auto-test-id='searchButton']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(search_button)).click()
    time.sleep(3)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # 'class' must be a dict key mapping to the class string, not a set member
    results = soup.find('div', {'class': 'Text__TextBase-sc-1cait9d-0 OmRik'})
    print(results)
get_trulia_estimate('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
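The gating behaviour I'm after (check for the label first, then read the value) can be shown offline with BeautifulSoup; the HTML snippets below are hypothetical stand-ins for a page with and without an estimate, not the real Trulia markup:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for pages with and without an estimate.
page_with = '<h3><div>$2,213,447</div></h3><div><span>Trulia Estimate</span></div>'
page_without = '<h3><div>For Sale</div></h3><div><span>Listed Price</span></div>'

def has_trulia_estimate(page_source):
    """Return True only if the literal label text is present in the page."""
    soup = BeautifulSoup(page_source, 'html.parser')
    # find(string=...) matches a text node equal to the given string,
    # and returns None when no such text exists.
    return soup.find(string='Trulia Estimate') is not None

print(has_trulia_estimate(page_with))     # True
print(has_trulia_estimate(page_without))  # False
```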
Any suggestions would be greatly appreciated.
A version using beautifulsoup:

The CSS selector

h3:has(+div span:contains("Trulia Estimate"))

finds the <h3> tag that is immediately followed by a <div> containing a <span> with the string "Trulia Estimate", i.e. the <div> is the <h3>'s direct sibling.

Further reading: CSS Selectors Reference
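A minimal, self-contained sketch of that selector; the HTML is a hypothetical mimic of Trulia's sibling structure, and newer soupsieve releases spell the text-matching pseudo-class `:-soup-contains` (the `:contains` form in the answer is a deprecated alias):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the <h3> / sibling <div> structure.
html = '''
<section>
  <h3><div>$2,213,447</div></h3>
  <div><span>Trulia Estimate</span></div>
</section>
'''
soup = BeautifulSoup(html, 'html.parser')

# :has(+ div ...) keeps only the <h3> whose next sibling <div>
# contains a <span> with the label text.
h3 = soup.select_one('h3:has(+ div span:-soup-contains("Trulia Estimate"))')
price = h3.get_text() if h3 else None
print(price)  # $2,213,447
```

When the label is absent, `select_one` returns None, which gives the presence check the question asks for.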
It looks like those CSS class names are generated on every build, so I would suggest using XPath for this instead, and .text to get the text. You probably want to target the parent element that holds the price, so use

(//div[@aria-label="Price trends are based on the Trulia Estimate"])[1]//../h3/div

as the XPath; calling .text on that element returns the price. This also works if you want to try doing it without BeautifulSoup.
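That XPath can be sanity-checked offline with lxml; the snippet below is a hypothetical mimic of the page, including the aria-label the expression relies on:

```python
from lxml import html

# Hypothetical mimic of the relevant part of the page.
page = '''
<section>
  <h3><div>$2,213,447</div></h3>
  <div aria-label="Price trends are based on the Trulia Estimate"></div>
</section>
'''
tree = html.fromstring(page)

# Start at the aria-label <div>, step up to its container with //..,
# then descend into the sibling <h3>'s price <div>.
xpath = '(//div[@aria-label="Price trends are based on the Trulia Estimate"])[1]//../h3/div'
nodes = tree.xpath(xpath)
price = nodes[0].text if nodes else None
print(price)  # $2,213,447
```

In Selenium the same expression would be passed to a By.XPATH lookup, with an empty result meaning the page has no Trulia Estimate.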