I'm trying to get a set of house links from this website, but it only returns about 9 elements even though the page has many more. I also tried Beautiful Soup, but the same thing happens: it doesn't return all the elements.
With Selenium:
import time
from pprint import pprint

for i in range(10):
    time.sleep(1)
    scr1 = driver.find_element_by_xpath('//*[@id="search-page-list-container"]')
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", scr1)

link_tags = driver.find_elements_by_css_selector(".list-card-info a")
links = [link.get_attribute("href") for link in link_tags]
pprint(links)
For bs4:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

headers = {
    'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/74.0.3729.131 Safari/537.36'
}
response = requests.get(ZILLOW_URL, headers=headers)
website_content = response.text
soup = BeautifulSoup(website_content, "html.parser")
link_tags = soup.select(".list-card-info a")
link_list = [link.get("href") for link in link_tags]
pprint(link_list)
Output:
'https://www.zillow.com/b/407-fairmount-ave-oakland-ca-9NTzMK/',
'https://www.zillow.com/homedetails/1940-Buchanan-St-A-San-Francisco-CA-94115/15075413_zpid/',
'https://www.zillow.com/homedetails/2380-California-St-QZ6SFATJK-San-Francisco-CA-94115/2078197750_zpid/',
'https://www.zillow.com/homedetails/5687-Miles-Ave-Oakland-CA-94618/299065263_zpid/',
'https://www.zillow.com/b/olume-san-francisco-ca-65f3Yr/',
'https://www.zillow.com/homedetails/29-Balboa-St-APT-1-San-Francisco-CA-94118/2092859824_zpid/']
Is there any way to fix this? I'd really appreciate the help.
You have to scroll to each element one by one in a loop, and then look for the descendant anchor tags that have an href.
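A minimal sketch of that approach, assuming the same `.list-card-info a` selector from the question (the `"css selector"` locator string is equivalent to `By.CSS_SELECTOR` in Selenium 4):

    import time

    def collect_card_links(driver, selector=".list-card-info a", pause=0.5):
        """Scroll each result card's anchor into view so the lazy loader
        renders it, then gather every distinct non-empty href."""
        links = []
        seen = set()
        for anchor in driver.find_elements("css selector", selector):
            driver.execute_script("arguments[0].scrollIntoView();", anchor)
            time.sleep(pause)  # give the lazy loader time to fill the card in
            href = anchor.get_attribute("href")
            if href and href not in seen:
                seen.add(href)
                links.append(href)
        return links

Because every card is brought into the viewport before its href is read, this avoids the situation where only the first screenful of cards has populated links.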
The problem is the website: it adds the links dynamically. So you can try scrolling to the bottom of the page and then searching for the links.
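One way to sketch that idea, reusing the question's inner results container: keep scrolling the container to its bottom until its scrollHeight stops growing, i.e. no more cards are being appended, and only then collect the links. The round limit and pause are assumptions you would tune for the site:

    import time

    def scroll_container_to_bottom(driver, container, pause=1.0, max_rounds=30):
        """Repeatedly scroll `container` to its bottom; stop once its
        scrollHeight stops changing (no new cards were appended) or
        after max_rounds attempts. Returns the final scrollHeight."""
        last_height = -1
        for _ in range(max_rounds):
            height = driver.execute_script(
                "arguments[0].scrollTop = arguments[0].scrollHeight;"
                "return arguments[0].scrollHeight;",
                container,
            )
            if height == last_height:
                break  # nothing new was loaded on this round
            last_height = height
            time.sleep(pause)  # let the dynamically added links render
        return last_height

Unlike the fixed `range(10)` loop in the question, this stops based on whether the container actually grew, so it works regardless of how many batches of results the site loads.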