Question 1: How do I click the next div element with the same class (separate page), scrape that page, go back, and click the next div element? Since all the elements share the same class name, and each one contains a unique link to the separate page I want to scrape, the problem becomes: find the element -> go to the page -> scrape the info -> go back -> go to the next element, and so on. Solved:
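The key to that pattern is collecting every unique href up front and only then navigating, because elements found on the list page go stale after the first driver.get. A minimal stand-in sketch of the two-pass idea using only the standard library (the HTML and the '/stockholm/' filter are made-up examples, mirroring the selector in the code below):

```python
from html.parser import HTMLParser

# Made-up list page: several identically classed divs, each with a unique link
PAGE = """
<div class="item"><a href="/stockholm/1">one</a></div>
<div class="item"><a href="/stockholm/2">two</a></div>
<div class="item"><a href="/uppsala/3">three</a></div>
"""

class LinkCollector(HTMLParser):
    """First pass: collect every matching href before visiting any of them."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href", "")
        if tag == "a" and "/stockholm/" in href:
            self.urls.append(href)

collector = LinkCollector()
collector.feed(PAGE)
print(collector.urls)  # ['/stockholm/1', '/stockholm/2']

# Second pass: now it is safe to visit each saved URL one at a time
for url in collector.urls:
    pass  # driver.get(url); scrape; no "back" needed since the list is saved
```

With Selenium the same idea is `find_elements_by_css_selector(...)` followed by `get_attribute('href')` on each match, as in the code below.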
Question 2: How do I write the element text to the CSV instead of the XPath expressions? See the code used below:
import csv

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.example.com/list/')

# Keep clicking "show more" in the list until the button is gone
loadmore = True
while loadmore:
    try:
        next_link = driver.find_element_by_xpath('//button[@id="show-more"]')
        next_link.click()
    except NoSuchElementException:
        loadmore = False

# Collect all the elements and save their unique links
elements = driver.find_elements_by_css_selector("a[href*='/stockholm/']")
urls = []
for element in elements:
    urls.append(element.get_attribute('href'))
    print(element.get_attribute('href'))  # the href value of the element

# Return the text of every element matching an XPath
def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

facts = [
    "//div[@class='fact' and contains(span, '')][1]",
    "//div[@class='fact' and contains(span, '')][2]",
]

# Loop through each URL and write the scraped text, not the XPath strings
with open('list.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for url in urls:
        driver.get(url)
        entries = []
        for xpath in facts:
            entries.append(get_elements_by_xpath(driver, xpath))
        writer.writerows(zip(*entries))
This is the code used to write to CSV on a single page, without looping over URLs:
facts = [
    "//div[@class='fact' and contains(span, '')][1]",
    "//div[@class='fact' and contains(span, '')][2]",
]

with open('list.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    entries = []
    for xpath in facts:
        entries.append(get_elements_by_xpath(driver, xpath))
    writer.writerows(zip(*entries))
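The `zip(*entries)` at the end is what turns the scraped text into sensible rows: each inner list in `entries` holds all the texts matched by one XPath (a column), and zipping transposes those columns into rows so each CSV line pairs the matching facts together. A self-contained illustration with made-up fact strings:

```python
import csv
import io

# Each inner list is the text scraped for one XPath, across matches
entries = [
    ["fact1-a", "fact1-b"],  # texts matched by the first XPath
    ["fact2-a", "fact2-b"],  # texts matched by the second XPath
]

# zip(*entries) transposes columns into rows: one row per scraped item
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(zip(*entries))
print(buf.getvalue())
# fact1-a,fact2-a
# fact1-b,fact2-b
```

Without the transpose, `writer.writerows(entries)` would instead emit one long row per XPath.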
You don't have to use following-sibling. You can use find_elements, which returns a list, to find all the divs. After that you can loop through each element and scrape what you need.