How do I get the next element / following sibling?


Question 1: How do I click the next div element with the same class (each leading to a separate page), scrape that page, go back, and click the next div element? Since all the elements share the same class name and each one contains a unique link to the separate page I want to scrape, the task becomes: find the element -> go to its page -> scrape the information -> go back -> go to the next element, and so on. SOLVED:

Question 2: How do I get the CSV to contain the scraped text instead of the XPath expressions? See the code used below:

import csv

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.example.com/list/')

# Keep clicking "show more" until the button no longer exists
loadmore = True
while loadmore:
    try:
        next_link = driver.find_element_by_xpath('//button[@id="show-more"]')
        next_link.click()
    except NoSuchElementException:
        loadmore = False

# Open CSV file
File = open('list.csv', 'w')
writer = csv.writer(File)
entries = []  # collects the scraped values

# Collect all the elements
elements = driver.find_elements_by_css_selector("a[href*='/stockholm/']")

# Loop through each element and store its link
urls = []
for element in elements:
    urls.append(element.get_attribute('href'))
    print(element.get_attribute('href'))  # the href value of the element

# Helper: return the text of every element matching an XPath
def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

for url in urls:
    driver.get(url)
    facts = [
        "//div[@class='fact' and contains(span, '')][1]",
        "//div[@class='fact' and contains(span, '')][2]",
    ]
    for xpath in facts:
        entries.append(get_elements_by_xpath(driver, xpath))
    writer.writerow(facts)  # writes the XPath strings themselves -- the problem in Question 2
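The last line above is why the CSV ends up full of XPath expressions: csv.writer.writerow() simply writes whatever strings it is given, and facts holds the selectors, not the scraped text. A minimal stand-alone illustration with made-up values (plain standard library, no Selenium needed):

import csv
import io

facts = ["//div[@class='fact'][1]", "//div[@class='fact'][2]"]
entries = [["Price: 100", "Price: 200"], ["Rooms: 3", "Rooms: 4"]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(facts)    # this row contains the XPath strings
writer.writerow(entries)  # this row contains the scraped lists (as string representations)
print(buf.getvalue())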

Here is the code used to print to CSV on a single page, without looping over the URLs:

facts = [
    "//div[@class='fact' and contains(span, '')][1]",
    "//div[@class='fact' and contains(span, '')][2]",
]

with open('list.csv', 'a') as f:
    writer = csv.writer(f)
    entries = []
    for xpath in facts:
        entries.append(get_elements_by_xpath(driver, xpath))
    writer.writerows(zip(*entries))
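For reference, zip(*entries) transposes the per-XPath lists into per-row tuples, so each CSV row pairs up the values scraped by the two XPaths. A small sketch with made-up values:

entries = [
    ["fact1-page-a", "fact1-page-b"],  # texts matched by the first XPath
    ["fact2-page-a", "fact2-page-b"],  # texts matched by the second XPath
]

# zip(*entries) pairs the i-th item of every inner list:
# [('fact1-page-a', 'fact2-page-a'), ('fact1-page-b', 'fact2-page-b')]
print(list(zip(*entries)))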

1 Answer

You don't have to use following-sibling. You can use find_elements, which returns a list, to find all of the divs. After that you can loop through each element and scrape what you need.

# (Assumes the imports and driver setup from the question above)

# Collect all the elements
elements = driver.find_elements_by_css_selector("a[href*='/stockholm/']")

# Loop through each element and store its link
urls = []
for element in elements:
    urls.append(element.get_attribute('href'))
    print(element.get_attribute('href'))  # the href value of the element

# Helper: return the text of every element matching an XPath
def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements_by_xpath(xpath)]

# Open CSV file
File = open('list.csv', 'w')
writer = csv.writer(File)

for url in urls:
    print(url)  # check that the url is correct
    driver.get(url)
    entries = []  # defines entries - reset to blank on each loop
    facts = [
        "//div[@class='fact' and contains(span, '')][1]",
        "//div[@class='fact' and contains(span, '')][2]",
    ]
    for xpath in facts:
        entries.append(get_elements_by_xpath(driver, xpath))
    print(entries)  # check what you are writing into the csv file before writing
    writer.writerow(entries)  # one row per page; each cell holds the texts for one XPath

File.close()
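One caveat if you are on a recent Selenium release: the find_element_by_* / find_elements_by_* helpers used above were deprecated and later removed in Selenium 4, so the same lookups need the By locator style. A rough equivalent of the setup, assuming Selenium 4 and webdriver-manager:

import csv

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.example.com/list/')

# Same lookups as above, written with the Selenium 4 locator API
elements = driver.find_elements(By.CSS_SELECTOR, "a[href*='/stockholm/']")

def get_elements_by_xpath(driver, xpath):
    return [entry.text for entry in driver.find_elements(By.XPATH, xpath)]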
