Scraping Yahoo stock news

from bs4 import BeautifulSoup
import re
from selenium import webdriver
import chromedriver_binary
import string
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("/Users/abhishekgupta/Downloads/chromedriver")
driver.get("https://finance.yahoo.com/quote/INFY/news?p=INFY")

for i in range(20):  # adjust integer value for need
    # you can change right side number for scroll convenience or destination
    driver.execute_script("window.scrollBy(0, 250)")
    # you can change time integer to float or remove
    time.sleep(1)

print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[9]/div/div/div[2]/h3/a/text()').text())

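3 Answers

User

#1 · Edited 2024-06-12 10:58:39

I think your code is mostly fine; there is just one issue. Unlike scrapy or lxml's fromstring, Selenium cannot return a text node from an XPath: the expression has to select an element, so a trailing /text() fails. Select the element itself and read its .text attribute (a property, not a method call). Something like this should work for you: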

# Use this for printing instead
print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[9]/div/div/div[2]/h3/a').text)

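Since only one element has this id, you can also select the container directly. Note that this prints the text of the entire news stream rather than a single headline: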

# This should also work fine
print(driver.find_element_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]').text)
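
The difference between scrapy-style and Selenium-style XPaths can be sketched with a small helper. This is only an illustration, not part of the original answer: split_xpath and read_value are hypothetical names, and the locator is passed as the plain string "xpath" (the value of Selenium 4's By.XPATH constant) because, as far as I know, newer Selenium 4 releases no longer provide find_element_by_xpath.

```python
import re

# Trailing "/text()" or "/@attribute" step, as written for scrapy or lxml.
_TRAILING_STEP = re.compile(r"/(?:text\(\)|@([\w:-]+))$")


def split_xpath(xpath):
    """Split a scrapy-style XPath into (element_xpath, attribute_or_None)."""
    match = _TRAILING_STEP.search(xpath)
    if match is None:
        return xpath, None
    return xpath[: match.start()], match.group(1)


def read_value(driver, xpath):
    """Read text or an attribute via Selenium from a scrapy-style XPath."""
    element_xpath, attribute = split_xpath(xpath)
    # "xpath" is the string value of Selenium 4's By.XPATH constant.
    element = driver.find_element("xpath", element_xpath)
    if attribute is None:
        return element.text  # .text is a property, not a method
    return element.get_attribute(attribute)
```

With a live driver, read_value(driver, '//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li[9]//h3/a/text()') should return the headline text, assuming that element still exists on the page.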

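User

#2 · Edited 2024-06-12 10:58:39

You can shorten the XPath by using // instead of spelling out /div/div/div[2].

If you want the last item, get all the li elements as a list and use [-1] to take the last element of that list.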

from selenium import webdriver
import time

driver = webdriver.Chrome("/Users/abhishekgupta/Downloads/chromedriver")
#driver = webdriver.Firefox()

driver.get("https://finance.yahoo.com/quote/INFY/news?p=INFY")

for i in range(20):
    driver.execute_script("window.scrollBy(0, 250)")
    time.sleep(1)

all_items = driver.find_elements_by_xpath('//*[@id="latestQuoteNewsStream-0-Stream"]/ul/li')

#for item in all_items:
#    print(item.find_element_by_xpath('.//h3/a').text)
#    print(item.find_element_by_xpath('.//p').text)
#    print(' -')
    
print(all_items[-1].find_element_by_xpath('.//h3/a').text)
print(all_items[-1].find_element_by_xpath('.//p').text)
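
The relative-XPath and [-1] idea can also be tried offline, without a browser. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and a made-up, heavily simplified stand-in for the news stream markup; SAMPLE and last_item are hypothetical names, and the real Yahoo page is structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the two <li> items deliberately nest differently.
SAMPLE = """
<html><body>
  <div id="latestQuoteNewsStream-0-Stream">
    <ul>
      <li><div><div><div>image</div><div>
        <h3><a>First headline</a></h3><p>First summary</p>
      </div></div></div></li>
      <li><div>
        <h3><a>Second headline</a></h3><p>Second summary</p>
      </div></li>
    </ul>
  </div>
</body></html>
""".strip()


def last_item(markup):
    """Return (headline, summary) of the last <li> in the news stream."""
    root = ET.fromstring(markup)
    # Collect every list item under the element with the stream id.
    items = root.findall(".//*[@id='latestQuoteNewsStream-0-Stream']/ul/li")
    last = items[-1]  # negative index picks the final list item
    # ".//" searches all descendants, so the div nesting depth does not matter.
    return last.find(".//h3/a").text, last.find(".//p").text


if __name__ == "__main__":
    headline, summary = last_item(SAMPLE)
    print(headline)
    print(summary)
```

Because .//h3/a matches at any depth below each li, the same expression works for both items even though their div nesting differs.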

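User

#3 · Edited 2024-06-12 10:58:39

The XPath you provided does not exist on the page.

Install the xPath Finder Chrome extension to find the correct XPath for the articles.

Here is an example XPath for the article list; you need to loop over the ID: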

/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[5]/div/div/div/ul/li[ID]/div/div/div[2]/h3/a/u
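
Looping over the ID could look roughly like the sketch below. It assumes the XPath above is still valid for the page, which may not hold since absolute paths break easily when the layout changes. ARTICLE_XPATH, article_xpaths, and collect_titles are hypothetical names, and the locator is passed as the string "xpath", the value of Selenium 4's By.XPATH constant.

```python
# Template taken from the XPath above, with the li position as a placeholder.
ARTICLE_XPATH = (
    "/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[5]"
    "/div/div/div/ul/li[{index}]/div/div/div[2]/h3/a/u"
)


def article_xpaths(count, template=ARTICLE_XPATH):
    """Build one XPath per article; XPath positions start at 1, not 0."""
    return [template.format(index=i) for i in range(1, count + 1)]


def collect_titles(driver, count):
    """Return the text of every article title that is present on the page."""
    titles = []
    for xpath in article_xpaths(count):
        # find_elements returns an empty list instead of raising when
        # nothing matches, so missing positions are simply skipped.
        matches = driver.find_elements("xpath", xpath)
        if matches:
            titles.append(matches[0].text)
    return titles
```

With a live driver, collect_titles(driver, 20) would try list positions 1 through 20 and return the titles it finds.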
