Scraping data from a dynamically refreshing JavaScript page with Python
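I want to scrape some data from https://bloks.io/live. My first problem is that I can't get at the continuously updating data in that table. My idea is to check whether the first column has changed. If it has, I need to look at which account is involved, and so on. So I need to read the first column continuously, but that hasn't worked.
I tried using a CSS selector, without success.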
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
link = 'https://bloks.io/live'
driver = webdriver.Chrome(service=Service((ChromeDriverManager().install())))
driver.get(link)
page = driver.page_source
tSoup = BeautifulSoup(driver.find_element(By.CSS_SELECTOR, '#info>tbody>tr:nth-child(2)').get_attribute('outerHTML'), 'html.parser')
This gives me a "no such element" error. Can anyone help?
Update: my code now finds the element I want to keep checking, but the element never updates. My code so far:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

def connection():
    driver = webdriver.Chrome(service=Service((ChromeDriverManager().install())))
    return driver

def search_first_entry(driver, link):
    driver.get(link)
    # wait until page has buffered
    wait = WebDriverWait(driver, 5)
    element_prev = 0
    element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#info>tbody>tr:nth-child(2)')))
    while True:
        element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#info>tbody>tr:nth-child(2)')))
        print(element)
        time.sleep(1)
        if element != element_prev:
            element_prev = element

def main():
    link = 'https://bloks.io/live'
    driver = connection()
    search_first_entry(driver, link)

main()
1 Answer
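To get a better picture of what is happening, I suggest taking a screenshot and saving the page's HTML. If you do, you will see that you need to wait for a while before the content is there (you could also call sleep, but that is not the best solution).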
# save screen
driver.save_screenshot("webpage.png")
# save html
html_code = driver.page_source
with open("webpage.html", "w", encoding="utf-8") as file:
    file.write(html_code)
For more on waits and sleeping, see https://selenium-python.readthedocs.io/waits.html
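
As a rough illustration of the explicit-wait approach applied to the updated code in the question, here is a minimal sketch. One likely reason the loop there never reports a change is that it compares WebElement objects instead of the text they contain; as far as I know, two WebElement objects compare equal when they refer to the same DOM node, even if the node's text has changed in the meantime. The sketch re-reads the row's text on every poll and compares that instead. The names iter_changes and watch_first_row are made up for this example, the selector is taken from the question and has not been checked against the live page, and the 20-second timeout is an arbitrary choice.

```python
import time
from typing import Callable, Iterator, Optional


def iter_changes(read_value: Callable[[], str],
                 interval: float = 1.0,
                 max_polls: Optional[int] = None) -> Iterator[str]:
    """Poll read_value() and yield its result whenever it differs from the last poll.

    Hypothetical helper for this example; it does not depend on Selenium.
    """
    previous = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_value()
        if current != previous:
            previous = current
            yield current
        polls += 1
        time.sleep(interval)


def watch_first_row(url: str, selector: str) -> None:
    """Print the text of the row matched by selector every time it changes."""
    # Selenium imports are local so iter_changes can be used without a browser.
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        # Assumption: 20 seconds is enough for the table to render.
        # The row may be replaced while we read it, so stale references are retried.
        wait = WebDriverWait(driver, 20,
                             ignored_exceptions=(StaleElementReferenceException,))

        def read_row_text() -> str:
            # Re-locate the row on every poll and return its text, not the element.
            # until() keeps retrying while the row is missing or its text is empty,
            # and raises TimeoutException if that lasts longer than the timeout.
            return wait.until(
                lambda d: d.find_element(By.CSS_SELECTOR, selector).text
            )

        for text in iter_changes(read_row_text):
            print(text)
    finally:
        driver.quit()
```

Under these assumptions it could be started with watch_first_row('https://bloks.io/live', '#info>tbody>tr:nth-child(2)'). Keeping the comparison logic in iter_changes, separate from the browser code, makes it possible to try it out with any function that returns a string.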