Web scraper on a Raspberry Pi Zero only picks up new data once every 34 minutes?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Set the URL you want to webscrape from
url = 'https://www.mlb.com/astros/scores'

while again:
    # Connect to the URL
    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()

    # Set HTML parsing
    page_soup = soup(page_html, "html.parser")
    data = page_soup.find('div', {'data-test-mlb': 'singleGameContainer'})

1 Answer

User

#1 · Posted 2024-05-14 12:36:13

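BeautifulSoup extracts information from HTML, but it does not perform the request itself. If you have saved the page to disk as an HTML file and keep parsing that file with BeautifulSoup, the page will never update. You have to fetch the page again with requests.get or an equivalent.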

For example:

import requests 
from bs4 import BeautifulSoup 
from time import sleep, time 

prev = "" 
# Set the URL you want to webscrape from 
url = 'https://www.mlb.com/astros/scores' 
start = time() 
while True: 
    t0 = time() 
    # Connect to the URL 
    r = requests.get(url) 
    page_html = r.text 

    t1 = time() 
    print(f"{t1 - start:.2f}s {t1-t0:.2f}s", page_html == prev) 
    prev = page_html 
    sleep(10)

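The code above produces the following output (elapsed time since start, time taken by the request, and whether the page is identical to the previous fetch):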

0.15s 0.15s False
10.38s 0.22s True
20.56s 0.17s True
32.41s 1.83s True
42.57s 0.16s True
52.74s 0.16s True
62.90s 0.15s True
73.08s 0.17s True
83.25s 0.16s True
93.41s 0.15s True
103.57s 0.15s True
115.13s 1.55s False
125.29s 0.16s True
135.46s 0.16s True
145.63s 0.16s True
155.81s 0.17s True
166.07s 0.26s True

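The first False is only there because prev starts out empty. The False at 115.13s shows that the fetched HTML differed from the previous fetch, so the page is being updated correctly.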
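The same comparison can be done without keeping the whole previous page in memory, which may matter on a small board. The following is only a sketch under that assumption: ChangeDetector and digest are hypothetical names, not part of requests or BeautifulSoup, and the helper stores a SHA-256 digest of the last page instead of the page itself.

```python
import hashlib


def digest(page_html: str) -> str:
    """Return the SHA-256 hex digest of the page text."""
    return hashlib.sha256(page_html.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Hypothetical helper: remembers the digest of the last page seen."""

    def __init__(self) -> None:
        self._last = None  # no page seen yet

    def changed(self, page_html: str) -> bool:
        """Return True if page_html differs from the previous call's input."""
        current = digest(page_html)
        is_new = current != self._last
        self._last = current
        return is_new
```

Inside the loop you would call detector.changed(r.text) after each request. Note that it returns True when the page changed, which is the opposite of the page_html == prev flag printed above.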

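However, one thing that could be the source of the problem is the use of BeautifulSoup.find here, which limits the output to a single result. I assume that is intentional, but if it is not, that may well be where your problem lies.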
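To illustrate the difference, here is a minimal sketch run against a made-up HTML snippet. The markup is an assumption for illustration only; the real MLB page may be structured differently. find returns the first matching element (or None), while find_all returns every match.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the real MLB page markup may differ.
SAMPLE_HTML = """
<div data-test-mlb="singleGameContainer">Game 1</div>
<div data-test-mlb="singleGameContainer">Game 2</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
attrs = {"data-test-mlb": "singleGameContainer"}

first = soup.find("div", attrs)          # first match only, or None
all_games = soup.find_all("div", attrs)  # list of every match

first_text = first.get_text()
all_texts = [g.get_text() for g in all_games]
print(first_text)
print(all_texts)
```

If the page lists several games and you only ever look at the first container, the data you extract can stay the same even though other parts of the page have changed.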
