(Python) Trying to parse a web page with BeautifulSoup when the page updates after the initial load

content_wrapper = soup.find('div', class_='col2 gridCell StoreAvail editable anchored', id='StoreAvail_7')
cheese = content_wrapper.find('div', class_='sublist instore_inventory_section nodisplay', id='WC_InStore_Inventory_Section_3074457345618960372')
print(cheese)

2 Answers

User

#1 · edited 2024-05-15 21:53:37

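The site you are scraping is not rendered on the server. It is rendered on the client, probably with a JavaScript library or framework such as React.js or Angular, so the HTML that requests downloads does not yet contain the data you see in the browser.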
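One way to check this is to look for the element ids in the raw HTML before any JavaScript runs. The sketch below uses only the standard library. `IdCollector`, `has_element_id`, and the `static_html` string are hypothetical names and made-up data for illustration; the two ids come from the question, and the assumption is that the inner inventory element is missing from the initial response.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html_text, element_id):
    """Return True if the static HTML contains an element with this id."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return element_id in collector.ids


# Made-up stand-in for a server response whose inner content is filled in
# later by JavaScript (an assumption, not the real page source).
static_html = '<div id="StoreAvail_7"></div><script>/* loads inventory */</script>'

print(has_element_id(static_html, "StoreAvail_7"))
print(has_element_id(static_html, "WC_InStore_Inventory_Section_3074457345618960372"))
```

If the second check comes back False for the real page source (for example, the text returned by requests.get), the element is being created in the browser and BeautifulSoup alone cannot see it.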

If you want to scrape a site like this, you need a headless browser. One of the most popular tools for driving one is Puppeteer, and there is a port of it for Python.

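Puppeteer launches a real Chromium instance, so it parses and renders all of the JavaScript-driven content on the site. Naturally, this takes longer than a plain HTTP request.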
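A minimal sketch of that approach, assuming the Python port meant here is pyppeteer (an unofficial port of Puppeteer). `fetch_rendered_html` is a hypothetical helper name, and the `#StoreAvail_7` selector is taken from the question; whether waiting for that selector is sufficient on the real page is an assumption.

```python
import asyncio


async def fetch_rendered_html(url, wait_selector=None, timeout_ms=30000):
    """Load url in headless Chromium and return the HTML after scripts have run."""
    # Imported lazily so this module can be imported without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # networkidle2: wait until at most two network connections remain open
        await page.goto(url, {"waitUntil": "networkidle2", "timeout": timeout_ms})
        if wait_selector:
            # Block until the element exists in the rendered DOM
            await page.waitForSelector(wait_selector, {"timeout": timeout_ms})
        return await page.content()
    finally:
        await browser.close()


# Usage (needs pyppeteer and its Chromium download):
# html = asyncio.run(fetch_rendered_html(
#     "https://www.basspro.com/shop/en/herters-hunting-rifle-ammo/",
#     wait_selector="#StoreAvail_7",
# ))
# soup = BeautifulSoup(html, "html.parser")
```

The returned string can be passed to BeautifulSoup exactly like the response text from requests, so the parsing code from the question does not need to change.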

User

#2 · edited 2024-05-15 21:53:37

The stock (inventory) data is loaded from a different URL. You can use this example to print the stock status, quantity, and so on:

import re
import json
import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0', 'X-Requested-With':'XMLHttpRequest',
'Accept-Language': 'en-US,en;q=0.5', 'Referer': 'https://www.basspro.com/shop/en/herters-hunting-rifle-ammo/'}

url = 'https://www.basspro.com/shop/en/herters-hunting-rifle-ammo/'
in_stock_url = 'https://www.basspro.com/shop/BPSGetOnlineInventoryStatusByIDView'

html_text = requests.get(url, headers=headers).text
soup = BeautifulSoup(html_text, 'html.parser')

productId = soup.select_one('meta[name="pageId"]')['content']
storeId = re.search(r'"storeId"\s*:\s*\'([\d]+)\'', html_text).group(1)
catalogId = re.search(r'"catalogId"\s*:\s*\'([\d]+)\'', html_text).group(1)

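# sometimes the server returns an error page, so repeat the request until it succeeds:
while True:
    try:
        json_txt = requests.post(in_stock_url, headers=headers, data={'productId': productId, 'storeId': storeId, 'catalogId': catalogId}).text
        # the JSON payload is wrapped in /* ... */, so strip the wrapper before parsing
        data = json.loads( re.search(r'/\*(.*)\*/', json_txt, flags=re.S).group(1) )
        break
    except (AttributeError, ValueError, requests.RequestException):
        # no /* ... */ payload, invalid JSON, or a network error: try again
        # (a bare "except:" would also swallow Ctrl+C and make the loop unstoppable)
        pass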

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for k in data['onlineInventory']:
    d = soup.select_one('#WC_Sku_List_Row_Content_' + k)
    if d:
        print(d.select_one('.CartridgeorGauge').get_text(strip=True))
        print(d.select_one('.ModelNumber').get_text(strip=True))
        print(data['onlineInventory'][k]['altText'])
        print(data['onlineInventory'][k]['quantity'])
        print('-' * 80)

Prints:

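.30-30 Winchester
HRT3030A
In-Stock
158
--------------------------------------------------------------------------------
.30-06 Springfield
HRT3006C
In-Stock
16
--------------------------------------------------------------------------------
.308 Winchester
HRT308D
Out of Stock
0
--------------------------------------------------------------------------------
.300 AAC Blackout
HRT300BLK
Out of Stock
0
--------------------------------------------------------------------------------
.22-250 Remington
HRT22250A
In-Stock
192
--------------------------------------------------------------------------------
.223 Remington
HRT223B
Out of Stock
0
--------------------------------------------------------------------------------
.223 Remington
HRT223150
Out of Stock
32
--------------------------------------------------------------------------------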

...and so on.
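The two less obvious steps in the script above are stripping the /* ... */ wrapper around the JSON and retrying when the server sends an error page. Here is a rough standard-library sketch of both, with a cap on the number of attempts so the loop cannot run forever. `extract_wrapped_json` and `fetch_with_retries` are hypothetical helper names, and the response strings and the "123" key are made-up test data, not real server output.

```python
import json
import re
import time


def extract_wrapped_json(body):
    """Parse a JSON payload that the server wraps in /* ... */ comment markers."""
    match = re.search(r"/\*(.*)\*/", body, flags=re.S)
    if match is None:
        raise ValueError("no /* ... */ wrapped payload found")
    return json.loads(match.group(1))


def fetch_with_retries(fetch, max_attempts=5, delay=1.0):
    """Call fetch() until its body parses, giving up after max_attempts tries."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return extract_wrapped_json(fetch())
        except ValueError as error:  # also covers json.JSONDecodeError
            last_error = error
            if attempt < max_attempts - 1:
                time.sleep(delay)  # short pause before the next attempt
    raise RuntimeError(f"no valid response after {max_attempts} attempts") from last_error


# Made-up responses: an error page first, then a wrapped JSON payload.
responses = iter([
    "<html>error page</html>",
    '/*\n{"onlineInventory": {"123": {"altText": "In-Stock", "quantity": 158}}}\n*/',
])
data = fetch_with_retries(lambda: next(responses), delay=0)
print(data["onlineInventory"]["123"]["quantity"])
```

In the real script, `fetch` could be a small function that performs the requests.post call and returns its .text; keeping the network call separate from the parsing makes the retry logic easy to test offline.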
