Web scraping with bs4 does not return the value

Published 2024-04-29 13:10:49


The page I am trying to scrape is https://www.investagrams.com/Stock/ac, and I want the price value (779.00), but my code only returns: {{ViewStockPage.Data.Stock.LatestStockHistory.Last | numberPriceFormat}}

My code:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.investagrams.com/Stock/ac')
soup = BeautifulSoup(r.text, "lxml")

main = soup.find('div', class_= 'd-flex flex-row justify-content-between')

header = main.find('h4', class_= 'mb-0')

price = header.find('span', class_= 'mr-2').string

print(price)

The site's HTML:

<h4 class="mb-0"> 
<small class="ng-binding">Ayala Corporation (PSE:AC) </small> <br>
<strong> 
 <span class="mr-2 ng-binding" data-ng-class="ViewStockPage.Data.Stock.LatestStockHistory.LastClass">779.00 </span> 
 <span data-ng-class="{'stock-up-caret stockprice-up' : ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage > 0, 'stock-down-caret stockprice-down' : ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage < 0, 'glyphicon glyphicon-minus stockprice-flat': ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage == 0}" class="stock-down-caret stockprice-down" style=""> </span> 
 <span style="font-size: 13px; vertical-align: middle;" data-ng-class="{'stockprice-up' : ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage > 0, 'stockprice-down' : ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage < 0, 'stockprice-flat': ViewStockPage.Data.Stock.LatestStockHistory.ChangePercentage == 0}" class="stockprice-down"> 
   <span class="ng-binding">-21.00 </span> 
   <span class="ml-1 ng-binding">-2.62% </span> 
 </span> 
</strong> 
</h4>

2 answers

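The page you are trying to scrape uses JavaScript to populate the DOM asynchronously. BeautifulSoup will not work on a page like this, because it only sees what is baked directly into the HTML at the moment the server hands you the document.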
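To illustrate the point, here is a minimal offline sketch of what a static parser sees. The `RAW_HTML` string is an assumed approximation of the markup the server sends before the page's JavaScript runs (the `ng-` attributes in the question suggest AngularJS), and the standard library's `html.parser` stands in for BeautifulSoup so the sketch runs without extra packages. `SpanTextCollector` and `looks_like_template` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumed approximation of the server's response before any JavaScript runs;
# the real markup may differ in detail.
RAW_HTML = """
<h4 class="mb-0">
  <strong>
    <span class="mr-2">{{ViewStockPage.Data.Stock.LatestStockHistory.Last | numberPriceFormat}}</span>
  </strong>
</h4>
"""


class SpanTextCollector(HTMLParser):
    """Collect the text inside every <span> whose class list contains mr-2."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "mr-2" in classes:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.texts.append(data.strip())


def looks_like_template(text):
    """Return True if the text is an unrendered {{ ... }} placeholder."""
    return text.startswith("{{") and text.endswith("}}")


collector = SpanTextCollector()
collector.feed(RAW_HTML)
price_text = collector.texts[0]
print(price_text)                       # the placeholder, not a number
print(looks_like_template(price_text))  # True
```

A check like `looks_like_template` can be a cheap way to detect that a scraper has hit a client-side rendered page rather than a real value.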

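If you view the page in a browser and log the network traffic, you will see several requests to various REST API endpoints. One of them is /InvestaApi/Stock/ViewStock, which takes the stock code as a query string parameter. The response from that endpoint is JSON and contains the information you are after. All you need to do is imitate that HTTP GET request: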

def get_price(stock_code):
    import requests

    url = "https://webapi.investagrams.com/InvestaApi/Stock/ViewStock"

    params = {
        "stockCode": stock_code,
        "defaultExchangeType": "1",
        "cv": "1622292000-0-"
    }

    headers = {
        "accept": "application/json",
        "accept-encoding": "gzip, deflate",
        "referer": "https://www.investagrams.com/",
        "user-agent": "Mozilla/5.0"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    return response.json()["LatestStockHistory"]["Last"]

def main():

    print(get_price("ac"))
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

779
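As a side note, the `params` dictionary is just a convenient way to build the query string. The sketch below shows, without making any network request, the URL that those parameters produce. `build_view_stock_url` is a hypothetical helper name, and the `cv` value is copied as-is from the captured request; the answer does not say what it means or whether it stays valid.

```python
from urllib.parse import urlencode

BASE_URL = "https://webapi.investagrams.com/InvestaApi/Stock/ViewStock"


def build_view_stock_url(stock_code):
    """Return the full request URL for a stock code (no request is sent)."""
    params = {
        "stockCode": stock_code,
        "defaultExchangeType": "1",
        # Copied verbatim from the captured request; its meaning is unknown.
        "cv": "1622292000-0-",
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_view_stock_url("ac"))
# https://webapi.investagrams.com/InvestaApi/Stock/ViewStock?stockCode=ac&defaultExchangeType=1&cv=1622292000-0-
```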

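This page uses JavaScript to fill in the values at the {{...}} placeholders, but requests and BeautifulSoup cannot run JavaScript. You may need Selenium to control a real web browser that can run JavaScript.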
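For reference, a rough sketch of the Selenium route could look like the following. It assumes Selenium 4 and a local Chrome installation, and the CSS selector is derived from the HTML in the question, so it may need adjusting if the page has changed. `fetch_price_with_selenium` and `is_unrendered` are hypothetical names.

```python
def is_unrendered(text):
    """Return True while the text still contains a {{ ... }} placeholder."""
    return "{{" in text


def fetch_price_with_selenium(url, timeout=15):
    """Load the page in headless Chrome and return the rendered price text."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def rendered_price(drv):
            # Selector based on the question's HTML; an assumption, not verified.
            element = drv.find_element(By.CSS_SELECTOR, "h4.mb-0 span.mr-2")
            text = element.text.strip()
            # Keep waiting until JavaScript has replaced the placeholder.
            return text if text and not is_unrendered(text) else False

        return WebDriverWait(driver, timeout).until(rendered_price)
    finally:
        driver.quit()
```

Calling `fetch_price_with_selenium("https://www.investagrams.com/Stock/ac")` would start a browser, so it is slower and heavier than a plain HTTP request.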


Using DevTools in Firefox/Chrome (tab: Network, filter: XHR), I found that JavaScript loads the data from:

https://webapi.investagrams.com/InvestaApi/Stock/ViewStock?stockCode=ac&defaultExchangeType=1&cv=1622292000-0-

With requests and a few headers I can fetch it too.
Since the data comes back as JSON, I don't need BeautifulSoup for this.

import requests

headers = { 
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.investagrams.com/'
}

url = 'https://webapi.investagrams.com/InvestaApi/Stock/ViewStock?stockCode=ac&defaultExchangeType=1&cv=1622292000-0-'
r = requests.get(url, headers=headers)

#print(r.status_code)
#print(r.json())

data = r.json()
print('Last:', data['LatestStockHistory']['Last'])

for key, value in data['LatestStockHistory'].items():
    print(key, '=', value)

Result:

Last: 779

StockId = 79
Date = 2021-05-28T00:00:00+08:00
DateShortString = 05/28/2021
DateTimeString = May 28, 2021 12:00:00 AM
Last = 779
LastString = 779.00
Open = 780
Close = 800
Change = -21
ChangeString = -21.00
ChangePercentage = -2.62
ChangePercentageString = -2.62%
Low = 768
High = 789.5
Average = 778.14
Volume = 457590
Value = 356067865
Trades = 4227
MarketCap = 482.74B
NetForeign = -3091405
LastUpdateTime = 2021-05-28T15:30:00
LastUpdateTimeString = May 28, 2021 03:30:00 PM
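If only a few of these fields are needed, a small helper can pull them out and fail clearly when the response shape is not what is expected. This is a sketch under assumptions: `SAMPLE` is a hand-written subset of the result shown above rather than a live response, and `extract_quote` is a hypothetical name.

```python
# Hand-copied subset of the result above; not fetched from the API.
SAMPLE = {
    "LatestStockHistory": {
        "Last": 779,
        "LastString": "779.00",
        "Change": -21,
        "ChangePercentage": -2.62,
        "Volume": 457590,
    }
}


def extract_quote(payload):
    """Return the last price, change, and percent change as floats."""
    history = payload.get("LatestStockHistory")
    if not isinstance(history, dict):
        raise ValueError("response has no LatestStockHistory object")
    return {
        "last": float(history["Last"]),
        "change": float(history["Change"]),
        "change_pct": float(history["ChangePercentage"]),
    }


quote = extract_quote(SAMPLE)
print(quote)
# {'last': 779.0, 'change': -21.0, 'change_pct': -2.62}
```

With a live response, the same helper could be called as `extract_quote(r.json())`.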
