Extracting information from a <script> tag with BeautifulSoup (Python)

Posted 2024-05-13 17:43:26


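I'm new here and hope you can help. I have been learning Python for a while and am currently practising web scraping.

I was experimenting with BeautifulSoup until I got stuck on this script. I ran it through a beautifier site to make it readable.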

<div class="uq-hide js_variationsJSON">
            </div>  
    <div class="pdpStage__column pdp__details js_pdp-details">
         <script>
          var pdpVariationsJSON = 
        
        {
            "color-COL05|size-SMA002": {
                "id": "419994COL05SMA002000",
                "attributes": { "color": "Gray", "size": "XS" },
                "availability": { "status": "IN_STOCK", "statusQuantity": "0", "inStock": true, "ats": "19", "inStockDate": "", "availableForSale": true, "purchaseLevel": "", "levels": { "IN_STOCK": 1, "PREORDER": 0, "BACKORDER": 0, "NOT_AVAILABLE": 0 }, "isAvailable": true, "inStockMsg": "1 Item(s) In Stock", "preOrderMsg": "0 item(s) are available for pre-order.", "backOrderMsg": "Back Order 0 item(s)" },
                "pricing": {
                    "showStandardPrice": false,
                    "isPromoPrice": false,
                    "standard": 59.9,
                    "formattedStandard": "£59.90",
                    "sale": 59.9,
                    "formattedSale": "£59.90",
                    "salePriceMoney": {},
                    "standardPriceMoney": {},
                    "pricePercentage": "",
                    "quantities": [
                        { "unit": "", "value": 0 }
                        ]
                },
                "applicablebadges": [
                    { "id": "extendedSize", "value": "global.badge.extrasize", "coValue": "XXS-3XL", "class": "grey" }
                    ]
            },
            "color-COL05|size-SMA003": {
                "id": "419994COL05SMA003000",
                "attributes": { "color": "Gray", "size": "S" },
                "availability": { "status": "IN_STOCK", "statusQuantity": "0", "inStock": true, "ats": "24", "inStockDate": "", "availableForSale": true, "purchaseLevel": "", "levels": { "IN_STOCK": 1, "PREORDER": 0, "BACKORDER": 0, "NOT_AVAILABLE": 0 }, "isAvailable": true, "inStockMsg": "1 Item(s) In Stock", "preOrderMsg": "0 item(s) are available for pre-order.", "backOrderMsg": "Back Order 0 item(s)" },
                "pricing": {
                    "showStandardPrice": false,
                    "isPromoPrice": false,
                    "standard": 59.9,
                    "formattedStandard": "£59.90",
                    "sale": 59.9,
                    "formattedSale": "£59.90",
                    "salePriceMoney": {},
                    "standardPriceMoney": {},
                    "pricePercentage": "",
                    "quantities": [
                        { "unit": "", "value": 0 }
                        ]
                },
                "applicablebadges": [
                    { "id": "extendedSize", "value": "global.badge.extrasize", "coValue": "XXS-3XL", "class": "grey" }
                    ]
            },

Every time, I get either None or the complete page source, but never this part. I don't know why.

This is how I call it:

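from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

link = mylink  # URL of the product page; must be set before building the request
req = Request(link, headers={'User-Agent': 'Mozilla/5.0'})
w = urlopen(req).read()
soup = BeautifulSoup(w, "html.parser")
# A multi-class string only matches if the class attribute is exactly this value
new = soup.find('div', {'class': 'pdpGrid pdp__module pdpStage js_pdpGrid js_pdpGrid-no-quickview'})
print(new)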

I would appreciate any suggestions.

Here is an example link to the site: https://www.uniqlo.com/uk/en/product/men-ultra-light-down-jacket-419994COL09SMA005000.html


2 Answers

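It is probably loaded afterwards by JavaScript. As Aero Blue mentioned, you can look for where the data originally comes from, or, if that is an option for you, you can try Selenium. Selenium helps with web pages that run JavaScript.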

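Without the website it is hard to know exactly what is going on, but I would imagine the script tag you are seeing is generated by a request made while the page loads.

Since JavaScript is not executed when you simply request the page, that content does not show up.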
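One way to test this is to search the downloaded HTML for the variable name. If `var pdpVariationsJSON = {...}` is present in the raw response, it can be parsed without running any JavaScript; if it is not, the data really is loaded later. The following is a minimal sketch using only the standard library. The helper name `extract_js_object` is made up for illustration, and the sample HTML is a shortened copy of the snippet from the question, not a real response from the site.

```python
import json
import re


def extract_js_object(html, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in html, or None."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None  # the variable is not in the HTML that was downloaded
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a trailing semicolon, more script, etc.).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Shortened sample shaped like the snippet in the question.
sample_html = """
<div class="pdpStage__column pdp__details js_pdp-details">
  <script>
    var pdpVariationsJSON =

    {
      "color-COL05|size-SMA002": {
        "id": "419994COL05SMA002000",
        "attributes": {"color": "Gray", "size": "XS"},
        "pricing": {"standard": 59.9, "formattedStandard": "£59.90"}
      }
    };
  </script>
</div>
"""

variations = extract_js_object(sample_html, "pdpVariationsJSON")
for key, item in variations.items():
    print(key, item["attributes"]["size"], item["pricing"]["standard"])
```

This only works when the assigned value is strict JSON, as it appears to be in the question. A JavaScript object literal with unquoted keys or trailing commas would need a different parser. With the question's code, the same function could be called on `w.decode("utf-8")`, assuming the page is UTF-8 encoded.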

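I suggest looking at the "Network" tab in your browser's developer tools to try to trace where the data comes from.

You would then need to make a request to that URL and probably parse the JSON response.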
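As a rough sketch of that last step, assuming the Network tab reveals a URL that returns JSON directly: the function name `fetch_json` and the example URL below are hypothetical, and a real endpoint may require extra headers or cookies that this does not send.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, user_agent="Mozilla/5.0"):
    """Request `url` and decode the response body as JSON."""
    req = Request(url, headers={"User-Agent": user_agent, "Accept": "application/json"})
    with urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage: replace the placeholder with the URL found in the Network tab.
# variations = fetch_json("https://example.com/path/to/variations.json")
# for key, item in variations.items():
#     print(key, item["availability"]["status"])
```

The result is an ordinary Python dict, so the fields shown in the question (`attributes`, `availability`, `pricing`) could be read with normal key lookups if the endpoint returns the same structure.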

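Edit: I'm going to go out on a limb here and say: good luck, but this is close to impossible. First, everything is generated dynamically via JavaScript. Second, they use Akamai (anti-scraping technology) and browser fingerprinting. Trying to scrape anything from this site is like threading a needle through an invisible maze with dozens of keys and parameters.

The most I can give you, if you manage to get access, is this information.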
