raise JSONDecodeError("Expecting value", s, err.value)

import json
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime

start_time = datetime.now()
data = []

op = webdriver.ChromeOptions()
op.add_argument('--ignore-certificate-errors')
op.add_argument('--incognito')
op.add_argument('--headless')
driver = webdriver.Chrome(executable_path='D:/Desktop/Query/chromedriver.exe', options=op)
driver.get('https://www.cdiscount.com/f-1175520-MIS2008813786478.html')
link = 'https://www.cdiscount.com/f-1175520-MIS2008813786478.html'

soup = BeautifulSoup(driver.page_source, 'html.parser')
b = soup.prettify()
product_title = soup.find('title').getText()
reviews = soup.find_all("script", type="application/ld+json")

for element in reviews:
    json_string = element.getText()
    json_dict = json.loads(json_string)
    data.append(json_dict)

1 answer

User

#1 · Posted 2024-04-20 01:03:27

You can try reading the JSON by accessing the element's contents instead:

for element in reviews:
    # Join the tag's raw child strings; getText() returns '' for <script> here
    json_string = ' '.join(element.contents)
    json_dict = json.loads(json_string)
    data.append(json_dict)
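
The same idea can be tried without a browser. The following is only a minimal sketch: SAMPLE_HTML and the helper name extract_ld_json are made up for illustration, and the real markup on the target page may differ. It reads each script tag through contents and skips blocks that are empty or not valid JSON, so one bad block does not abort the loop.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical sample page; the real markup on the target site may differ.
SAMPLE_HTML = """
<html><head><title>Sample product</title>
<script type="application/ld+json">
{"@type": "Product", "name": "Sample product"}
</script>
</head><body></body></html>
"""


def extract_ld_json(html):
    """Return the parsed JSON-LD blocks found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    parsed = []
    for element in soup.find_all("script", type="application/ld+json"):
        # Read the raw child strings instead of relying on get_text().
        raw = "".join(str(child) for child in element.contents).strip()
        if not raw:
            continue  # nothing to parse; json.loads("") would raise
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
    return parsed


blocks = extract_ld_json(SAMPLE_HTML)
print(blocks)
```

In the original script, driver.page_source would be passed in place of SAMPLE_HTML.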

From the Beautiful Soup documentation on get_text():

If you only want the human-readable text inside a document or tag, you can use the get_text() method.
...
As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> tags are not considered to be 'text', since those tags are not part of the human-visible content of the page.

That is why getText returns an empty string in your case, and why you need to use contents instead.
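
To see how an empty string leads to the error in the title, here is a small sketch that uses only the standard json module. The helper name try_parse is made up for illustration; it just wraps json.loads and reports the decoder's message instead of raising.

```python
import json


def try_parse(raw):
    """Parse raw as JSON and return (value, error_message)."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # exc.msg holds the short message, without line and column details
        return None, exc.msg


value, error = try_parse("")
print(error)

value, error = try_parse('{"name": "Sample product"}')
print(value)
```

The first call reports "Expecting value", the message an empty getText() result produces, while the second call parses normally.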
