How to extract or scrape data from an HTML page when it is stored in the element itself

2 answers

User

Answer 1 · edited 2024-05-13 21:00:35

Try the following script:

import re

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"

html = requests.get(BASE_URL).text
soup = BeautifulSoup(html, "html.parser")
for tag in soup.find_all("div", {"class": "ratings"}):
    # iterate over the direct children of each ratings <div>
    for h in tag.children:
        # serialize the child node back to an HTML string
        s = str(h)

        # capture the quoted value of the data-rating attribute
        m = re.search(r'data-rating="([^"]*)"', s)

        # children without a data-rating attribute produce no match
        if m:
            # print only the attribute value, without the quotes
            print(m.group(1))
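
The script above matches a regular expression against the serialized HTML. As an alternative, here is a minimal sketch that reads the attribute directly through BeautifulSoup. It runs on an inline sample rather than the live page: `SAMPLE_HTML` and `extract_ratings` are hypothetical names, and the sample markup is only an assumption about how the ratings block is structured.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; assumed to mirror the structure of the ratings block.
SAMPLE_HTML = """
<html><body>
<div class="ratings">
  <p class="pull-right">14 reviews</p>
  <p data-rating="3"><span class="star"></span></p>
</div>
<div class="ratings">
  <p class="pull-right">2 reviews</p>
  <p data-rating="1"><span class="star"></span></p>
</div>
</body></html>
"""


def extract_ratings(html):
    """Return the data-rating value of each matching <p> as an int."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector: <p> elements inside div.ratings that have data-rating
    matches = soup.select("div.ratings p[data-rating]")
    # Attribute values are read like dictionary keys and come back as strings
    return [int(p["data-rating"]) for p in matches]


ratings = extract_ratings(SAMPLE_HTML)
print(ratings)
```

Reading the attribute from the parsed tree avoids depending on how the tag happens to be serialized, which is what the regular expression relies on.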

User

Answer 2 · edited 2024-05-13 21:00:35

If I understand your question and comments correctly, the following should extract all the ratings from that page:

import lxml.html
import requests

BASE_URL = "https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"

html = requests.get(BASE_URL)
root = lxml.html.fromstring(html.text)
targets = root.xpath('//p[./span[@class]]/@data-rating')

For example:

targets[0]

Output:

3
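
The XPath query returns each attribute value as a string, so a conversion step is needed before doing arithmetic on the ratings. The following is a minimal sketch on an inline sample rather than the live page. `SAMPLE_HTML` is hypothetical and only assumes the ratings markup has the shape that the XPath expression expects.

```python
import lxml.html

# Hypothetical sample; assumed to match the shape targeted by the XPath.
SAMPLE_HTML = """
<html><body>
<div class="ratings">
  <p class="pull-right">14 reviews</p>
  <p data-rating="3"><span class="star"></span></p>
</div>
<div class="ratings">
  <p class="pull-right">2 reviews</p>
  <p data-rating="1"><span class="star"></span></p>
</div>
</body></html>
"""

root = lxml.html.fromstring(SAMPLE_HTML)

# Same expression as above: the data-rating attribute of every <p>
# that has a <span> child carrying a class attribute.
raw = root.xpath('//p[./span[@class]]/@data-rating')

# Attribute values are strings; convert them before computing with them.
ratings = [int(value) for value in raw]
average = sum(ratings) / len(ratings)

print(raw[0], ratings, average)
```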
