Problem getting text from span tags

Posted 2024-04-25 05:51:08


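On this page (the URL used in the code below), I want to get the text from the span tag with the class r_compare_bars_value. If you search for that class, you will see text such as 104 (min: 88) fps, and I only want the min: 88 part. My code: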

from bs4 import BeautifulSoup
import urllib.request,requests
r = urllib.request.urlopen('http://www.notebookcheck.net/Computer-Games-on-Laptop-Graphics-Cards.13849.0.html').read()
soup = BeautifulSoup(r)

links = [a['href'] for a in soup.select(".gpugames_header_games > a")]

for url in links:
    if url != "":
        print (url)
        rr = requests.get(url).content
        soup = BeautifulSoup(rr,"html.parser")

        for aa in soup.select("div.r_compare_bars_value span"):
            print (aa)
            if "min:" in aa.text:
                print (aa.text)

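Right now it prints nothing. With other classes it printed a large number of strings, but never the min: 88 part. I also tried div.tx-nbc2fe-pi1, and I tried the selector without the span tag. The markup on that site is a real mess. Where is my mistake, and how can I fix it?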


1 answer
Answer 1 · posted 2024-04-25 05:51:08

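There is no way to do this without post-processing the returned text by splitting, stripping, and so on. Also, r_compare_bars_value is actually the class of the span itself, not of a div, so soup.select("span.r_compare_bars_value") is the correct selector. Your selector, div.r_compare_bars_value span, looks for a span nested inside a div with that class, which is why it matches nothing.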
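The splitting and stripping mentioned above might look roughly like the sketch below. It uses only string methods and assumes the span text has the shape "104 (min: 88) fps" quoted in the question; the helper name extract_min is made up for illustration and is not part of any library.

```python
def extract_min(text):
    """Return the 'min: N' part of a string like '104 (min: 88) fps', or None.

    Hypothetical helper: assumes the value sits inside the first pair of
    parentheses, as in the sample text from the question.
    """
    # Everything after the first "(" (empty string if there is none).
    _, _, tail = text.partition("(")
    # Everything before the following ")"; `closing` is "" if it is missing.
    inner, closing, _ = tail.partition(")")
    inner = inner.strip()
    if not closing or not inner.startswith("min:"):
        return None
    return inner


print(extract_min("104 (min: 88) fps"))  # min: 88
print(extract_min("104 fps"))            # None
```

In the scraping loop, this would be called on aa.text for each matched span, skipping the None results.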

This is actually a good use case for a regular expression:

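from bs4 import BeautifulSoup
import requests
import re

# Raw string so the backslashes reach the regex engine unchanged.
# Matches the literal "(min: ...)" part, e.g. "(min: 88)".
mn = re.compile(r"\(min:.*?\)")

r = requests.get('http://www.notebookcheck.net/Computer-Games-on-Laptop-Graphics-Cards.13849.0.html').content
soup = BeautifulSoup(r, "lxml")  # requires the lxml package

# Lazily collect the per-game links from the header row.
links = (a["href"] for a in soup.select(".gpugames_header_games > a"))


for url in links:
    if url:  # skip empty hrefs
        rr = requests.get(url).content
        soup = BeautifulSoup(rr, "html.parser")
        # The class is on the span itself, not on a parent div.
        for aa in soup.select("span.r_compare_bars_value"):
            m = mn.search(aa.text)
            if m:
                print(m.group())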

Running the above on a few of the URLs gives:

(min: 88)
(min: 164)
(min: 251)
(min: 281)
(min: 283)
(min: 291)
(min: 75)
(min: 129)
(min: 202)
(min: 64)
(min: 94)
(min: 178)
(min: 53)
(min: 97)
(min: 154)
(min: 199)
(min: 289)
(min: 296)
(min: 55)
(min: 78)
(min: 39)
(min: 57)
(min: 109)
(min: 153)
(min: 200)
(min: 216)
(min: 39)
(min: 59)
(min: 110)
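
If only the number is needed rather than the whole "(min: 88)" string, one possible variation is to add a capture group to the pattern. The sketch below is an assumption built on the output format shown above, not part of the original answer; the names MIN_FPS and min_fps are made up for illustration.

```python
import re

# Same idea as the pattern above, but the digits are captured in group 1.
# Assumes the value is a plain integer, as in the sample output.
MIN_FPS = re.compile(r"\(min:\s*(\d+)\)")


def min_fps(text):
    """Return the minimum fps as an int, or None if the text has no (min: N)."""
    m = MIN_FPS.search(text)
    return int(m.group(1)) if m else None


samples = ["104 (min: 88) fps", "251 fps", "(min: 164)"]
print([min_fps(s) for s in samples])  # [88, None, 164]
```

Returning an int makes it easy to sort or compare the values later instead of working with the raw strings.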
