Finding items in a specified part of a website that have ...

Posted 2024-04-19 20:00:10


I want to extract all hashtags (two in this example) from a web page.

<html>
    <head>
    </head>
    <body>
        <div class="predefinition">
            <p class="part1">
              <span class="part1-head">Entries:</span>
                <a class="pr" href="/go_somewhere/">#hashA with space</a>, 
                <a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
            </p>
            <span class="part2">Boundaries:</span>
            <p>some boundary statement</p>
        </div>        
        <div class="wrapper"> <!-- I only want to search here -->
            <p class="part1">
              <span class="part1-head">Entries:</span>
                <a class="pr" href="/go_somewhere/">#hash1 with space</a>, <!-- I only want to find this -->
                <a class="pr" href="/go_somewhere/">#hash2withoutsace</a>, <!-- and this -->
            </p>
            <span class="part2">Boundaries:</span>
            <p>some other boundary statement</p>
        </div>        
    </body>
</html>

However, I am only interested in the hashtags in one branch (the "wrapper" div in this example): "#hash1 with space" and "#hash2withoutsace". My code currently looks like this:

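from bs4 import BeautifulSoup
import re

# Read the saved page; the with-block closes the file afterwards
with open("minimal.html", mode="r", encoding="utf-8") as f:
    contents = f.read()

soup = BeautifulSoup(contents, 'lxml')
# Matches every a.pr in the whole document, not only those inside the wrapper div
links = soup.find_all("a", {"class": "pr"})

for link in links:
    # \w+ stops at the first space, so "#hash1 with space" is cut down to "#hash1"
    print(re.findall(r'(?i)\#\w+', str(link)))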

How can I restrict the search to the "wrapper" div?
And how can I capture hashtags that contain spaces?


1 Answer

User

#1 · Posted 2024-04-19 20:00:10

You can first locate the div with class wrapper, then collect the text of every a tag with class pr inside it:

from bs4 import BeautifulSoup
# `contents` is the HTML string read from minimal.html in the question
wrapper = BeautifulSoup(contents, 'html.parser').find('div', {'class': 'wrapper'})
results = [a.text for a in wrapper.find_all('a', {'class': 'pr'})]

Output:

['#hash1 with space', '#hash2withoutsace']
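
For anyone who wants to try this without a saved minimal.html, here is a minimal self-contained sketch. It assumes a trimmed copy of the question's markup inlined as a string; the names HTML, hashtags_in_wrapper and hashtags_via_selector are made up for illustration and are not part of the original answer. Reading each tag's text instead of running a regex over str(tag) is what keeps hashtags with spaces intact. The second function is an alternative that uses a CSS selector through select(), and it may well be the shorter option if you are comfortable with selectors.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's page, inlined so the sketch is self-contained.
HTML = """
<div class="predefinition">
  <p class="part1">
    <a class="pr" href="/go_somewhere/">#hashA with space</a>,
    <a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
  </p>
</div>
<div class="wrapper">
  <p class="part1">
    <a class="pr" href="/go_somewhere/">#hash1 with space</a>,
    <a class="pr" href="/go_somewhere/">#hash2withoutsace</a>,
  </p>
</div>
"""


def hashtags_in_wrapper(html):
    """Return the text of every a.pr link inside the first div.wrapper."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", class_="wrapper")
    if wrapper is None:  # the page has no wrapper div
        return []
    # The tag text keeps embedded spaces, which a \w+ regex would cut off.
    return [a.get_text(strip=True) for a in wrapper.find_all("a", class_="pr")]


def hashtags_via_selector(html):
    """Same idea with a CSS descendant selector (matches inside every div.wrapper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("div.wrapper a.pr")]


print(hashtags_in_wrapper(HTML))
# prints ['#hash1 with space', '#hash2withoutsace']
```

Note that find() returns only the first matching div, while the selector version collects links from every div.wrapper on the page; with a single wrapper, as here, both give the same list.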
