How to match only one CSS class with Beautiful Soup (bs4)

Asked 2025-04-17 13:30

I am using the following code to match all divs that have the CSS class "ad_item".

soup.find_all('div',class_="ad_item")

The problem is that the same page also has divs whose CSS classes are set to both "ad_item" and "ad_ex_item", for example:

<div class="ad_item ad_ex_item">

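According to the documentation, when you search for a tag that matches a certain CSS class, you are matching against any of its CSS classes, so the div above is returned as well.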

So how can I match only the divs that have "ad_item" and not "ad_ex_item"?

In other words, how do I search for divs whose only CSS class is "ad_item"?
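For reference, here is a minimal reproduction of the behavior described above. The HTML snippet is a made-up sample (an assumption, not the real page), and `html.parser` is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div class="other">unrelated</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches against any one of a tag's classes,
# so the div carrying both classes is returned too.
matches = soup.find_all("div", class_="ad_item")
texts = [d.get_text() for d in matches]
print(texts)
```

Both the "plain" and the "extended" div come back, which is exactly the problem.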


7 Answers

You can pass a lambda function to both find and find_all.

soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)

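Using x.get('class', []) avoids a KeyError for div tags that have no class attribute. The later x['class'] lookup is safe because it is only reached once 'ad_item' has been found in the class list.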
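Here is a self-contained sketch of the same filter, written as a named function instead of a lambda so it is easier to read. The sample HTML and the name `is_plain_ad` are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, including a div with no class attribute.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div>no class</div>
"""
soup = BeautifulSoup(html, "html.parser")


def is_plain_ad(tag):
    """Return True for divs that have ad_item but not ad_ex_item."""
    classes = tag.get("class", [])  # empty list when the attribute is missing
    return (
        tag.name == "div"
        and "ad_item" in classes
        and "ad_ex_item" not in classes
    )


plain_ads = soup.find_all(is_plain_ad)
plain_texts = [d.get_text() for d in plain_ads]
print(plain_texts)
```

Note that this still accepts a div with some unrelated extra class; it only rules out ad_ex_item.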

If you need to exclude more than one class, replace the last condition with:

    not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})

If you only want to exclude elements that carry all of the listed classes at the same time, you can use:

    not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})
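The difference between the `any` and `all` variants is easy to mix up, so here is a small sketch in plain Python. The lists stand in for the `x['class']` value of three hypothetical divs, and the helper names are made up for this example.

```python
excluded = {"ad_ex_item", "another_class"}


def keep_if_none(classes):
    # Reject the tag as soon as it has any one of the excluded classes.
    return not any(c in classes for c in excluded)


def keep_unless_all(classes):
    # Reject the tag only when it has every excluded class at once.
    return not all(c in classes for c in excluded)


# Stand-ins for x['class'] on three hypothetical divs.
samples = [
    ["ad_item"],
    ["ad_item", "ad_ex_item"],
    ["ad_item", "ad_ex_item", "another_class"],
]

none_results = [keep_if_none(s) for s in samples]
all_results = [keep_unless_all(s) for s in samples]
print(none_results)  # [True, False, False]
print(all_results)   # [True, True, False]
```

For the original question, where any div with ad_ex_item should be dropped, the `any` form is the one that fits.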

Answered 2025-04-17 by Python大师

You can use a strict condition like this:

soup.select("div[class='ad_item']")

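This matches the class attribute as an exact string. In this example it only finds div elements whose class attribute is exactly 'ad_item', and skips any element that lists additional space-separated class names.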
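A quick sketch of the attribute selector on made-up sample HTML follows. The second selector, `div.ad_item:not(.ad_ex_item)`, is not from the answer above; it is an alternative I am adding for comparison, which excludes only ad_ex_item rather than every extra class.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div class="ad_ex_item ad_item">reordered</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class attribute value.
exact = soup.select("div[class='ad_item']")
exact_texts = [d.get_text() for d in exact]

# Alternative (an addition, not from the answer): has ad_item, lacks ad_ex_item.
negated = soup.select("div.ad_item:not(.ad_ex_item)")
negated_texts = [d.get_text() for d in negated]

print(exact_texts)
print(negated_texts)
```

On this sample both selectors return only the "plain" div. They would differ for something like class="ad_item featured", which the `:not()` form keeps and the exact match rejects.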

Answered 2025-04-17 by Python大师

I found a workaround. It does not rely on any BS4 feature; it is plain Python.

for item in soup.find_all('div', class_="ad_item"):
    if len(item["class"]) != 1:
        continue  # skip divs that have more than one CSS class
    # item's only class is ad_item; process it here

This basically skips every item that has more than one CSS class.
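The same idea can be written as a list comprehension that collects the matching divs instead of skipping the others. This is only a sketch on made-up sample HTML; the variable names are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only divs whose class list has exactly one entry.
# find_all already guarantees that ad_item is one of the classes.
only_ad_item = [
    item
    for item in soup.find_all("div", class_="ad_item")
    if len(item["class"]) == 1
]
only_texts = [item.get_text() for item in only_ad_item]
print(only_texts)
```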

Answered 2025-04-17 by Python大师