<p>An alternative solution:</p>
<pre><code>soup.find_all('div', class_=lambda x: x not in classToIgnore)
</code></pre>
<p>Example</p>
<pre><code>from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))
</code></pre>
<p>Output</p>
<pre><code>[<div class="c3"></div>, <div class="c4"></div>]
</code></pre>
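<p>One caveat, based on how BeautifulSoup treats the multi-valued <code>class</code> attribute: the function passed to <code>class_</code> is called on each individual class name (and with <code>None</code> when a tag has no <code>class</code> attribute). So <code>lambda x: x not in classToIgnore</code> also matches a div such as <code>class="c1 c3"</code> and a div with no class at all. The sketch below uses made-up markup to contrast that with a stricter filter function that checks a div's full class list. <code>has_no_ignored_class</code> is a hypothetical helper name, and keeping classless divs is an assumption you may want to change.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div mixes an ignored and a kept class, one has no class.
html = """
<div class="c1"></div>
<div class="c1 c3"></div>
<div></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# The lambda sees each class name separately (or None), so "c1 c3" passes via "c3".
loose = soup.find_all("div", class_=lambda x: x not in classToIgnore)


def has_no_ignored_class(tag):
    # Hypothetical helper: keep a div only if none of its classes are ignored.
    # Divs without a class attribute are kept here; adjust if that is not wanted.
    classes = tag.get("class") or []
    return tag.name == "div" and not set(classes) & set(classToIgnore)


strict = soup.find_all(has_no_ignored_class)

print(loose)   # [<div class="c1 c3"></div>, <div></div>, <div class="c4"></div>]
print(strict)  # [<div></div>, <div class="c4"></div>]
```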
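<p>If you are dealing with nested elements (for example, a wanted div that contains a div with an ignored class), the filter above still returns the outer div together with its unwanted children. In that case, try using <em><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/#decompose" rel="nofollow noreferrer">decompose</a></em> to remove the unwanted divs from the tree first, and then simply call <code>find_all('div')</code>:</p>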
<pre><code># Remove every div that carries an ignored class (including its children)
for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))
</code></pre>
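<p>To make the nested case concrete, here is a small self-contained sketch with hypothetical markup in which a wanted <code>c3</code> div contains an unwanted <code>c1</code> div. It compares the text returned by the lambda filter alone with the text left after the <code>decompose</code> pass.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an ignored c1 div nested inside a wanted c3 div.
html = """
<div class="c3"><div class="c1">skip me</div>keep me</div>
<div class="c4">also keep</div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Filtering alone: the c3 div matches, but its c1 child is still inside it.
before = [d.get_text() for d in soup.find_all("div", class_=lambda x: x not in classToIgnore)]

# Remove every div that carries an ignored class, along with its contents.
for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

after = [d.get_text() for d in soup.find_all("div")]

print(before)  # ['skip mekeep me', 'also keep']
print(after)   # ['keep me', 'also keep']
```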
<p>Decomposing elements this way may leave some extra whitespace behind, but you can easily strip it afterwards.</p>
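<p>As an illustration of where the leftover whitespace comes from: with the <code>html.parser</code> backend, the newline text nodes that sat between the removed divs stay in the tree. A minimal sketch of one way to strip them, assuming you only need the serialized HTML and that dropping blank lines is acceptable, is to filter out the empty lines after serializing:</p>

```python
from bs4 import BeautifulSoup

html = """
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

# The newline text nodes between the removed divs are still in the tree.
raw = str(soup)

# One simple clean-up: drop the lines that are now empty.
cleaned = "\n".join(line for line in raw.splitlines() if line.strip())

print(cleaned)  # <div class="c3"></div>
```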