How to filter out unwanted tags nested inside a tag

import requests
from bs4 import BeautifulSoup

text = BeautifulSoup(requests.get('http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/', timeout=7.00).text, 'html.parser')
bullets = text.find_all(lambda tag: tag.name == 'ul' and not tag.attrs)

Current result:

<ul>
  <li>You are experiencing a decrease in sales and customers</li>
  <li>If your brand design does not reflect what you deliver</li>
  <li>If you want to attract a new target audience</li>
  <li>Management change</li>
  <li><a href="http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/', '19 Questions to Ask Yourself Before You Start Rebranding');">19 Questions to Ask Yourself Before You Start Rebranding</a></li>
</ul>
<ul>
  <li class="share-item share-fb" data-title="What is Causing your Headaches?- Startup Pain Points" data-type="facebook" data-url="http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/" title="Facebook"></li>
  <li class="share-item share-tw" data-title="What is Causing your Headaches?- Startup Pain Points" data-type="twitter" data-url="http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/" title="Twitter"></li>
  <li class="share-item share-gp" data-lang="en-US" data-title="What is Causing your Headaches?- Startup Pain Points" data-type="googlePlus" data-url="http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/" title="Google+"></li>
  <li class="share-item share-pn" data-media="http://bodetree.com/wp-content/uploads/2015/04/pain-points.png" data-title="What is Causing your Headaches?- Startup Pain Points" data-type="pinterest" data-url="http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/" title="Pinterest"></li>
</ul>

Desired result:

<ul>
  <li>You are experiencing a decrease in sales and customers</li>
  <li>If your brand design does not reflect what you deliver</li>
  <li>If you want to attract a new target audience</li>
  <li>Management change</li>
  <li>19 Questions to Ask Yourself Before You Start Rebranding</li>
</ul>

1 answer

User

#1 · Posted 2024-05-14 04:26:20

Search for the ul only inside the article body, which is a div with class="entry-content":

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('http://bodetree.com/what-is-causing-your-headaches-startup-pain-points/', timeout=7.00).text, 'html.parser')

bullets = soup.select("div.entry-content ul li")
print([bullet.get_text() for bullet in bullets])

Prints:

[
    'You are experiencing a decrease in sales and customers', 
    'If your brand design does not reflect what you deliver', 
    'If you want to attract a new target audience', 
    'Management change', 
    '19 Questions to Ask Yourself Before You Start Rebranding'
]
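
To try the same idea without hitting the network, here is a minimal offline sketch. SAMPLE_HTML is a trimmed, hypothetical stand-in for the page, and it assumes (as the answer implies) that the share buttons live in a ul outside div.entry-content. It also shows one possible way to keep the ul markup while dropping the links, using unwrap(), which is not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the real page (an assumption, not the
# actual markup): the article list sits inside div.entry-content, while the
# share buttons sit in a separate <ul> outside it.
SAMPLE_HTML = """
<div class="entry-content">
  <ul>
    <li>Management change</li>
    <li><a href="http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/">19 Questions to Ask Yourself Before You Start Rebranding</a></li>
  </ul>
</div>
<ul>
  <li class="share-item share-fb" data-type="facebook" title="Facebook"></li>
  <li class="share-item share-tw" data-type="twitter" title="Twitter"></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The filter from the question matches both lists, because neither <ul>
# carries any attributes of its own (only the share <li> items do).
all_lists = soup.find_all(lambda tag: tag.name == "ul" and not tag.attrs)

# Scoping the CSS selector to the article container keeps only its items.
items = soup.select("div.entry-content ul li")
texts = [item.get_text() for item in items]

# To keep the <ul> markup but drop the links, unwrap() replaces each <a>
# tag with its own contents.
article_list = soup.select_one("div.entry-content ul")
for link in article_list.find_all("a"):
    link.unwrap()

print(len(all_lists))
print(texts)
print(article_list)
```

get_text() is enough when only the strings are needed; unwrap() is the option to reach for when the result should still be a ul element with plain-text li children.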
