Accessing Beautiful Soup elements in nested HTML

Posted 2024-04-25 05:20:53

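I want to extract the director and actor elements from the parsed HTML output of the IMDb Top 250 page. What would a Python one-liner for this look like? The class "text-muted text-small" occurs multiple times, and find_all does not seem to be the best approach.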

<span class="ipl-rating-selector__rating-value">0</span>
</div>
<div class="ipl-rating-selector__error ipl-rating-selector__wrapper">
<span>Error: please try again.</span>
</div>
</div>
<div class="ipl-rating-interactive__loader">
<img alt="loading" src="https://m.media-amazon.com/images/G/01/IMDb/spinning-progress.gif"/>
</div>
</div>
</div>
<div class="inline-block ratings-metascore">
<span class="metascore favorable">80        </span>
        Metascore
        </div>
<p class="">
    Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.</p>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>, 
<a href="/name/nm0000151/">Morgan Freeman</a>, 
<a href="/name/nm0348409/">Bob Gunton</a>, 
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
<div class="wtw-option-standalone" data-baseref="wl_li" data-tconst="tt0111161" data-watchtype="minibar"></div>
</div>

3 Answers

If you are using BeautifulSoup 4.7.0 or later, you can use the :contains CSS selector:

from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, 'html.parser')
soup.select_one('p:contains("Director:","Stars:")')

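This selects the p tag containing that text. You can then iterate over its children and print the directors and actors separately: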

director_and_stars_tag = soup.select_one('p:contains("Director:")')
directors_flag = True

# find_all(True) returns every descendant tag: the <a> links and the <span> separator
for name_tag in director_and_stars_tag.find_all(True):
    if directors_flag:
        # Links before the <span class="ghost"> separator are directors
        if name_tag.name == 'span':
            directors_flag = False
        else:
            print('Director: %s' % name_tag.string)
    else:
        # Links after the separator are actors
        print('Actor: %s' % name_tag.string)

Output:

Director: Frank Darabont
Actor: Tim Robbins
Actor: Morgan Freeman
Actor: Bob Gunton
Actor: William Sadler
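
One caveat, as far as I know: newer Soup Sieve releases (2.1 and later) deprecate :contains in favour of :-soup-contains, so the selector may emit a warning on a current install. Below is a minimal sketch of the same lookup with the newer name. The html string is a trimmed copy of the question's snippet, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the credits and votes paragraphs from the question.
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: Soup Sieve 2.1 or later, where :-soup-contains is available.
# A comma-separated list matches a <p> containing any of the strings.
credits_tag = soup.select_one('p:-soup-contains("Director:", "Stars:")')

# Every link inside the matched paragraph, in document order.
names = [a.get_text(strip=True) for a in credits_tag.select("a")]
print(names)
```

The votes paragraph shares the same class but contains neither label, so it is skipped.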

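If there is no id or class that uniquely identifies these elements, you can simply iterate over the candidates and check whether they contain what you are looking for.
A working example for the HTML sample is: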

details = soup.find_all("p", attrs={"class": "text-muted text-small"})
for element in details:
    if "Stars" in element.text:
        stars = element.find_all("a")
        for star in stars:
            print(star.text)
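
If you also need to tell directors from actors with this approach, one option is to walk the paragraph's direct children and switch buckets whenever a "Director" or "Star" label appears in the text. This is only a sketch: split_credits is a hypothetical helper, the HTML constant is a trimmed copy of the question's snippet, and it assumes the labels always precede their links as they do in the sample.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed copy of the credits and votes paragraphs from the question.
HTML = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""


def split_credits(p_tag):
    """Hypothetical helper: group link texts under the label that precedes them."""
    found = {"directors": [], "stars": []}
    current = None
    for child in p_tag.children:
        if isinstance(child, Tag):
            # Only <a> links carry names; the <span> separator is ignored.
            if child.name == "a" and current is not None:
                found[current].append(child.get_text(strip=True))
        else:
            # Plain text node: look for a label that switches the bucket.
            text = str(child)
            if "Director" in text:
                current = "directors"
            elif "Star" in text:
                current = "stars"
    return found


soup = BeautifulSoup(HTML, "html.parser")
results = []
for p in soup.find_all("p", class_="text-muted"):
    # Same check as above: keep only paragraphs that mention the cast.
    if "Stars" in p.get_text():
        results.append(split_credits(p))
print(results)
```

Matching on "Director" and "Star" rather than the exact labels is a guess that plural or singular variants may appear on other entries; adjust it to whatever the real page uses.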
