Trying to collect data from multiple <td></td> cells that have no class to reference

Posted 2024-06-16 09:57:49


The HTML is much more complex than what I'm used to. Here is the code that extracts the data I need to collect:

title = soup.find_all('tr', attrs={'class': 'cM'})  # every movie row
first = title[0]                                     # the first movie row
first                                                # display it (Jupyter/REPL)
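Why attrs={'class': 'cM'} matches a row whose class attribute is actually "cM c2": BeautifulSoup treats class as a multi-valued attribute, so matching any one of its values is enough. The sketch below uses a trimmed-down, hypothetical copy of the row (not the real page) and shows three equivalent ways to select it.

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical markup: one movie row plus one unrelated row.
HTML = """
<table>
  <tr class="cM c2"><td class="cI"><a href="/movie/thor-ragnarok-2017">Thor: Ragnarok</a></td></tr>
  <tr class="other"><td>not a movie row</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class="cM c2" is stored as the list ['cM', 'c2'], so matching on the
# single value 'cM' is enough to find the row.
rows_attrs = soup.find_all("tr", attrs={"class": "cM"})
rows_kw = soup.find_all("tr", class_="cM")   # keyword shortcut for the same filter
rows_css = soup.select("tr.cM")              # CSS selector equivalent

print(len(rows_attrs), len(rows_kw), len(rows_css))  # 1 1 1
print(list(rows_attrs[0]["class"]))                   # ['cM', 'c2']
```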


I can get the movie title with the following code:

#movie title
first.find(attrs={'class':'cI'}).text
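A slightly more defensive version of the title lookup, sketched on a trimmed-down copy of the row: find() returns None when nothing matches, so calling .text on it directly raises AttributeError, and get_text(strip=True) trims surrounding whitespace. Reading the link's href is an extra illustration, not something the question asked for.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the movie row from the question.
HTML = """<table><tr class="cM c2">
<td class="cI"><a href="/movie/thor-ragnarok-2017">Thor: Ragnarok</a></td>
</tr></table>"""

first = BeautifulSoup(HTML, "html.parser").find("tr", class_="cM")

# find() returns None when nothing matches, so check before reading text.
cell = first.find("td", class_="cI")
title = cell.get_text(strip=True) if cell else None
link = cell.a["href"] if cell and cell.a else None   # relative movie URL

print(title)  # Thor: Ragnarok
print(link)   # /movie/thor-ragnarok-2017
```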

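However, I'm having trouble collecting the data below (year, age rating, score and percentage). I don't know which class or reference I need to use to get it: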

<td>2017</td><td>13+</td><td>7.9</td><td>92%</td>

Here is the HTML:

<tr class="cM c2" itemprop="itemListElement" itemscope="" itemtype="//schema.org/ListItem"><td class="cH"><a href="/movie/thor-ragnarok-2017"><div class="d9 cN"><picture class="eT"><source srcset="https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-92.webp 92w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-154.webp 154w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-185.webp 185w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-342.webp 342w" type="image/webp"/><source srcset="https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-92.jpg 92w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-154.jpg 154w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-185.jpg 185w,https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-342.jpg 342w" type="image/jpeg"/><img alt="Watch Thor: Ragnarok" class="eU" data-async-image="true" decoding="async" src="https://img.reelgood.com/content/movie/19dcfe68-dc06-43ea-9c44-42255e780898/poster-342.jpg"/></picture></div></a></td><td class="cI"><a href="/movie/thor-ragnarok-2017">Thor: Ragnarok</a><meta content="https://reelgood.com/movie/thor-ragnarok-2017" itemprop="url"><meta content="1" itemprop="position"/></meta></td><td class="cJ"></td><td>2017</td><td>13+</td><td>7.9</td><td>92%</td><td class="cT"><span class="cU"><div class="hp cV"><img alt="netflix" src="https://img.reelgood.com/source-logos/netflix.svg"/></div></span><span class="cX">+ <!-- -->Rent or Buy</span><span><span class="cW"></span></span></td><td class="c0"></td><td class="cO"><div class="cP"><div><span>Want To See</span><img alt="Want To See Icon" src="/assets/f4b0d8c.svg" title="Add movie to watchlist"/></div><div class="cR"><span>Seen</span><img alt="Check Mark Icon" src="/assets/963fd9c.svg" title="Mark movie as 
seen"/></div></div></td></tr>
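In this row the four values are the only <td> cells without a class attribute, so one possible approach (an alternative to the findNext answer, and only valid as long as the page keeps that layout) is to filter on the missing class. A sketch on a trimmed-down copy of the HTML above:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the row: posters, logos and buttons removed.
HTML = """<table><tr class="cM c2">
<td class="cH"></td>
<td class="cI"><a href="/movie/thor-ragnarok-2017">Thor: Ragnarok</a></td>
<td class="cJ"></td><td>2017</td><td>13+</td><td>7.9</td><td>92%</td>
<td class="cT"></td><td class="c0"></td><td class="cO"></td>
</tr></table>"""

first = BeautifulSoup(HTML, "html.parser").find("tr", class_="cM")

# Option 1: keep only the <td> cells that have no class attribute at all.
plain = [td.get_text() for td in first.find_all("td") if not td.get("class")]

# Option 2: the same filter written as a CSS selector.
plain_css = [td.get_text() for td in first.select("td:not([class])")]

print(plain)      # ['2017', '13+', '7.9', '92%']
print(plain_css)  # ['2017', '13+', '7.9', '92%']
```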

1 answer
Anonymous user
Answer #1 · posted 2024-06-16 09:57:49

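You can use findNext("td") (find_next("td") in current BeautifulSoup 4 naming; findNext is the older alias) to step from a cell you can locate, such as the empty td with class cJ, to the cells that follow it.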

For example:

print( first.find(attrs={'class':'cI'}).text )
print( first.find(attrs={'class':'cJ'}).findNext("td").text )
print( first.find(attrs={'class':'cJ'}).findNext("td").findNext("td").text )
print( first.find(attrs={'class':'cJ'}).findNext("td").findNext("td").findNext("td").text )
print( first.find(attrs={'class':'cJ'}).findNext("td").findNext("td").findNext("td").findNext("td").text )
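One caveat about chaining findNext("td"): it walks forward through the whole document in parse order, not just the current row, so on other pages it could land in nested or later <td>s. A variant that stays within the same <tr> uses find_next_siblings with a limit; sketched here on a trimmed-down copy of the row:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the row from the question.
HTML = """<table><tr class="cM c2">
<td class="cI"><a href="/movie/thor-ragnarok-2017">Thor: Ragnarok</a></td>
<td class="cJ"></td><td>2017</td><td>13+</td><td>7.9</td><td>92%</td>
<td class="cT"><span class="cX">Rent or Buy</span></td>
</tr></table>"""

first = BeautifulSoup(HTML, "html.parser").find("tr", class_="cM")
anchor = first.find("td", class_="cJ")   # empty cell right before the data

# find_next("td") follows document order, which here happens to be the year.
print(anchor.find_next("td").get_text())  # 2017

# find_next_siblings("td") only looks at cells in the same <tr>;
# limit=4 stops after the four data cells.
values = [td.get_text() for td in anchor.find_next_siblings("td", limit=4)]
print(values)  # ['2017', '13+', '7.9', '92%']
```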

If you don't want to repeat the code:

print( first.find(attrs={'class':'cI'}).text )
obj = first.find(attrs={'class':'cJ'})
if obj:
    for i in range(4):
        obj = obj.findNext("td")
        if obj:
            print( obj.text )

Output:

Thor: Ragnarok
2017
13+
7.9
92%
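To collect every movie instead of just the first, the same ideas can be applied row by row. This sketch uses the "cells without a class" filter rather than findNext, pads missing cells with None, and labels the values with hypothetical field names (FIELDS); the second row is made up purely to show the padding.

```python
from bs4 import BeautifulSoup

# Hypothetical field names for the four unlabelled cells.
FIELDS = ["year", "age_rating", "score", "percent"]

# One trimmed-down row from the question plus a made-up row with no data cells.
HTML = """<table>
<tr class="cM c2"><td class="cI"><a>Thor: Ragnarok</a></td><td class="cJ"></td>
<td>2017</td><td>13+</td><td>7.9</td><td>92%</td><td class="cT"></td></tr>
<tr class="cM c2"><td class="cI"><a>Hypothetical Movie</a></td><td class="cJ"></td></tr>
</table>"""


def parse_row(row):
    """Return a dict for one movie row, using None for any missing cell."""
    title_cell = row.find("td", class_="cI")
    # The data cells are the <td>s without a class attribute.
    values = [td.get_text(strip=True) for td in row.find_all("td") if not td.get("class")]
    values += [None] * (len(FIELDS) - len(values))  # pad missing cells
    record = {"title": title_cell.get_text(strip=True) if title_cell else None}
    record.update(zip(FIELDS, values))              # zip drops any extra cells
    return record


soup = BeautifulSoup(HTML, "html.parser")
movies = [parse_row(tr) for tr in soup.find_all("tr", class_="cM")]
for movie in movies:
    print(movie)
```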
