Can't count empty tags with BeautifulSoup?

Posted 2024-04-19 01:16:56


I have had a look at other questions, but I couldn't find anything.

My HTML looks like this:

<div class="rating-input">
     <i data-value="1" class="rating-active-star"></i>
     <i data-value="2" class="rating-active-star"></i>
     <i data-value="3" class="rating-active-star"></i>
     <i data-value="4" class="rating-active-star"></i>
     <i data-value="5" class="rating-inactive-star"></i>
</div>

The line that fails is:

details = [{"name": film.select('h2')[0].text.split('\n')[0],
            "rating": len(film.select('div i.rating-active-star'))}
           for film in detail_row]

because it produces:

[{'name': 'The LEGO Batman Movie', 'rating': 0}, 
 {'name': 'Sing', 'rating': 0}, 
 {'name': 'John Wick: Chapter 2', 'rating': 0}, 
 {'name': 'Fifty Shades Darker', 'rating': 0}, 
 {'name': 'The Great Wall', 'rating': 0}, 
 {'name': 'Hidden Figures', 'rating': 0}, 
 {'name': 'La La Land', 'rating': 0}, 
 {'name': 'The Founder', 'rating': 0}, 
 {'name': 'Hacksaw Ridge', 'rating': 0}, 
 {'name': 'T2 Trainspotting', 'rating': 0}, 
 {'name': 'Split', 'rating': 0}, 
 {'name': 'Patriots Day', 'rating': 0}
]

Every rating is zero. What I expect is the number of i elements with the class rating-active-star (for example, 4 for the HTML above).

However, if I change my rating selector from 'div i.rating-active-star' to 'div i', every 'rating': 0 becomes 'rating': 5.

Here is my whole script (more or less an MVP):

import requests
import bs4
data = "si=1010841&sort=cin&max=0&bd=2017-02-23&css=cat-&mod=cinemapage_movie_list&attrs=2D%2C3D%2CIMAX%2CViP%2CVIP%2CDBOX%2C4DX%2CM4J%2CSS"
data_list = data.split('&')
info = {item[0]:item[1] for item in [elem.split('=') for elem in data_list]}
response = requests.post('https://www.cineworld.co.uk/pgm-list-byfeat',info)
soup = bs4.BeautifulSoup(response.text, "html.parser")
detail_row = soup.select('div[id^=film_] div.row div.col-sm-10')
details = [{"name": film.select('h2')[0].text.split('\n')[0],
            "rating":len(film.select('div i.rating-active-star'))}
          for film in detail_row]

Why is the length of the list of empty tags different from that of non-empty tags? How can I fix this?
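
A side note on the script above, offered as a sketch rather than a confirmed fix: splitting the form string by hand leaves the percent-escapes in place, so the attrs value still contains '%2C'. As far as I know, requests form-encodes dict values again when posting, which would escape the '%' a second time. The standard library's urllib.parse.parse_qsl decodes the escapes first. Whether this has anything to do with the zero ratings is not established here; the names raw_info and info below are only for illustration.

```python
from urllib.parse import parse_qsl, urlencode

data = "si=1010841&sort=cin&max=0&bd=2017-02-23&css=cat-&mod=cinemapage_movie_list&attrs=2D%2C3D%2CIMAX%2CViP%2CVIP%2CDBOX%2C4DX%2CM4J%2CSS"

# Hand-rolled split, as in the script: values keep their percent-escapes.
raw_info = {key: value
            for key, value in (elem.split('=') for elem in data.split('&'))}

# parse_qsl decodes the escapes, so '%2C' becomes ','.
info = dict(parse_qsl(data))

print(raw_info['attrs'])  # still contains %2C
print(info['attrs'])      # comma-separated

# urlencode shows what each dict looks like once it is form-encoded again.
print(urlencode(raw_info) == data)  # False: '%' was escaped a second time
print(urlencode(info) == data)      # True: round-trips to the original string
```

Passing info instead of the hand-split dict to requests.post should then send the same body as the original string.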


1 answer
User
#1 · Posted 2024-04-19 01:16:56

The problem is probably elsewhere. This snippet appears to work as expected:

from bs4 import BeautifulSoup

html = '''
<div class="rating-input">
 <i data-value="1" class="rating-active-star"></i>
 <i data-value="2" class="rating-active-star"></i>
 <i data-value="3" class="rating-active-star"></i>
 <i data-value="4" class="rating-active-star"></i>
 <i data-value="5" class="rating-inactive-star"></i>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# Count the inactive and active stars; prints: 1 4
print(len(soup.select('div i.rating-inactive-star')),
      len(soup.select('div i.rating-active-star')))
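
Since 'div i' matches five elements but 'div i.rating-active-star' matches none, one way to narrow things down is to look at which classes the i elements actually carry in the HTML that requests receives. The sketch below tallies them. It runs on the sample HTML from the question, and class_counts is a hypothetical helper name, not part of bs4. An assumption worth checking (nothing in this thread confirms it) is that the server response differs from what the browser shows, for instance if the star classes are added by JavaScript.

```python
from collections import Counter

from bs4 import BeautifulSoup

html = '''
<div class="rating-input">
 <i data-value="1" class="rating-active-star"></i>
 <i data-value="2" class="rating-active-star"></i>
 <i data-value="3" class="rating-active-star"></i>
 <i data-value="4" class="rating-active-star"></i>
 <i data-value="5" class="rating-inactive-star"></i>
</div>
'''


def class_counts(soup, selector):
    """Tally the class names of every element matching selector."""
    counts = Counter()
    for tag in soup.select(selector):
        # 'class' is multi-valued, so bs4 returns a list (None if absent).
        counts.update(tag.get('class') or ['(no class)'])
    return counts


soup = BeautifulSoup(html, 'html.parser')
counts = class_counts(soup, 'div i')
print(counts)
```

Running the same helper on the soup built from response.text in the question's script, for example class_counts(film, 'div i') for one entry of detail_row, would show whether rating-active-star is present at all in the downloaded page.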
