Scraping span text from multiple span elements with the same name (IMDB)

for year_url in years_url:
    # For every page in the interval 1-4
    for page in pages:
        # Make a get request
        response = requests.get('http://www.imdb.com/search/title?release_date=' + year_url + '&sort=num_votes,desc&page=' + page)
        # Parse the content of the request with BeautifulSoup
        page_html = BeautifulSoup(response.text, 'lxml')
        # Select all the 50 movie containers from a single page
        mv_containers = page_html.find_all('div', class_ = 'lister-item mode-advanced')
        # For every movie of these 50
        for container in mv_containers:
            # If the movie has a Metascore, then:
            if container.find('div', class_ = 'ratings-metascore') is not None:
                # Scrape the name
                name = container.h3.a.text
                names.append(name)
                # Scrape the genre
                genre = container.p.find('span', class_ = 'genre').text.rstrip().replace("\n","").split(",")
                genres.append(genre)
                # Scrape the runtime
                runtime = container.p.find('span', class_ = 'runtime').text
                runtimes.append(runtime)
                # Scrape the year
                year = container.h3.find('span', class_ = 'lister-item-year').text
                years.append(year)
                # Scrape the IMDB rating
                imdb = float(container.strong.text)
                imdb_ratings.append(imdb)
                # Scrape the Metascore
                m_score = container.find('span', class_ = 'metascore').text
                metascores.append(int(m_score))
                # Scrape the number of votes
                vote = container.find('span', attrs = {'name':'nv'})['data-value']
                votes.append(int(vote))
                # Scrape the GrossMill
                gross = int(container.find('span', text='Gross:').find_next('span')['data-value'].replace(',', ''))
                print(gross)
                grossmill.append(gross)

1 answer

User

Answer #1 · posted 2024-04-26 07:42:44

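Your error means that, for a given movie, no span with text='Gross:' was found. Looking at the IMDB webpage you are scraping, some movies have a Metascore but do not display their gross. For those movies container.find('span', text='Gross:') returns None, and the chained find_next() call on None fails.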
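
The failure can be reproduced without any network access. The sketch below uses a hand-written HTML fragment whose markup is an assumption, loosely modelled on the IMDB list page rather than copied from it. It parses with the built-in "html.parser" instead of 'lxml' so no extra parser is needed, and uses string=, the newer spelling of the text= argument in BeautifulSoup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (assumed markup): the first movie shows a gross, the second does not.
html = """
<div class="lister-item mode-advanced">
  <p><span>Votes:</span> <span name="nv" data-value="1,000">1,000</span>
     <span>Gross:</span> <span name="nv" data-value="28,341,469">$28.34M</span></p>
</div>
<div class="lister-item mode-advanced">
  <p><span>Votes:</span> <span name="nv" data-value="500">500</span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", class_="lister-item mode-advanced")

# find() returns None when no <span> has exactly the text "Gross:".
labels = [c.find("span", string="Gross:") for c in containers]

error_message = None
try:
    # Same chained call as in the question: it works for the first container
    # and raises AttributeError on the second, where find() returned None.
    for c in containers:
        c.find("span", string="Gross:").find_next("span")["data-value"]
except AttributeError as exc:
    error_message = str(exc)

print(labels[1])  # None
print(error_message)
```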

Before calling find_next() on gross, first check that the span exists.

Replace:

gross = int(container.find('span', text='Gross:').find_next('span')['data-value'].replace(',', ''))

with:

gross = container.find('span', text='Gross:')
if gross:
    gross = int(gross.find_next('span')['data-value'].replace(',', ''))
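
The same check can be packaged as a small function and tried on a local fragment. This is a sketch under assumptions: parse_gross is a hypothetical helper name, and the HTML is the same assumed markup as above, not real IMDB output. When the label is missing it returns None, just as the replacement code above leaves gross as None.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (assumed markup): the second movie has no "Gross:" label.
html = """
<div class="lister-item mode-advanced">
  <p><span>Gross:</span> <span name="nv" data-value="28,341,469">$28.34M</span></p>
</div>
<div class="lister-item mode-advanced">
  <p><span>Votes:</span> <span name="nv" data-value="500">500</span></p>
</div>
"""


def parse_gross(container):
    """Hypothetical helper: return the gross as an int, or None if there is no 'Gross:' span."""
    label = container.find("span", string="Gross:")
    if label is None:
        # No gross shown for this movie; skip find_next() instead of crashing.
        return None
    value_span = label.find_next("span")
    # data-value holds the number with thousands separators, e.g. "28,341,469".
    return int(value_span["data-value"].replace(",", ""))


soup = BeautifulSoup(html, "html.parser")
grossmill = [parse_gross(c) for c in soup.find_all("div", class_="lister-item mode-advanced")]
print(grossmill)  # [28341469, None]
```

Appending None rather than skipping the movie keeps grossmill the same length as names, votes and the other lists, so the values still line up per movie when the lists are combined later.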
