我如何使用BeautifulSoup在IMDB网站上“描述”电影？

2条回答

网友

1楼 · 编辑于 2024-05-19 22:46:37

你可以用这个，：），如果你有帮助，请帮我解决。。thks

from bs4 import BeautifulSoup
from requests_html import HTMLSession

URL = 'https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm' #url of Most Popular Movies in IMDB

PAGE = HTMLSession().get(URL)
PAGE_BS4 = BeautifulSoup(PAGE.html.html,'html.parser')

MoviesObj = PAGE_BS4.find_all("tbody","lister-list") #get table body of Most Popular Movies
for index in range(len(MoviesObj[0].find_all("td","titleColumn"))):
    a = list(MoviesObj[0].find_all("td","titleColumn")[index])[1]
    href = 'https://www.imdb.com'+a.get('href') #get each link for movie page
    moviepage = HTMLSession().get(href) #request each page of movie
    moviepage = BeautifulSoup(moviepage.html.html,'html.parser')
    title = list(moviepage.find_all('h1')[0].stripped_strings)[0] #parse title
    year = list(moviepage.find_all('h1')[0].stripped_strings)[2] #parse year
    try:
        score = list(moviepage.find_all('div','ratingValue')[0].stripped_strings)[0] #parse score if is available
    except IndexError:
        score = '-' #if score is not available '-' is filled
    description = list(moviepage.find_all('div','summary_text')[0].stripped_strings)[0] #parse description
    print(f'TITLE: {title}      YEAR: {year}       SCORE: {score}\nDESCRIPTION:{description}\n')

萨尔达尼亚青年酒店 @乌姆萨尔达尼亚

网友

2楼 · 编辑于 2024-05-19 22:46:37

一般来说，lxml比beautifulsoup好得多

import requests 
from lxml 
import html

url = "xxxx"

r = requests.get(url)

tree = html.fromstring(r.text)

rows = tree.xpath('//div[@class="lister-item mode-detail"]')

for row in rows:
    description = row.xpath('.//div[@class="ratings-bar"]/following-sibling::p[@class="text-muted"]/text()')[0].strip()

相关问题更多 >

编程相关推荐

热门问题

热门文章

我如何使用BeautifulSoup在IMDB网站上“描述”电影？

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >