How to use BeautifulSoup to get all IMDB user reviews of a movie

    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver

    testurl = "https://www.imdb.com/title/tt0357277/reviews?ref_=tt_urv"
    patience_time1 = 60
    XPATH_loadmore = "//*[@id='load-more-trigger']"
    XPATH_grade = "//*[@class='review-container']/div[1]"
    list_grades = []

    driver = webdriver.Firefox()
    driver.get(testurl)

    # This is the part in which I open all 'load more' buttons.
    # (find_element_by_id exists only in Selenium < 4.3; newer releases use
    # driver.find_element(By.ID, ...) instead.)
    while True:
        try:
            loadmore = driver.find_element_by_id("load-more-trigger")
            time.sleep(2)
            loadmore.click()
            time.sleep(5)
        except Exception as e:
            print(e)
            break
    print("Complete")
    time.sleep(10)

    # When the whole page is loaded, I want to get all 'content' parts.
    soup = BeautifulSoup(driver.page_source)
    content = soup.findAll("content")
    list_content = [c.text_content() for c in content]

    driver.quit()

1 answer

User

Answer #1 · posted 2024-04-23 15:09:28

You are using beautifulsoup4, right? The method names changed from BeautifulSoup 3 to 4 (see the documentation).
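A small sketch of the rename, assuming a bs4 4.x release: as far as I know, the old camelCase name `findAll` is still accepted there as a (deprecated) alias for `find_all`, so switching names is mostly hygiene. What actually changes the result in the original code is the arguments passed to it, covered next. The HTML string here is a made-up example.

```python
from bs4 import BeautifulSoup

# Made-up markup, just to compare the two method names.
soup = BeautifulSoup("<p>a</p><p>b</p>", "html.parser")

new_style = soup.find_all("p")  # bs4 (PEP 8) name
old_style = soup.findAll("p")   # BeautifulSoup 3 name, kept as a deprecated alias in bs4 4.x

print(new_style == old_style)
```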

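Also, find_all takes a tag name, plus an optional class_ argument for CSS classes (see this SO answer). There is no `<content>` tag on the page, so find_all("content") matches nothing; the review text is in `<div>` elements carrying the text / show-more__control classes.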

So your code should use the new name and search for those `<div>` elements:

    # content = soup.findAll("content")
    content = soup.find_all('div', class_=['text','show-more__control'])
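An offline check of the difference, run against a hypothetical, simplified HTML snippet modelled on the class names used above (the real IMDb markup may differ and may have changed since). When `class_` is given a list, a tag matches if it has any of the listed classes.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; NOT copied from the live IMDb page.
html = """
<div class="review-container">
  <div class="content">
    <div class="text show-more__control">Great movie.</div>
  </div>
</div>
<div class="review-container">
  <div class="content">
    <div class="text show-more__control">Not for me.</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Looks for <content> elements, which do not exist here.
by_tag = soup.find_all("content")

# Looks for <div> elements having class "text" or "show-more__control".
by_class = soup.find_all("div", class_=["text", "show-more__control"])

print(len(by_tag))
print(len(by_class))
```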

Also use get_text() in your list comprehension (text_content() is an lxml method, not a BeautifulSoup one):

    list_content = [c.get_text() for c in content]
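A minimal sketch of get_text() on a single hypothetical review `<div>` containing a line break. By default the text nodes are joined with no separator; the `separator` and `strip` arguments keep them apart and trim whitespace, which may be preferable for multi-line reviews.

```python
from bs4 import BeautifulSoup

# Hypothetical review markup with a <br/> between two sentences.
html = '<div class="text show-more__control">Great movie.<br/>Would watch again.</div>'
soup = BeautifulSoup(html, "html.parser")
content = soup.find_all("div", class_=["text", "show-more__control"])

# Default: text nodes are concatenated as-is.
list_content = [c.get_text() for c in content]

# Join text nodes with a newline and strip surrounding whitespace.
list_clean = [c.get_text(separator="\n", strip=True) for c in content]

print(list_content)
print(list_clean)
```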

Finally, specify a parser explicitly when creating the soup (see the documentation):

    soup = BeautifulSoup(driver.page_source, 'html.parser')

Otherwise, you will get the following UserWarning:

    SO56261323.py:36: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
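A rough way to confirm that naming the parser silences this warning: build the same soup with and without a parser and record the warnings. The helper `parser_warnings` is hypothetical, written only for this check, and it matches on the start of the warning text quoted above.

```python
import warnings

from bs4 import BeautifulSoup

html = "<div class='text show-more__control'>Great movie.</div>"


def parser_warnings(**kwargs):
    """Hypothetical helper: return the warning messages raised while building a soup."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        BeautifulSoup(html, **kwargs)
    return [str(w.message) for w in caught]


implicit = parser_warnings()                        # no parser named
explicit = parser_warnings(features="html.parser")  # parser named explicitly

marker = "No parser was explicitly specified"
print(any(marker in m for m in implicit))
print(any(marker in m for m in explicit))
```

Passing the parser name also pins the behaviour: without it, bs4 picks whichever parser is best on the current machine (for example lxml if it is installed), which is exactly what the warning cautions about.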
