从维基百科艺术中获取所有标题和普通文本

from bs4 import BeautifulSoup # ---- Definitions ----# #Amount of documents amount_of_documents = 1 #Directory of raw HTML documents directory_of_raw_documents = "raw_documents/" #Directory of parsed documents directory_of_parsed_documents = "parsed_documents/" # ---- Code ----# def open_document(): for i in range (1, 1+1): with open(directory_of_raw_documents + str(i), "r") as document: html = document.read() soup = BeautifulSoup(html, "html.parser") body = soup.find('div', id='bodyContent') for elements in body.find_all('p'): print(elements.text) open_document()

1条回答

网友

1楼 · 发布于 2024-05-28 19:57:58

您可能会对使用专门的wikipedia页面解析器感兴趣，比如^{} package。这样您就可以很容易地获得内容：

In [1]: import wikipedia

In [2]: page = wikipedia.page("Amadeus (film)")

In [3]: page.summary
Out[3]: u"Amadeus is a 1984 American period drama film directed by Milo\u0161 Forman, written by Peter Shaffer, and adapted from Shaffer's stage play Amadeus (1979). The story, set in Vienna, Austria, during the latter half of the 18th century, is a fictionalized biography of Wolfgang Amadeus Mozart. Mozart's music is heard extensively in the soundtrack of the movie. Its central thesis is that Antonio Salieri, an Italian contemporary of Mozart is so driven by jealousy of the latter and his success as a composer that he plans to kill him and to pass off a Requiem, which he secretly commissioned from Mozart as his own, to be premiered at Mozart's funeral. Historically, the Requiem which was never finished was commissioned by Count von Walsegg and Salieri, far from being jealous of Mozart, was on good terms with him and even tutored his son after Mozart's death.\nThe film was nominated for 53 awards and received 40, which included eight Academy Awards (including Best Picture), four BAFTA Awards, four Golden Globes, and a Directors Guild of America (DGA) award. As of 2016, it is the most recent film to have more than one nomination in the Academy Award for Best Actor category. In 1998, the American Film Institute ranked Amadeus 53rd on its 100 Years... 100 Movies list."

In [4]: page.content
Out[4]: u'Amadeus is a 1984 American period drama film directed by Milo\u0161 Forman, written by Peter Shaffer, and adapted from Shaffer\'s s
...
Amadeus Filming locations at Movieloci.com'

关于获取标题，下面是通过BeautifulSoup获取它们的示例代码：

^{pr2}$

h2 .mw-headline是一个CSS selector，它将与h2父元素下的mw-headline类相匹配。在

相关问题更多 >

编程相关推荐

热门问题

热门文章