Cleaning scraped text in Python

#Import the libraries used below (Translator is assumed to come from the googletrans package)
import requests
from bs4 import BeautifulSoup
from googletrans import Translator

#Store url
url = 'https://www.gutenberg.org/files/514/514-h/514-h.htm'
#Fetch the page
r = requests.get(url)
html = r.text
print(html)
#Create a BeautifulSoup object from the HTML
soup = BeautifulSoup(html, "html5lib")
type(soup)
#Scrape entire text using 'get' and print it
text = soup.get_text()
print(text)
#Init the Google API translator
translator = Translator()
#Translate text using the Google API translator
translation = translator.translate(text,dest="ar")
print(translation)

1 answer

Anonymous user

Reply #1 · Posted 2024-04-25 17:33:43

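To scrape the text, first locate the elements that contain it. On this page the text sits in p tags, so you can collect those tags with the find_all method from the bs4 module and read the text from each one.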

from bs4 import BeautifulSoup
import requests

url = 'https://www.gutenberg.org/files/514/514-h/514-h.htm'
response = requests.get(url)
html = response.text
# print(html)
# Create a BeautifulSoup object from the HTML
soup = BeautifulSoup(html, "html.parser")
# Collect every <p> element and print its text
paragraph = soup.find_all("p")
for para in paragraph:
    print(para.text)

Output:
"Christmas won't be Christmas without any presents," grumbled Jo, lying
on the rug.
...
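
The output above still carries the hard line breaks from the page source (for example, the break between "lying" and "on the rug."). One possible way to tidy this, offered as a minimal sketch rather than a definitive fix, is to collapse every run of whitespace inside each paragraph into a single space and drop paragraphs that end up empty. The helper name clean_paragraphs and the sample list are hypothetical and not part of the original answer.

```python
import re


def clean_paragraphs(paragraphs):
    """Collapse whitespace runs in each paragraph and drop empty paragraphs.

    `paragraphs` is any iterable of strings (hypothetical helper, for illustration).
    """
    cleaned = []
    for raw in paragraphs:
        # Replace any run of whitespace (spaces, tabs, newlines) with one space.
        text = re.sub(r"\s+", " ", raw).strip()
        if text:
            cleaned.append(text)
    return cleaned


# Sample input imitating the hard-wrapped text shown in the output above.
sample = [
    '"Christmas won\'t be Christmas without any presents," grumbled Jo, lying\non the rug.',
    "   \n  ",
]

for line in clean_paragraphs(sample):
    print(line)
```

With the scraping code from the answer, the same helper could be fed the text of each p element returned by find_all instead of the sample list.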
