How do I split a string while parsing and then pull out the last element? (Python)

def get_text():
    #writes the html into a new text file called new_christie.txt
    with open('new_christie.txt','w', encoding='utf-8') as book:
        url = 'http://www.gutenberg.org/files/1155/1155-h/1155-h.htm'
        r = requests.get(url)
        data = r.text
        soup = BeautifulSoup(data, 'html.parser')
        str = soup.prettify()
        text = str.split('XXVIII. AND AFTER') #last phrase in Table of Contents
        text = soup.find_all('p') #finds all of the text between paragraphs
        content = text[-1:]
        for p in content:
            line = p.get_text()
            book.write(line)

1 answer

Anonymous user

#1 · Posted 2024-04-24 02:42:32

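Here is one solution. Note that I use lxml rather than Beautiful Soup, because I am more familiar with it. lxml is not part of the Python standard library, but you can install it from a terminal with pip install lxml.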

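import requests
from lxml import html

def get_text():
    # Write the text that follows the table of contents to new_christie.txt.
    # encoding='utf-8' avoids encode errors on platforms with another default.
    with open('new_christie.txt', 'w', encoding='utf-8') as book:
        url = 'http://www.gutenberg.org/files/1155/1155-h/1155-h.htm'
        r = requests.get(url)
        data = r.text
        tree = html.fromstring(data.encode('utf8'))
        # Join the text nodes that sit directly inside each <p> element.
        text = ' '.join(tree.xpath('//p/text()'))
        # Keep everything after the first occurrence of 'AND AFTER'.
        text = text.partition('AND AFTER')[2]
        book.write(text)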
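
As a small offline illustration of the "split, then take the last element" idea, the sketch below compares str.split, str.partition, and str.rpartition on a short made-up string. The helper name text_after_marker and the sample text are assumptions for illustration only; they are not part of the original post or of the Gutenberg page.

```python
def text_after_marker(text, marker):
    """Return the part of `text` after the last occurrence of `marker`.

    Hypothetical helper: if `marker` is absent, `text` comes back unchanged.
    """
    # str.split returns a list of pieces; [-1] picks the final piece as a str.
    return text.split(marker)[-1]


# Made-up stand-in for the downloaded page text.
sample = "CONTENTS I. START XXVIII. AND AFTER Chapter one begins here."
marker = "XXVIII. AND AFTER"

last_piece = text_after_marker(sample, marker)   # str: text after the marker
after_first = sample.partition(marker)[2]        # text after the FIRST occurrence
after_last = sample.rpartition(marker)[2]        # text after the LAST occurrence
last_slice = sample.split(marker)[-1:]           # one-element list, not a str

print(repr(last_piece))
print(last_slice)
```

Indexing with [-1] yields a string, while slicing with [-1:] yields a one-element list. When the marker is missing, split(marker)[-1] and rpartition(marker)[2] return the whole text, whereas partition(marker)[2] returns an empty string, so the choice depends on which fallback is wanted.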
