newspaper - Q&A - Python中文网

import newspaper
from newspaper import news_pool

tagesschau_paper = newspaper.build('http://tagesschau.de')
cnn_paper = newspaper.build('http://cnn.com')

papers = [tagesschau_paper, cnn_paper]
news_pool.set(papers, threads_per_source=2)  # 2 sources * 2 threads each = 4 threads total
news_pool.join()

2 answers

User

#1 · Edited 2024-05-13 17:13:10

You can use pickle to save objects to a file outside the running Python process and reopen them later:

import pickle

file_name = "testfile"

# open the file for writing in binary mode and
# write the object news_pool to the file named 'testfile'
with open(file_name, 'wb') as file_object:
    pickle.dump(news_pool, file_object)

# open the file for reading in binary mode and
# load the object from the file into news_pool_reopen
with open(file_name, 'rb') as file_object:
    news_pool_reopen = pickle.load(file_object)
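
The thread does not confirm whether the news_pool object itself pickles cleanly, so here is a minimal sketch of the same round trip on plain data instead. The dictionary contents, the helper names save_object and load_object, and the example URLs are all made up for illustration; only pickle.dump and pickle.load come from the answer above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for scraped results: plain dicts and lists,
# which pickle can always serialize.
scraped = {
    "source": "http://tagesschau.de",
    "articles": [
        {"url": "http://tagesschau.de/example-a", "title": "Example A"},
        {"url": "http://tagesschau.de/example-b", "title": "Example B"},
    ],
}


def save_object(obj, path):
    """Serialize obj to path. Pickle data is bytes, so the mode is 'wb'."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Read a pickled object back. The mode must be 'rb', not 'r'."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Write into a temporary folder so the sketch leaves the working directory alone.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "testfile")

save_object(scraped, path)
restored = load_object(path)
print(restored == scraped)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.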

User

#2 · Edited 2024-05-13 17:13:10

The following code saves the downloaded articles as HTML. In the folder you will find tagesschau_paper0.html, tagesschau_paper1.html, tagesschau_paper2.html, ...

import newspaper
from newspaper import news_pool

tagesschau_paper = newspaper.build('http://tagesschau.de')
cnn_paper = newspaper.build('http://cnn.com')

papers = [tagesschau_paper, cnn_paper]
news_pool.set(papers, threads_per_source=2)
news_pool.join()

# write the raw HTML of each downloaded article to its own file
for i in range(tagesschau_paper.size()):
    with open("tagesschau_paper{}.html".format(i), "w", encoding="utf-8") as file:
        file.write(tagesschau_paper.articles[i].html)

Note: news_pool did not get anything from CNN, so I skipped writing code for it. If you check cnn_paper.size(), the result is 0. For CNN you have to import and use Source instead.

You can also follow the pattern above to save the articles in other formats.
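
As one possible reading of "other formats", here is a rough sketch that writes each article as plain text rather than HTML. It assumes every article object has title and text attributes. As far as I know, newspaper fills those in only after parse() has been called on the article, so treat that as an assumption to check. The function name save_articles_as_text and the SimpleNamespace stand-ins are invented for this example so it runs without any download.

```python
import os
import tempfile
from types import SimpleNamespace


def save_articles_as_text(articles, folder, prefix):
    """Write one .txt file per article and return the paths written.

    Assumes each article has .title and .text string attributes.
    """
    os.makedirs(folder, exist_ok=True)
    paths = []
    for i, article in enumerate(articles):
        path = os.path.join(folder, "{}{}.txt".format(prefix, i))
        with open(path, "w", encoding="utf-8") as f:
            # Title on the first line, a blank line, then the body text.
            f.write(article.title + "\n\n" + article.text)
        paths.append(path)
    return paths


# Stand-in objects used in place of real downloaded articles (hypothetical data).
fake_articles = [
    SimpleNamespace(title="First", text="Body one."),
    SimpleNamespace(title="Second", text="Body two."),
]

out_dir = tempfile.mkdtemp()
written = save_articles_as_text(fake_articles, out_dir, "tagesschau_paper")
print(written)
```

With real data you would pass tagesschau_paper.articles instead of fake_articles, keeping the same file-naming scheme as the HTML loop above.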
