<p>The following code saves the downloaded articles in HTML format. In the folder you will find <code>tagesschau_paper0.html, tagesschau_paper1.html, tagesschau_paper2.html, .....</code></p>
<pre><code>import newspaper
from newspaper import news_pool

# Build a source object for each site
tagesschau_paper = newspaper.build('http://tagesschau.de')
cnn_paper = newspaper.build('http://cnn.com')
papers = [tagesschau_paper, cnn_paper]

# Download the articles of all sources concurrently, two threads per source
news_pool.set(papers, threads_per_source=2)
news_pool.join()

# Write the raw HTML of each downloaded article to its own numbered file
for i in range(tagesschau_paper.size()):
    with open("tagesschau_paper{}.html".format(i), "w", encoding="utf-8") as file:
        file.write(tagesschau_paper.articles[i].html)
</code></pre>
<p>Note: <code>news_pool</code> did not get anything from CNN, so I skipped writing code for it. If you check <code>cnn_paper.size()</code>, the result is <code>0</code>. You have to import and use <a href="https://newspaper.readthedocs.io/en/latest/user_guide/advanced.html?highlight=threads_per_source#explicitly-building-a-news-source" rel="nofollow noreferrer">Source</a> instead.</p>
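<p>A minimal sketch of building a source explicitly with <code>Source</code>, following the pattern in the linked documentation. The helper name <code>build_source</code> is hypothetical, and the import sits inside the function only so the snippet loads where <code>newspaper</code> is not installed. Whether this actually yields articles for CNN depends on the site and is not guaranteed.</p>

```python
def build_source(url):
    """Explicitly build a newspaper Source for `url` (hypothetical helper)."""
    # Imported lazily so the module loads even without newspaper installed
    from newspaper import Source

    source = Source(url)
    # build() downloads and parses the homepage, categories and feeds,
    # then generates the list of Article objects
    source.build()
    return source
```

<p>Calling <code>build_source('http://cnn.com').size()</code> afterwards shows how many articles were found. The articles themselves still have to be downloaded before their <code>html</code> is available.</p>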
<p>You can also follow the same pattern to save the articles in other formats.</p>
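<p>As one possible illustration of saving in another format, the sketch below writes a chosen attribute of each article to a numbered file, for example the plain <code>text</code> as <code>.txt</code>. The helper <code>save_articles</code> and its parameters are hypothetical, and the stand-in objects only make the sketch runnable offline. With real <code>newspaper</code> articles, <code>text</code> is only populated after <code>article.parse()</code> has been called.</p>

```python
import os
import tempfile
from types import SimpleNamespace


def save_articles(articles, folder, prefix, attr="text", ext="txt"):
    """Write one attribute of each article to its own numbered file."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for i, article in enumerate(articles):
        # Fall back to an empty string if the attribute is missing or None
        content = getattr(article, attr, "") or ""
        path = os.path.join(folder, "{}{}.{}".format(prefix, i, ext))
        with open(path, "w", encoding="utf-8") as file:
            file.write(content)
        paths.append(path)
    return paths


# Stand-in objects (an assumption for this sketch); with newspaper you
# would pass tagesschau_paper.articles instead.
fake_articles = [
    SimpleNamespace(html="<p>one</p>", text="one"),
    SimpleNamespace(html="<p>two</p>", text="two"),
]
out_dir = tempfile.mkdtemp()
saved = save_articles(fake_articles, out_dir, "tagesschau_paper")
```

<p>Passing <code>attr="html", ext="html"</code> would reproduce the HTML output of the original loop.</p>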