I first fetch all article URLs from the RSS feed and check that list for duplicates. Then I want to check those unique article URLs against a CSV file of old article URLs, so I don't re-process anything already in the CSV. I only want to print the new URLs that don't match any old URL in the CSV.
I'm stuck on the later part and would really appreciate some help.
import requests
from bs4 import BeautifulSoup
import csv

feed_urls = ["https://www.example.com/rss"]

# Load the old article URLs into a set so membership tests work and are O(1).
# csv.reader yields each row as a list of fields; take the first column.
# (The original compared against the csv.reader object itself, which does
# not support "in" checks on URLs and is exhausted after one pass.)
with open("Old_Articles.csv", "r", encoding="utf-8") as r:
    old_articles = {row[0] for row in csv.reader(r) if row}

for url in feed_urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "xml")

    # Collect the feed's article links, deduplicating via a set.
    new_articles = set()
    for link in soup.find_all("atom:link"):
        href = link.get("href")
        if href:
            new_articles.add(href)

    # Print only the URLs that are not already in the CSV.
    for link in new_articles:
        if link not in old_articles:
            print(link)