How can I check new URLs against a CSV of old URLs in Python to prevent duplicates?

Posted 2024-04-26 23:58:13

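I first collect all article URLs from an RSS feed and remove any duplicates within that list. Next, I want to check these unique article URLs against a CSV file of old article URLs, so that nothing already in the CSV is repeated. I only want to print the new URLs that do not match any old URL in the CSV.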

I'm having trouble with the latter part and would really appreciate any help.

import requests
from bs4 import BeautifulSoup
import csv


feed_urls = ["https://www.example.com/rss"]

with open("Old_Articles.csv", "r", encoding="utf-8") as r:
    old_articles = csv.reader(r, delimiter=",")

    for url in feed_urls:
        response = requests.get(url)
        html_source = response.text
        soup = BeautifulSoup(html_source, "xml")
        new_articles = set()

        for link in soup.find_all("atom:link"):
            new_articles.add(link.get("href"))

        for link in new_articles:
            if link not in old_articles:
                print("Not Matched")
            else:
                print("Matched")
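
A likely cause of the trouble: `csv.reader` returns a one-pass iterator that yields each row as a list of strings. The test `link not in old_articles` therefore compares a string against whole rows (which never match) and exhausts the reader on the first check, so every link comes out as "Not Matched". One possible fix, sketched below, is to load the old URLs into a set once and then filter the feed links against it. The helper names `load_old_urls` and `find_new_urls` are made up for illustration, and the sketch assumes the URL sits in the first column of Old_Articles.csv; adjust `column` if the file is laid out differently.

```python
import csv


def load_old_urls(csv_path, column=0):
    """Read every URL from one column of the CSV into a set.

    Assumption: the URL is in `column` (first column by default).
    """
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        # csv.reader yields each row as a list of strings, so pick out the
        # URL cell rather than comparing a string against whole rows.
        return {
            row[column].strip()
            for row in csv.reader(f)
            if len(row) > column and row[column].strip()
        }


def find_new_urls(feed_links, old_urls):
    """Return links that are not in old_urls, de-duplicated, in first-seen order."""
    seen = set(old_urls)
    fresh = []
    for link in feed_links:
        # Skip missing hrefs (None or empty) and anything already seen.
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh
```

In the loop from the question, this could be used by calling `old_urls = load_old_urls("Old_Articles.csv")` once before fetching the feeds, then printing each item of `find_new_urls((tag.get("href") for tag in soup.find_all("atom:link")), old_urls)`. A set gives constant-time membership tests and, unlike the reader object, can be checked as many times as needed.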
