How can I check new URLs against a CSV of old URLs in Python to prevent duplicates?

Published 2024-04-26 23:58:13


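I first fetch all article URLs from an RSS feed and check that list for duplicates. Then I want to check these unique article URLs against a CSV file of old article URLs, to avoid duplicating anything already in the CSV. I only want to print the new URLs that do not match any old URL in the CSV.

I am having trouble with the latter part, and any help would be greatly appreciated.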

import requests
from bs4 import BeautifulSoup
import csv


feed_urls = ["https://www.example.com/rss"]

with open("Old_Articles.csv", "r", encoding="utf-8") as r:
    old_articles = csv.reader(r, delimiter=",")

    for url in feed_urls:
        response = requests.get(url)
        html_source = response.text
        soup = BeautifulSoup(html_source, "xml")
        new_articles = set()

        for link in soup.findAll("atom:link"):
            new_articles.add(link.get("href"))

        for link in new_articles:
            if link not in old_articles:
                print("Not Matched")
            else:
                print("Matched")
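
A likely cause of the trouble in the last loop: `csv.reader` is a one-pass iterator that yields each row as a list of strings, so `link not in old_articles` compares a string against lists and also exhausts the reader on the first test. A minimal sketch of one way around this is to read the old URLs into a set once and filter against it. It assumes the URL sits in the first column of `Old_Articles.csv` and that there is no header row; the helper names `load_old_urls` and `find_new_urls` are hypothetical and not part of the original code.

```python
import csv


def load_old_urls(csv_path, column=0):
    """Read one column of a CSV file into a set of stripped URL strings.

    Assumption: the URL is stored in `column` (first column by default).
    """
    old_urls = set()
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter=","):
            # Skip blank lines and rows too short to contain the column.
            if len(row) > column and row[column].strip():
                old_urls.add(row[column].strip())
    return old_urls


def find_new_urls(candidate_urls, old_urls):
    """Return candidates absent from old_urls, de-duplicated, in first-seen order."""
    seen = set()
    new_urls = []
    for url in candidate_urls:
        # A tag without an href attribute would contribute None; ignore it.
        if url is None:
            continue
        url = url.strip()
        if url and url not in old_urls and url not in seen:
            seen.add(url)
            new_urls.append(url)
    return new_urls
```

In the code above, this could be used by calling `old_urls = load_old_urls("Old_Articles.csv")` once before the feed loop, then printing each item of `find_new_urls(new_articles, old_urls)`. Comparison here is exact string equality after stripping whitespace, so URLs that differ only by a trailing slash or query string would still count as different.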

