Feedparser: insert into pg if not exists

Posted 2024-04-20 08:52:16

I have a few questions about this. I am trying to use feedparser together with psycopg. The problem is that I don't want duplicate data.

    import feedparser as fp
    import psycopg2

    def dbFeed():
        conn_string = "host='localhost' dbname='rss_feed' user='postgres' password='somepassword'"
        print("Connecting to database\n ->%s" % (conn_string))

        try:
            conn = psycopg2.connect(conn_string)
            cursor = conn.cursor()
            print("Connected!\n")
        except:
            print('Unable to connect to the database')

        # One feed URL per line
        feeds_to_parse = open("C:\\Users\\Work\\Desktop\\feedparser_entry_tests_world.txt", "r")

        for line in feeds_to_parse:
            parser = fp.parse(str(line))
            x = len(parser['entries'])
            count = 0
            while count < x:
                # the INSERT attempts shown next go here

I have tried a few approaches. At first, I tried this:

    cursor.execute("INSERT INTO feed (link, title, publication_date, newspaper) VALUES (%s, %s, %s, %s)",
            (parser['entries'][count]['link'], parser['entries'][count]['title'],
            parser['entries'][count]['published'], parser['feed']['title']))

But of course I end up with duplicate data. So I found this post here: Avoiding duplicated data in PostgreSQL database in Python

I tried it, but I get a "tuple index out of range" error:

    cursor.execute("""INSERT INTO feed (link, title, publication_date, newspaper) SELECT %s, %s, %s, %s WHERE NOT EXISTS
                  (feed.title FROM feed WHERE feed.title=%s);""",
                (parser['entries'][count]['link'], parser['entries'][count]['title'],
                parser['entries'][count]['published'], parser['feed']['title']))

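But in any case, I would rather not do it that way. I would like to add a condition inside the while loop that tests whether the data already exists before inserting, because I don't want to check the whole database, only the most recent entries. Once again, of course, it doesn't work, I guess because parser['entries'][count]['title'] is not what I think it is...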

    while count < x:
        if parser['entries'][count]['title'] != cursor.execute("SELECT feed.title FROM feed WHERE publication_date > current_date - 15"):
            cursor.execute("INSERT INTO feed (link, title, publication_date, newspaper) VALUES (%s, %s, %s, %s)",
                    (parser['entries'][count]['link'], parser['entries'][count]['title'],
                    parser['entries'][count]['published'], parser['feed']['title']))

    conn.commit()

    cursor.close()
    conn.close()
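
A note on the last attempt: in psycopg2, `cursor.execute()` returns `None`, and the rows have to be read afterwards with `fetchall()`, so the `!=` comparison is always true. Below is a minimal sketch of the intended check, assuming the recent titles fit in memory. The helper name `new_entries` and the sample data are hypothetical, and plain dicts stand in for feedparser entries.

```python
def new_entries(entries, known_titles):
    """Return the entries whose title is not already known.

    Duplicates inside the same batch are skipped as well.
    """
    seen = set(known_titles)
    fresh = []
    for entry in entries:
        title = entry["title"]
        if title not in seen:
            seen.add(title)
            fresh.append(entry)
    return fresh


# With psycopg2, the known titles would come from something like:
#   cursor.execute("SELECT title FROM feed WHERE publication_date > current_date - 15")
#   known_titles = {row[0] for row in cursor.fetchall()}
known_titles = {"Old story"}  # hypothetical stand-in for the query result
entries = [
    {"title": "Old story", "link": "http://example.com/old"},
    {"title": "New story", "link": "http://example.com/new"},
    {"title": "New story", "link": "http://example.com/new-again"},
]

fresh = new_entries(entries, known_titles)
fresh_links = [e["link"] for e in fresh]
print(fresh_links)
```

Only the entries returned by the helper would then be passed to the plain `INSERT`, so the database is queried once per feed rather than once per entry.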

1 answer
User
#1 · Posted 2024-04-20 08:52:16

You have to pass the title a second time, for the placeholder used in the WHERE part; you can also add the extra condition there:

    cursor.execute(
        "INSERT INTO feed (link, title, publication_date, newspaper) "
        "SELECT %s, %s, %s, %s WHERE NOT EXISTS (SELECT 1 FROM feed "
        "WHERE title = %s AND publication_date > current_date - 15);",
        (parser['entries'][count]['link'],
         parser['entries'][count]['title'],
         parser['entries'][count]['published'],
         parser['feed']['title'],
         # fifth value: the entry title again, compared with feed.title in the subquery
         parser['entries'][count]['title']))
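
The same pattern can be tried without a PostgreSQL server. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for psycopg2, an assumption made only so that it runs anywhere: the placeholder style is `?` instead of `%s`, the date condition is left out, and the `insert_if_new` helper and the sample rows are hypothetical. What it illustrates is that the statement has five placeholders, so five values must be passed, with the entry title appearing twice.

```python
import sqlite3

# Four placeholders for the new row, one more for the NOT EXISTS subquery.
INSERT_IF_NEW = (
    "INSERT INTO feed (link, title, publication_date, newspaper) "
    "SELECT ?, ?, ?, ? WHERE NOT EXISTS "
    "(SELECT 1 FROM feed WHERE title = ?)"
)


def insert_if_new(cursor, link, title, published, newspaper):
    """Insert one row unless a row with the same title already exists."""
    # Five placeholders, five values: the title is passed a second time.
    cursor.execute(INSERT_IF_NEW, (link, title, published, newspaper, title))


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE feed (link TEXT, title TEXT, publication_date TEXT, newspaper TEXT)"
)

# The second call reuses the title, so it should not add a row.
insert_if_new(cur, "http://example.com/a", "Story A", "2024-04-20", "Example News")
insert_if_new(cur, "http://example.com/a2", "Story A", "2024-04-21", "Example News")
conn.commit()

cur.execute("SELECT link FROM feed")
links = [row[0] for row in cur.fetchall()]
print(len(links))  # prints 1
conn.close()
```

Passing only four values to this statement is what produces the "tuple index out of range" error in psycopg2, because the fifth `%s` has no matching element in the tuple.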
