Python 读取 CSV 并将值放入 MySQL 数据库

0 投票

3 回答

7343 浏览

数据工程师

提问于 2025-04-17 17:01

我正在尝试从一个csv文件中获取数据，并把这些数据放入数据库里，目前这个过程还算顺利。

但是我现在需要把数据写回到csv文件中，这样下次运行脚本时，只会把csv文件中标记以下的值放入数据库。

需要注意的是，系统中的csv文件每24小时会自动清空一次，所以可能不会有标记。因此，如果找不到标记，就把所有值都放入数据库。

我计划每30分钟运行一次这个脚本，所以csv文件里可能会有48个标记，或者每次运行时把标记移到文件的下方？

我之前是删除文件，然后在脚本中重新创建一个新文件，这样每次运行脚本时文件都是新的，但这样似乎会导致系统出现问题，所以这个方法不是很好。

希望大家能帮帮我。

谢谢！

Python代码：

import csv
import MySQLdb

mydb = MySQLdb.connect(host='localhost',
user='root',
passwd='******',
db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(file('data_csv.log'))

for row in csv_data:

    cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,)',
   row)
#close the connection to the database.
mydb.commit()
cursor.close()
import os


print "Done"

我的CSV文件格式：

2013-02-21,21:42:00,-1.0,45.8,27.6,17.3,14.1,22.3,21.1,1,1,2,2
2013-02-21,21:48:00,-1.0,45.8,27.5,17.3,13.9,22.3,20.9,1,1,2,2

数据库 mysql 文件操作数据处理自动化脚本 csv 定时任务数据标记

3 个回答

每一行csv文件似乎都包含一个时间戳。如果这些时间戳总是递增的，你可以先查询数据库中已经记录的最大时间戳，然后在读取csv文件时跳过所有在这个时间戳之前的行。

回答于 2025-04-17 由 Python大师

分享举报

看起来你在MySQL表中的第一个字段是一个唯一的时间戳。你可以设置MySQL表，让这个字段必须是唯一的，并且在插入数据时，如果会违反这个唯一性，就会被忽略。在mysql>提示符下输入以下命令：

ALTER IGNORE TABLE heating ADD UNIQUE heatingidx (thedate, thetime)

（把thedate和thetime替换成你存放日期和时间的列名。）

一旦你对数据库做了这个修改，你只需要在你的程序中改一行代码，就能让MySQL忽略重复的插入：

cursor.execute('INSERT IGNORE INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,)', row)

是的，使用INSERT IGNORE ...在已经处理过的行上确实有点浪费，但考虑到你的数据频率（每6分钟一次？），这对性能影响不大。

这样做的好处是，你再也不可能意外地插入重复的数据到你的表里了。这也让你的程序逻辑保持简单，容易理解。

此外，这样可以避免两个程序同时写入同一个CSV文件。即使你的程序通常没有错误，但偶尔——也许是千载难逢——你的程序和另一个程序可能会同时尝试写入文件，这可能会导致错误或数据混乱。

你还可以通过使用cursor.executemany来让你的程序运行得更快，而不是使用cursor.execute：

rows = list(csv_data)
cursor.executemany('''INSERT IGNORE INTO `heating` VALUES
    ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,)''', rows)

这相当于

for row in csv_data:    
    cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,)',
   row)

只是它把所有数据打包成一个命令。

回答于 2025-04-17 由 Python大师

分享举报

我觉得比“标记”CSV文件更好的办法是，保持一个文件来存储你处理的最后一行的编号。

所以，如果这个文件不存在（就是存储最后处理行编号的那个文件），你就处理整个CSV文件。如果这个文件存在，你只处理在这行之后的记录。

最终代码在工作系统上：

#!/usr/bin/python
import csv
import MySQLdb
import os

mydb = MySQLdb.connect(host='localhost',
user='root',
passwd='*******',
db='kestrel_keep')

cursor = mydb.cursor()

csv_data = csv.reader(file('data_csv.log'))

start_row = 0

def getSize(fileobject):
fileobject.seek(0,2) # move the cursor to the end of the file
size = fileobject.tell()
return size

file = open('data_csv.log', 'rb')
curr_file_size = getSize(file)

# Get the last file Size
if os.path.exists("file_size"):
with open("file_size") as f:
    saved_file_size = int(f.read())


# Get the last processed line
if os.path.exists("lastline"):
with open("lastline") as f:
    start_row = int(f.read())


if curr_file_size < saved_file_size: start_row = 0

cur_row = 0
for row in csv_data:
 if cur_row >= start_row:
     cursor.execute('INSERT INTO `heating` VALUES ( %s, %s, %s, %s, %s, %s, %s, %s, %s,    %s, %s, %s, %s, %s, %s, %s ,%s)', row)

     # Other processing if necessary

 cur_row += 1

 mydb.commit()
 cursor.close()


# Store the last processed line
with open("lastline", 'w') as f:
start_line = f.write(str(cur_row + 1)) # you want to start at the **next** line
                                      # next time
# Store Current  File Size To Find File Flush    
with open("file_size", 'w') as f:
start_line = f.write(str(curr_file_size))

# not necessary but good for debug
print (str(cur_row))



 print "Done"

补充：最终代码由ZeroG提交，现在在系统上运行正常！！也感谢Xion345的帮助。

回答于 2025-04-17 由 Python大师

分享举报

Python 读取 CSV 并将值放入 MySQL 数据库

3 个回答

撰写回答