如何在Python/MySQL中比较并合并两个列表?
我想合并一些数据。以下是我的MySQL表格。我想用Python来遍历两个列表,一个列表中有重复项标记为'x',另一个列表中没有重复项。
这是一些示例数据。实际数据量非常大。
比如说:
a b c d e f key dupe
--------------------
1 d c f k l 1 x
2 g h j 1
3 i h u u 2
4 u r t 2 x
从上面的示例表格中,我想要的输出结果是:
a b c d e f key dupe
--------------------
2 g c h k j 1
3 i r h u u 2
我目前的进展是:
import string, os, sys
import MySQLdb
from EncryptedFile import EncryptedFile
enc = EncryptedFile( os.getenv("HOME") + '/.py-encrypted-file')
user = enc.getValue("user")
pw = enc.getValue("pw")
db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw,db=user)
cursor = db.cursor()
cursor2 = db.cursor()
cursor.execute("select * from delThisTable where dupe is null")
cursor2.execute("select * from delThisTable where dupe is not null")
result = cursor.fetchall()
result2 = cursor2.fetchall()
for each record
for each field
perform the comparison and perform the necessary updates
### How do I compare the record with same key value and update the original row null field value with the non-null value from the duplicate? Please fill this void...
cursor.close()
cursor2.close()
db.close()
谢谢大家!
1 个回答
2
好的,我们来点乐子吧...
mysql> create table so (a int, b char, c char, d char, e char, f char, `key` int, dupe char);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into so values (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (2, 'g', null, 'h', null, 'j', 1, null), (3, 'i', null, 'h', 'u', 'u', 2, null), (4, 'u', 'r', null, null, 't', 2, 'x');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from so order by a;
+------+------+------+------+------+------+------+------+
| a | b | c | d | e | f | key | dupe |
+------+------+------+------+------+------+------+------+
| 1 | d | c | f | k | l | 1 | x |
| 2 | g | NULL | h | NULL | j | 1 | NULL |
| 3 | i | NULL | h | u | u | 2 | NULL |
| 4 | u | r | NULL | NULL | t | 2 | x |
+------+------+------+------+------+------+------+------+
4 rows in set (0.00 sec)
Python 2.6.5 (r265:79063, Mar 26 2010, 22:43:05)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>> db = MySQLdb.connect(host="127.0.0.1", db="test")
>>> c = db.cursor()
>>> c.execute("SELECT a, b, c, d, e, f, `key`, dupe FROM so")
4L
>>> rows = c.fetchall()
>>> rows
((1L, 'd', 'c', 'f', 'k', 'l', 1L, 'x'), (4L, 'u', 'r', None, None, 't', 2L, 'x'), (2L, 'g', None, 'h', None, 'j', 1L, None), (3L, 'i', None, 'h', 'u', 'u', 2L, None))
>>> data = dict()
>>> for row in rows:
... key, isDupe = row[-2], row[-1]
... if key not in data:
... data[key] = list(row[:-1])
... else:
... for i in range(len(row)-1):
... if data[key][i] is None or (not isDupe and row[i] is not None):
... data[key][i] = row[i]
...
>>> data
{1L: [2L, 'g', 'c', 'h', 'k', 'j', 1L], 2L: [3L, 'i', 'r', 'h', 'u', 'u', 2L]}