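Searching and comparing values in CSV files with Python
I have two CSV files: a master file and an update file. I want to take specific columns from the update file and check whether those values exist in the master file.
Both files have the same columns and look basically like this: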
Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14,
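Put simply, I want to loop through the update file and check each stock code in it to see whether it exists in the master file.
Then, for every matching stock code, I need to check whether the director's name differs, and log the ones that do not match.
I worked from an example, but it doesn't seem to do quite what I need (or I haven't fully understood it...): Python: Comparing two CSV files and searching for similar items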
f1 = file(csvHKX, 'rU')
f2 = file(csvWRHK, 'rU')
f3 = file('results.csv', 'w')
csv1 = csv.reader(f1)
csv2 = csv.reader(f2)
csv3 = csv.writer(f3)
scode = [row for row in csv2]
for hkx_row in csv1:
    for wrhk_row in scode:
        if hkx_row[2] != wrhk_row[2]:
            print 'HKX:', hkx_row
            continue
f1.close()
f2.close()
f3.close()
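The update file contains the following stock codes: '00121' and '01003' (for testing).
The code seems to compare the two lists pair by pair and print a row whenever a pair of stock codes does not match. So when one side reads '00121', a row is printed because of the row containing '01003', and vice versa.
But all I care about is the case where hkx_row[2] is not found anywhere among the wrhk_row[2] values.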
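The difference between the two checks can be illustrated with a minimal sketch (Python 3, with small in-memory lists standing in for the CSV rows; the sample rows, including the code '00999', are made up for illustration). The pairwise comparison reports a row once for every row on the other side that differs, while a membership test reports it only when no row on the other side matches.

```python
# Hypothetical sample rows; only index 2 (the stock code) matters here.
hkx_rows = [["A", "", "00121"], ["B", "", "01003"], ["C", "", "00999"]]
wrhk_rows = [["X", "", "00121"], ["Y", "", "01003"]]

# Pairwise comparison, as in the nested loops above:
# a code is reported once per differing row on the other side.
pairwise = [h[2] for h in hkx_rows for w in wrhk_rows if h[2] != w[2]]

# Membership test: a code is reported only if it appears
# nowhere on the other side.
wrhk_codes = {w[2] for w in wrhk_rows}
missing = [h[2] for h in hkx_rows if h[2] not in wrhk_codes]

print(pairwise)
print(missing)
```

Here '00121' and '01003' both show up in the pairwise result even though each exists on the other side; only '00999' is missing under the membership test.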
1 Answer
Does this help?
File master.csv
Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14,
C.P. Lotus Corporation,________,00123,Current,DEANINO James,___,Pilot,,2009-06-25,
C.P. Lotus Corporation,________,00129,Current,GINGE Ivy,___,Dental Technician,,2010-07-27,
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04,
C.P. Lotus Corporation,________,00119,Current,FIELD Mary,___,Pastrycook,,2009-06-25,
File update.csv
Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)
C.P. Lotus Corporation,________,00133,Current,THOMPSON Sarah,___,Cosmonaut,,2004-01-20,
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,
C.P. Lotus Corporation,________,00121,Current,BEARD Sophia,___,Executive Director,CEO,2010-04-26,
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04,
C.P. Lotus Corporation,________,00129,Current,MISTOUKI Hassan,___,Folk Singer,,2010-07-27,
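Code
import csv
# Python 2: the csv module expects files opened in binary mode here
mas = csv.reader(open('master.csv','rb'))
upd = csv.reader(open('update.csv','rb'))
# (stock code, director's English name) pairs found in the master file;
# the header row is included too, so the update header is filtered out below
set24 = set((row[2],row[4]) for row in mas)
print set24
print
# keep only the update rows whose (code, name) pair is not in the master file
updkept = [ row for row in upd if (row[2],row[4]) not in set24]
print '\n'.join(map(str,updkept))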
Result
set([('00127', 'ERATOR Jane'), ('00121', 'CHEARAVANONT Soopakij'), ('00121', 'CHEARAVANON Narong'), ('00119', 'FIELD Mary'), ('00122', 'CHEARAVANONT Dhanin'), ('Stock Code', "Director's English Name"), ('00129', 'GINGE Ivy'), ('00123', 'DEANINO James')])
['C.P. Lotus Corporation', '________', '00133', 'Current', 'THOMPSON Sarah', '___', 'Cosmonaut', '', '2004-01-20', '']
['C.P. Lotus Corporation', '________', '00121', 'Current', 'BEARD Sophia', '___', 'Executive Director', 'CEO', '2010-04-26', ' ']
['C.P. Lotus Corporation', '________', '00129', 'Current', 'MISTOUKI Hassan', '___', 'Folk Singer', '', '2010-07-27', '']
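The question asks for two separate checks: whether a stock code from the update file exists in the master file at all, and, if it does, whether the director's name differs. A minimal Python 3 sketch of that split follows. It assumes the column names from the header row shown above; the function name compare and the trimmed in-memory sample data are hypothetical and used only for illustration.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the header row shown in the question.
CODE = "Stock Code"
NAME = "Director's English Name"

def compare(master_file, update_file):
    """Return (unknown_codes, mismatches) for the rows of update_file.

    unknown_codes: update rows whose stock code never appears in master.
    mismatches: update rows whose stock code is in master, but whose
                director name is not listed under that code there.
    """
    directors = defaultdict(set)  # stock code -> director names in master
    for row in csv.DictReader(master_file):
        directors[row[CODE]].add(row[NAME])

    unknown_codes, mismatches = [], []
    for row in csv.DictReader(update_file):
        code, name = row[CODE], row[NAME]
        if code not in directors:
            unknown_codes.append((code, name))
        elif name not in directors[code]:
            mismatches.append((code, name))
    return unknown_codes, mismatches

# Tiny in-memory stand-ins for master.csv and update.csv (columns trimmed).
master = io.StringIO(
    "Stock Code,Director's English Name\n"
    "00121,CHEARAVANON Narong\n"
    "00129,GINGE Ivy\n"
)
update = io.StringIO(
    "Stock Code,Director's English Name\n"
    "00133,THOMPSON Sarah\n"
    "00121,CHEARAVANON Narong\n"
    "00129,MISTOUKI Hassan\n"
)
unknown_codes, mismatches = compare(master, update)
print(unknown_codes)
print(mismatches)
```

With real files, the StringIO objects would be replaced by files opened with open('master.csv', newline='') and open('update.csv', newline=''), preferably inside a with block so they are closed afterwards. Because DictReader consumes the header row, no header pair ends up among the compared values.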