Python list question
I've run into a problem and would appreciate some help. I have a Python list that looks like this:
fail = [
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\b\\include', 'Test.java'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\c', 'apa1.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\d', 'Sourcecheck.py'],
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\a\\include', 'svin.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
]
Each entry is: sha1 value, directory, filename
I want to split these entries into two different lists based on the sha1 value and the directory. For example:
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt']
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt']
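I want to add the entries that are in the same directory and have the same sha1 value to a list duplicate = [] (and only for that directory). The remaining entries should go into another list, say diff = [], because they share a sha1 value but sit in different directories.
I'm a bit lost on the logic here, so any help would be much appreciated!
EDIT: fixed a typo. The last value (the filename) was a one-element list in some cases, which was completely wrong; thanks to SilentGhost for making me aware of this.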
5 Answers
1
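In the code sample below, I use a key based on the SHA-1 and directory name to detect unique and duplicate entries, and keep a dictionary alongside each list for housekeeping.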
# Test dataset
fail = [
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\b\\include', 'Test.java'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\c', 'apa1.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\d', 'Sourcecheck.py'],
    ['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\a\\include', ['svin.txt']],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', ['apa2.txt']],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', ['apa.txt']],
]


def sort_duplicates(filelist):
    """Returns a tuple whose first element is a list of unique files,
    and second element is a list of duplicate files.
    """
    diff = []
    diff_d = {}
    duplicate = []
    duplicate_d = {}
    for entry in filelist:
        # Make an immutable key based on the SHA-1 and directory strings
        key = (entry[0], entry[1])
        # If this entry is a known duplicate, add it to the duplicate list
        if key in duplicate_d:
            duplicate.append(entry)
        # If this entry is a new duplicate, add it to the duplicate list
        elif key in diff_d:
            duplicate.append(entry)
            # And relocate the matching entry to the duplicate list
            matching_entry = diff_d[key]
            duplicate.append(matching_entry)
            duplicate_d[key] = matching_entry
            del diff_d[key]
            diff.remove(matching_entry)
        # Otherwise add this entry to the different list
        else:
            diff.append(entry)
            diff_d[key] = entry
    return (diff, duplicate)


def test():
    diff, dups = sort_duplicates(fail)
    print("Diff:", diff)
    print("Dups:", dups)


test()
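The same keying idea can also be written as a two-level grouping, first by SHA-1 and then by directory, which makes the "same hash, different directory" relationship visible in the data structure. This is only a sketch: the function name `split_by_sha1_and_dir` and the shortened hash strings are made up for illustration, and it assumes Python 3.7+ so that dictionaries keep insertion order.

```python
from collections import defaultdict


def split_by_sha1_and_dir(entries):
    """Return (diff, duplicate) for [sha1, directory, filename] entries."""
    # groups[sha1][directory] -> entries seen with that SHA-1 in that directory
    groups = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        sha1, directory = entry[0], entry[1]
        groups[sha1][directory].append(entry)

    diff, duplicate = [], []
    for by_dir in groups.values():
        for members in by_dir.values():
            # Several entries under one SHA-1 and one directory are duplicates;
            # a lone entry goes to diff
            target = duplicate if len(members) > 1 else diff
            target.extend(members)
    return diff, duplicate


# Placeholder hashes, not real SHA-1 values
sample = [
    ['aaa', 'ron\\c', 'knark.txt'],
    ['bbb', 'ron\\c', 'apa1.txt'],
    ['aaa', 'ron\\d', 'other.txt'],
    ['aaa', 'ron\\c', 'apa.txt'],
]
diff, duplicate = split_by_sha1_and_dir(sample)
print("Diff:", diff)
print("Dups:", duplicate)
```

Unlike the version above, nothing is ever removed from a list here, so each entry is touched a constant number of times.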
1
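You could simply iterate over all the values and use an inner loop to compare directories. If the directories match, compare the sha1 values, and then put the entries into the appropriate lists. That gives you a fairly simple O(n^2) algorithm for sorting this data out.
Maybe something like this untested code: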
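dup_idx = set()
for i in range(len(fail) - 1):
    sha1, directory = fail[i][0], fail[i][1]
    for j in range(i + 1, len(fail)):
        # == compares strings by value, so this is the right test
        if directory == fail[j][1] and sha1 == fail[j][0]:
            # record both indices instead of removing from fail mid-loop
            dup_idx.update((i, j))
duplicate = [fail[k] for k in sorted(dup_idx)]
# everything without a same-directory, same-sha1 partner goes to diff
diff = [fail[k] for k in range(len(fail)) if k not in dup_idx]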
Again, this code is untested.
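If the quadratic scan turns out to be too slow for a long list, one possible alternative is to count each (sha1, directory) pair first and partition in a second pass. The sketch below assumes that approach; `split_with_counter` and the short hash strings are illustrative names, not part of the question.

```python
from collections import Counter


def split_with_counter(entries):
    """Two passes: count each (sha1, directory) pair, then partition."""
    counts = Counter((entry[0], entry[1]) for entry in entries)
    # A pair seen more than once marks every entry carrying it as a duplicate
    duplicate = [e for e in entries if counts[(e[0], e[1])] > 1]
    diff = [e for e in entries if counts[(e[0], e[1])] == 1]
    return diff, duplicate


# Placeholder hashes, not real SHA-1 values
entries = [
    ['s1', 'ron\\c', 'a.txt'],
    ['s2', 'ron\\c', 'b.txt'],
    ['s1', 'ron\\c', 'c.txt'],
    ['s1', 'ron\\a', 'd.txt'],
]
diff, duplicate = split_with_counter(entries)
print("duplicate:", duplicate)
print("diff:", diff)
```

Both result lists keep the entries in their original input order, and the work grows linearly with the number of entries rather than with its square.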
3
duplicate = []
# Sort the list so we can compare adjacent values
fail.sort()
# If you don't want to modify the list in place, you can use:
# sortedFail = sorted(fail)
# and then use sortedFail in the rest of the code instead of fail
for i, x in enumerate(fail):
    if i + 1 == len(fail):
        # end of the list
        break
    if x[:2] == fail[i + 1][:2]:
        # lists have no add() method, so use append()
        if x not in duplicate:
            duplicate.append(x)
        if fail[i + 1] not in duplicate:
            duplicate.append(fail[i + 1])
# diff is just anything not in duplicate as far as I can tell from the explanation
diff = [d for d in fail if d not in duplicate]
With your example input, this gives:
duplicate: [
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', ['apa.txt']],
['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt']
]
diff: [
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', ['apa2.txt']],
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\c', 'apa1.txt'],
['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\a\\include', ['svin.txt']],
['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\b\\include', 'Test.java'],
['da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ron\\d', 'Sourcecheck.py']
]
I may have misunderstood something, but I think this is what you were asking for.
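The sort-then-compare-neighbours idea can also be expressed with `itertools.groupby`, which collects each run of adjacent equal keys for you. This is a sketch under a few assumptions: `split_sorted` is a made-up name, the hashes are placeholders, and the sort key deliberately uses only the first two fields so the filename is never compared (on Python 3, ordering a string against a list raises `TypeError`).

```python
from itertools import groupby


def split_sorted(entries):
    """Sort on (sha1, directory) and split runs of equal keys."""
    def key(entry):
        return (entry[0], entry[1])

    diff, duplicate = [], []
    # sorted() returns a new list, so the caller's list is left untouched
    for _, run in groupby(sorted(entries, key=key), key=key):
        run = list(run)
        # A run longer than one means same SHA-1 and same directory
        (duplicate if len(run) > 1 else diff).extend(run)
    return diff, duplicate


# Placeholder hashes; one filename is a list to show it is never compared
entries = [
    ['s2', 'ron\\c', 'b.txt'],
    ['s1', 'ron\\c', 'c.txt'],
    ['s1', 'ron\\a', 'd.txt'],
    ['s1', 'ron\\c', ['a.txt']],
]
diff, duplicate = split_sorted(entries)
print("duplicate:", duplicate)
print("diff:", diff)
```

Because `sorted()` is stable, entries that share a key stay in their original relative order inside each run.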