Comparing equal-length lists to find the positions of identical elements
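I want to compare several lists that have the same length but different contents. My script should return only the positions at which the element is exactly the same in every list.
For example: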
l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]
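In the end I would get the position list p = [3,7], because positions 3 and 7 hold the elements '4' and '8' in every list.
The elements could also be strings; I am only using integers as an example. Thanks for your help!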
4 Answers
0
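li = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,6,5,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]
first = li[0]
# r starts as every index and keeps only those where each later list matches the first
r = range(len(first))
for current in li[1:]:
    r = [ i for i in r if current[i]==first[i]]
# r now holds the matching positions; print the values found at those positions
print([first[i] for i in r])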
Result
[4, 8]
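The question asks for positions, while the snippet above prints the values found at those positions. As a minimal sketch of the same filtering idea, the loop can be wrapped in a function that returns the indices themselves. The name `common_positions` is hypothetical and not part of the original answer.

```python
def common_positions(lists):
    """Return the indices at which every list holds the same element."""
    first = lists[0]
    # Start with every index, then drop those where a later list disagrees.
    positions = list(range(len(first)))
    for current in lists[1:]:
        positions = [i for i in positions if current[i] == first[i]]
    return positions


li = [[1, 2, 3, 4, 5, 6, 7, 8],
      [9, 8, 8, 4, 3, 6, 5, 8],
      [5, 6, 7, 4, 9, 9, 9, 8],
      [0, 0, 1, 4, 7, 6, 3, 8]]

positions = common_positions(li)
values = [li[0][i] for i in positions]
print(positions)  # [3, 7]
print(values)     # [4, 8]
```

Because the index list shrinks after each pass, later lists are only checked at positions that are still candidates.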
Comparing execution times:
# Python 2 code (print statement, xrange, time.clock).
# time.clock was removed in Python 3.8.
from time import clock

li = [[1,2,3,4,5,6,7,8,9,10],
      [9,8,8,4,5,6,5,8,9,13],
      [5,6,7,4,9,9,9,8,9,12],
      [0,0,1,4,7,6,3,8,9,5]]
n = 10000

# Method 1: filter the index list against each later list in turn
te = clock()
for turn in xrange(n):
    first = li[0]
    r = range(len(first))
    for current in li[1:]:
        r = [ i for i in r if current[i]==first[i]]
    x = [first[i] for i in r]
t1 = clock()-te
print 't1 =',t1
print x

# Method 2: zip the lists into columns and test each column with all()
te = clock()
for turn in xrange(n):
    y = [j[0] for i, j in enumerate(zip(*li)) if all(j[0]==k for k in j[1:])]
t2 = clock()-te
print 't2 =',t2
print y

print 't2/t1 =',t2/t1
print
Result
t1 = 0.176347273187
[4, 8, 9]
t2 = 0.579408755442
[4, 8, 9]
t2/t1 = 3.28561221827
With
li = [[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,2,22,26,24,25],
      [9,8,8,4,5,6,5,8,9,13,18,12,15,14,15,15,4,16,19,20,2,158,35,24,13],
      [5,6,7,4,9,9,9,8,9,12,45,12,4,19,15,20,24,18,19,20,2,58,23,24,25],
      [0,0,1,4,7,6,3,8,9,5,12,12,12,15,15,15,5,3,14,20,9,18,28,24,14]]
Result
t1 = 0.343173188632
[4, 8, 9, 12, 15, 20, 24]
t2 = 1.21259110432
[4, 8, 9, 12, 15, 20, 24]
t2/t1 = 3.53346690385
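For readers on Python 3, here is a hedged sketch of the same comparison using the standard `timeit` module in place of `time.clock`. The function names `filter_indices` and `zip_all` are hypothetical labels for the two methods timed above. Absolute timings and the ratio depend on the machine and the interpreter version, so they may not match the figures reported in this answer.

```python
import timeit

li = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      [9, 8, 8, 4, 5, 6, 5, 8, 9, 13],
      [5, 6, 7, 4, 9, 9, 9, 8, 9, 12],
      [0, 0, 1, 4, 7, 6, 3, 8, 9, 5]]


def filter_indices(lists):
    # Method 1: shrink the candidate index list one list at a time.
    first = lists[0]
    r = range(len(first))
    for current in lists[1:]:
        r = [i for i in r if current[i] == first[i]]
    return [first[i] for i in r]


def zip_all(lists):
    # Method 2: transpose with zip and test every column with all().
    return [j[0] for i, j in enumerate(zip(*lists))
            if all(j[0] == k for k in j[1:])]


# Both methods must agree before their timings are worth comparing.
assert filter_indices(li) == zip_all(li) == [4, 8, 9]

n = 10000
t1 = timeit.timeit(lambda: filter_indices(li), number=n)
t2 = timeit.timeit(lambda: zip_all(li), number=n)
print('t1 =', t1)
print('t2 =', t2)
print('t2/t1 =', t2 / t1)
```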
2
I like eumiro's solution, but I used a set instead.
p = [i for i, j in enumerate(zip(*l)) if len(set(j)) == 1]
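A small sketch of how the set version behaves, using made-up string data since the question mentions that the elements may be strings. One caveat worth labeling: building a set requires hashable elements, so this variant raises `TypeError` for elements such as nested lists, whereas an `==` comparison would still work.

```python
l = [["a", "b", "c", "d"],
     ["x", "b", "y", "d"],
     ["a", "b", "z", "d"]]

# A column whose set has exactly one member holds the same value in every list.
p = [i for i, j in enumerate(zip(*l)) if len(set(j)) == 1]
print(p)  # [1, 3]

# Unhashable elements (here: nested lists) cannot be placed in a set.
nested = [[[1], [2]], [[1], [3]]]
set_failed = False
try:
    [i for i, j in enumerate(zip(*nested)) if len(set(j)) == 1]
except TypeError as exc:
    set_failed = True
    print("set() failed:", exc)
```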
4
l = [[1,2,3,4,5,6,7,8],[9,8,8,4,3,4,5,7,8],[5,6,7,4,9,9,9,8],[0,0,1,4,7,6,3,8]]
p = [i for i, j in enumerate(zip(*l)) if all(j[0]==k for k in j[1:])]
# p == [3] - because of some typo in your original list, probably too many elements in the second list.
This is the compact form (a list comprehension) of the more verbose version:
p = []
for i, j in enumerate(zip(*l)):
    if all(j[0]==k for k in j[1:]):
        p.append(i)
zip(*l) gives you (shown as a list; on Python 3, where zip returns an iterator, wrap it in list() to see this):
[(1, 9, 5, 0),
(2, 8, 6, 0),
(3, 8, 7, 1),
(4, 4, 4, 4),
(5, 3, 9, 7),
(6, 4, 9, 6),
(7, 5, 9, 3),
(8, 7, 8, 8)]
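The output above has eight tuples even though the second list in the question has nine elements, because `zip` stops at the shortest input. The sketch below reproduces that with the question's data. The final length check is an optional addition, not part of the original answer.

```python
l = [[1, 2, 3, 4, 5, 6, 7, 8],
     [9, 8, 8, 4, 3, 4, 5, 7, 8],   # nine elements, one more than the others
     [5, 6, 7, 4, 9, 9, 9, 8],
     [0, 0, 1, 4, 7, 6, 3, 8]]

columns = list(zip(*l))   # list() is needed on Python 3, where zip is lazy
print(len(columns))       # 8: the ninth element of the second list is dropped
print(columns[3])         # (4, 4, 4, 4)
print(columns[7])         # (8, 7, 8, 8)

# Optional sanity check: do all input lists really have the same length?
same_length = len(set(len(row) for row in l)) == 1
print(same_length)        # False
```

This is why the answer arrives at p == [3] rather than the [3,7] expected in the question: position 7 holds (8, 7, 8, 8), which is not uniform.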
enumerate() numbers each tuple in that list, starting at 0: 0, 1, 2, and so on.
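As a short illustration, here is what `enumerate` yields for the first four tuples from the zip output. The variable names `columns` and `numbered` are chosen for this sketch only.

```python
columns = [(1, 9, 5, 0), (2, 8, 6, 0), (3, 8, 7, 1), (4, 4, 4, 4)]

numbered = list(enumerate(columns))
print(numbered[0])  # (0, (1, 9, 5, 0))
print(numbered[3])  # (3, (4, 4, 4, 4))

# In the comprehension, each pair is unpacked into i (position) and j (tuple).
for i, j in enumerate(columns):
    print(i, j)
```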
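all(j[0]==k for k in j[1:]) compares the first element of the tuple with every following element. It returns True if they are all equal and False if any one differs (it returns False as soon as it finds a differing element, so it does no unnecessary comparisons).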
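To make the short-circuit behavior visible, the sketch below counts how many comparisons `all` actually consumes from the generator. The helper `count_comparisons` is hypothetical and exists only for this demonstration.

```python
def count_comparisons(column):
    """Return (all_equal, number of comparisons actually made)."""
    made = [0]  # a one-item list so the generator below can update it

    def comparisons():
        for k in column[1:]:
            made[0] += 1
            yield column[0] == k

    result = all(comparisons())
    return result, made[0]


print(count_comparisons((4, 4, 4, 4)))  # (True, 3)
print(count_comparisons((1, 9, 5, 0)))  # (False, 1)
print(count_comparisons((8, 8, 7, 8)))  # (False, 2)
```

A uniform tuple needs every comparison, while a tuple that differs early is rejected after the first mismatch.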