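How do I find the lengths of runs of a number in a list? (Is there a faster way than what I'm doing now?)
I have a list that contains only 1s and 0s, nothing else. I want to locate the 1s; specifically, I want to know where each run of 1s starts and where it ends (in the code below I store the "length" of the run, but the end position would work just as well, since I can compute the length from the start and end positions).
I store these runs of 1s in a hash (a dictionary). Is there a faster way to get this result than what I'm doing now? I'm still learning Python, and the lists I use in real life are much larger, so speed matters to me.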
previous = 0
cnt = 0
startLength = {}
for r in listy:
    if previous == 0 and r == 1:
        start = cnt
        startLength[start] = 1
    if previous == 1 and r == 1:
        startLength[start] = 1 + cnt - start
    previous = r
    cnt += 1
for s,l in startLength.iteritems():
    print "A run of 1's starts at position %s and lasts %s" % (s,l)
4 Answers
1
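Here is a slightly more efficient solution (sorry that it's in JavaScript). The key is to store the length only once; in your code, the assignment "startLength[start] = 1 + cnt - start" is recomputed every time the run grows by one.
By using the condition "if r == 0 and previous == 1" in place of your "if previous == 1 and r == 1", I reduce the number of computations, but I also need an extra "if r == 1" check after the loop to handle a run that reaches the end of the list.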
var test=[0,1,1,1,0,0,0,1,1,0,0,1,0];
function runs(arr) {
    var result = {};
    var start = 0;
    var previous = 0;
    var cnt = 0;
    var r = 0;
    for(; cnt<arr.length; cnt++) {
        r = arr[cnt];
        if(r == 1 && previous == 0)
            start = cnt;
        if(r == 0 && previous == 1)
            result[start] = cnt - start;
        previous = r;
    }
    if(r == 1)
        result[start] = cnt - start;
    return result;
}
var result = runs(test);
for(var start in result)
    console.log("start " + start + " length " + result[start]);
Edit 2: Here is a Python benchmark showing that the groupby approach (currently the top answer to this question) is considerably slower.
from itertools import groupby
from operator import itemgetter
import random
import time
lst = [ 1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]
def makeList(size):
    random.seed()
    return [random.randint(0,1) for r in xrange(size)]
def runs1(lst, showOutput):
    startLength = {}
    for k,v in groupby(enumerate(lst),key=itemgetter(1)):
        if k:
            v = list(v)
            startLength[v[0][0]] = v[-1][0] + 1 - v[0][0]
    if showOutput == True:
        for s,l in startLength.iteritems():
            print s,l
def runs2(lst, showOutput):
    previous = 0
    cnt = 0
    startLength = {}
    for r in lst:
        if previous == 0 and r == 1:
            start = cnt
        if previous == 1 and r == 0:
            startLength[start] = cnt - start
        previous = r
        cnt += 1
    if r == 1:
        startLength[start] = cnt - start
    if showOutput == True:
        for s,l in startLength.iteritems():
            print s,l
testList = makeList(10)
print "slow version"
runs1(testList, True)
print "fast version"
runs2(testList, True)
benchmarkList = makeList(10000)
start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start
start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start
start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start
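Single time.time() measurements on a 10000-element list can be noisy, so the comparison may be easier to reproduce with the standard timeit module. The following is only a sketch, written for Python 3 rather than the Python 2 used in this answer; the names runs_groupby, runs_loop and benchmark are hypothetical and simply mirror runs1 and runs2 with the printing removed.

```python
import random
import timeit
from itertools import groupby
from operator import itemgetter


def runs_groupby(lst):
    """groupby-based version (like runs1): maps start index -> run length."""
    start_length = {}
    for key, group in groupby(enumerate(lst), key=itemgetter(1)):
        if key:
            group = list(group)
            start_length[group[0][0]] = group[-1][0] + 1 - group[0][0]
    return start_length


def runs_loop(lst):
    """Plain-loop version (like runs2): stores each run length exactly once."""
    start_length = {}
    previous = 0
    start = 0
    for cnt, r in enumerate(lst):
        if previous == 0 and r == 1:
            start = cnt  # a run of 1s begins here
        elif previous == 1 and r == 0:
            start_length[start] = cnt - start  # the run just ended
        previous = r
    if previous == 1:  # the list ended while still inside a run
        start_length[start] = len(lst) - start
    return start_length


def benchmark(size=10000, number=20, repeat=5):
    """Return the best observed seconds per call for each function."""
    data = [random.randint(0, 1) for _ in range(size)]
    results = {}
    for func in (runs_groupby, runs_loop):
        timings = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
        results[func.__name__] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("%s: %.6f s per call" % (name, seconds))
```

Taking the minimum over several repeats is the usual way to reduce interference from other processes; the absolute numbers will depend on the machine and the Python version.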
2
Besides the very Pythonic answer from @mgilson, you could also try something like this:
lst = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
start, end = False, False
for i, x in enumerate(lst):
    if x == 1 and start is False:
        start = i
    if x == 0 and start is not False and end is False:
        end = i-1
    if start is not False and end is not False:
        print start, end # and len is (end-start+1)
        start, end = False, False
if start is not False:
    print start, i
Output:
0 4
12 15
22 23
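If the positions need to be reused rather than printed, the same scan could be wrapped in a generator. This is a rough sketch in Python 3, not part of the original answer: iter_runs is a hypothetical name, and None replaces False as the "not in a run" marker so that a start index of 0 cannot be confused with it.

```python
def iter_runs(lst):
    """Yield (start, end) index pairs, end inclusive, for each run of 1s."""
    start = None  # None means "not currently inside a run"
    for i, x in enumerate(lst):
        if x == 1 and start is None:
            start = i  # a run begins
        elif x != 1 and start is not None:
            yield start, i - 1  # the run ended on the previous element
            start = None
    if start is not None:  # the list ended while still inside a run
        yield start, len(lst) - 1


lst = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
for start, end in iter_runs(lst):
    print(start, end, end - start + 1)  # start, end, length
```

For the list above this yields the same three (start, end) pairs as the printed output, with the run length added as a third column.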
8
I would probably use itertools.groupby for this.
lst = [ 1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]
from itertools import groupby
from operator import itemgetter
for k,v in groupby(enumerate(lst),key=itemgetter(1)):
    if k:
        v = list(v)
        print v[0][0],v[-1][0]
This prints the start and end positions of each run of consecutive 1s.
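To get the start-to-length dictionary the question asks for, one possible variation is to group the raw values and keep a running position instead of enumerating. This is a sketch in Python 3 under that assumption; run_lengths is a hypothetical name, and whether it is faster than the version above has not been measured here.

```python
from itertools import groupby


def run_lengths(lst):
    """Map the start index of each run of 1s to the length of that run."""
    result = {}
    pos = 0  # index of the first element of the current group
    for value, group in groupby(lst):
        length = sum(1 for _ in group)  # count the group without building a list
        if value == 1:
            result[pos] = length
        pos += length
    return result


lst = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
for start, length in sorted(run_lengths(lst).items()):
    print("A run of 1's starts at position %s and lasts %s" % (start, length))
```

Counting with sum avoids materializing each group as a list of (index, value) tuples, which is the main extra work in the enumerate-based form.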