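<pre><code>results = {}
range = 1825  # window length in days (about 5 years); note: shadows the built-in range()
for name, value in overDict.items():
    sliding_windows = []  # each window is [start_date, hit_count, measurement_count]
    good = True
    for tuple in value:  # (date, hit) pairs, assumed sorted by date; shadows built-in tuple
        # Add this measurement to every window it falls into
        for window in sliding_windows:
            if window[0] > int(tuple[0]) - range:
                window[1] += tuple[1]
                window[2] += 1
        # Start a new window with this date
        sliding_windows.append([int(tuple[0]), tuple[1], 1])
    # Any window whose hit rate exceeds 10% marks the whole series as bad
    for window in sliding_windows:
        if window[1]/float(window[2]) > .1:
            good = False
    results[name] = good
</code></pre>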
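<p>This builds <code>sliding_windows</code>, a list with one window per start date, each stored as <code>[start_date, hits, measurements]</code>:</p>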
<pre><code>[[40283, 3, 35], [40317, 3, 34], [40350, 3, 33], [40374, 3, 32],
[40408, 3, 31], [40437, 2, 30], [40465, 1, 29], [40505, 1, 28],
[40521, 1, 27], [40569, 1, 26], [40597, 1, 25], [40619, 1, 24],
[40647, 1, 23], [40681, 1, 22], [40710, 1, 21], [40738, 1, 20],
[40772, 1, 19], [40801, 1, 18], [40822, 0, 17], [40980, 0, 16],
[41011, 0, 15], [41045, 0, 14], [41067, 0, 13], [41228, 0, 12],
[41388, 0, 11], [41409, 0, 10], [41438, 0, 9], [41466, 0, 8],
[41557, 0, 7], [41592, 0, 6], [41710, 0, 5], [41743, 0, 4],
[41773, 0, 3], [41802, 0, 2], [41834, False, 1]]
</code></pre>
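<p>The code then computes the hit rate of each window and stores <code>True</code> or <code>False</code> in the dictionary depending on whether every rate stays at or below the threshold. It may be worth excluding windows that do not cover a long enough time span, because as written any hit among roughly the last 10 measurements pushes its short window over the 0.1 threshold and counts as a failure. I would probably take the date of the last measurement and throw out every window shorter than 5 years (except the first one, so that you still get a partial result when less than 5 years of data is available):</p>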
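<pre><code>cutoff = int(value[-1][0]) - range  # date of the last measurement minus the window length
for tuple in value:
    ...
    # Start a window only if it can span the full range, or if it is the first one
    if int(tuple[0]) < cutoff or len(sliding_windows) == 0:
        sliding_windows.append([int(tuple[0]), tuple[1], 1])
</code></pre>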
<p>This then produces:</p>
<p><code>sliding_windows</code>:</p>
<p><code>[[40283, 3, 35]]</code></p>
<p>Note that the result is <code>True</code> if the series is good and <code>False</code> if it is bad:</p>
<p><code>{'Escherichia coli': True}</code></p>
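<p>Putting the two snippets together, the following is a minimal self-contained sketch rather than a definitive implementation. The function name <code>check_rates</code>, its parameters <code>window_days</code>, <code>max_rate</code> and <code>drop_short</code>, and the sample data are all assumptions made for illustration. It assumes each value is a date-sorted list of <code>(date, hit)</code> pairs and that an empty list counts as good.</p>

```python
def check_rates(over_dict, window_days=1825, max_rate=0.1, drop_short=True):
    """Return {name: bool}; True means no window exceeds max_rate.

    Hypothetical helper: over_dict maps a name to a date-sorted list of
    (date, hit) pairs, where date is a day number (int or numeric string)
    and hit is a bool.
    """
    results = {}
    for name, measurements in over_dict.items():
        if not measurements:
            results[name] = True  # assumption: no data counts as good
            continue
        # Windows starting after this date cannot span the full length.
        cutoff = int(measurements[-1][0]) - window_days
        windows = []  # each window is [start_date, hits, count]
        for date, hit in measurements:
            date = int(date)
            # Add this measurement to every window it falls into.
            for window in windows:
                if window[0] > date - window_days:
                    window[1] += hit
                    window[2] += 1
            # Start a new window. With drop_short set, keep only the first
            # window and windows old enough to span the full length.
            if not drop_short or date < cutoff or not windows:
                windows.append([date, int(hit), 1])
        results[name] = all(w[1] / w[2] <= max_rate for w in windows)
    return results


# Invented sample: 12 measurements 30 days apart, one hit near the end.
sample = {"demo": [(str(40000 + 30 * i), i == 10) for i in range(12)]}
print(check_rates(sample))                    # {'demo': True}
print(check_rates(sample, drop_short=False))  # {'demo': False}
```

<p>With <code>drop_short=False</code> the late hit starts a window holding only 2 measurements, so its rate is 0.5 and the series fails. With the cutoff applied only the first window remains, at 1 hit in 12 measurements.</p>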
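<p>Note: the hit counts work because the booleans <code>True</code>/<code>False</code> are implicitly converted to <code>1</code>/<code>0</code> when they are added together in <code>window[1] += tuple[1]</code>. That is why the last entry is <code>[41834, False, 1]</code>: nothing has been added to it yet, and for our purposes it is equivalent to <code>[41834, 0, 1]</code>.</p>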
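<p>The boolean arithmetic can be checked in isolation. This short sketch uses made-up variable names (<code>hits</code>, <code>flags</code>) and relies only on the fact that <code>bool</code> is a subclass of <code>int</code> in Python:</p>

```python
# bool is a subclass of int, so True/False behave as 1/0 in arithmetic.
hits = False                      # a freshly started window stores the raw bool
print(hits == 0)                  # True
hits += True                      # the first addition yields a plain int
print(hits, type(hits).__name__)  # 1 int

flags = [True, False, True, False]
print(sum(flags))                 # 2
```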