擅长:python、mysql、java
<p>这是一个用python实现的简单算法,它可以检查一个值是否与一个集群的平均值相差太远(以标准偏差计):</p>
<pre><code>from math import sqrt
def stat(lst):
"""Calculate mean and std deviation from the input list."""
n = float(len(lst))
mean = sum(lst) / n
stdev = sqrt((sum(x*x for x in lst) / n) - (mean * mean))
return mean, stdev
def parse(lst, n):
cluster = []
for i in lst:
if len(cluster) <= 1: # the first two values are going directly in
cluster.append(i)
continue
mean,stdev = stat(cluster)
if abs(mean - i) > n * stdev: # check the "distance"
yield cluster
cluster[:] = [] # reset cluster to the empty list
cluster.append(i)
yield cluster # yield the last cluster
</code></pre>
<p>这将使用<code>5 < n < 9</code>返回您在示例中期望的结果:</p>
<pre><code>>>> array = [1, 2, 3, 60, 70, 80, 100, 220, 230, 250]
>>> for cluster in parse(array, 7):
... print(cluster)
[1, 2, 3]
[60, 70, 80, 100]
[220, 230, 250]
</code></pre>