Statistical bootstrap library for Python?
Is there a statistical bootstrap library for Python?
I am looking for functionality similar to what the R bootstrap package provides:
http://statistics.ats.ucla.edu/stat/r/library/bootstrap.htm
I searched and found:
http://mjtokelly.blogspot.com/2006/04/bootstrap-statistics-in-python.html (though the code at this link is broken)
http://adorio-research.org/wordpress/?p=9048
https://github.com/cgevans/scikits-bootstrap
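However, none of these seem to provide the full functionality (in particular, probability weights).
Are there any other suggestions?
This functionality was recently added to numpy.random.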
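For reference, here is a minimal sketch of how weighted resampling might look with numpy.random, assuming NumPy 1.17 or later (for `default_rng`). The helper name `weighted_bootstrap_means`, the data, and the weights are made up for illustration and are not part of NumPy.

```python
import numpy as np


def weighted_bootstrap_means(data, weights, n_resamples=1000, seed=0):
    """Bootstrap the mean, drawing each resample with the given probability weights.

    Hypothetical helper for illustration; not part of NumPy.
    """
    data = np.asarray(data, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()  # Generator.choice expects probabilities that sum to 1
    rng = np.random.default_rng(seed)
    # Draw all resamples at once: one row per resample, sampled with replacement.
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True, p=p)
    return resamples.mean(axis=1)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
weights = [1, 1, 1, 1, 2, 2]  # illustrative weights only
means = weighted_bootstrap_means(data, weights)
low, high = np.percentile(means, [2.5, 97.5])  # 95% percentile interval
print(f"bootstrap mean: {means.mean():.2f}, 95% interval: [{low:.2f}, {high:.2f}]")
```

The `p` argument carries the probability weights, and `replace=True` gives the sampling with replacement that the bootstrap relies on.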
Thanks!
1 Answer
If all you need is a Python version of R's sample function, you could try this:
import bisect
import collections.abc
import random

def sample(xs, sample_size=None, replace=False, sample_probabilities=None):
    """Mimics the functionality of http://statistics.ats.ucla.edu/stat/r/library/bootstrap.htm sample()"""
    # A non-iterable xs (an integer n) stands for the population 0..n-1.
    if not isinstance(xs, collections.abc.Iterable):
        xs = range(xs)
    xs = list(xs)  # len() and indexing are needed below
    if sample_size is None:
        sample_size = len(xs)
    if sample_probabilities is None:
        if replace:
            return [random.choice(xs) for _ in range(sample_size)]
        else:
            return random.sample(xs, sample_size)
    assert len(sample_probabilities) == len(xs)
    if replace:
        # Build the cumulative weights, then invert them with a binary search.
        total, cdf = 0, []
        for p in sample_probabilities:
            total += p
            cdf.append(total)
        # random.uniform may return its upper bound, so clamp the index.
        return [xs[min(bisect.bisect(cdf, random.uniform(0, total)), len(xs) - 1)]
                for _ in range(sample_size)]
    else:
        if sample_size > len(xs):
            raise ValueError("sample larger than population")
        xps = list(zip(xs, sample_probabilities))
        total = sum(sample_probabilities)
        result = []
        for _ in range(sample_size):
            # Choose an item based on weights, and remove it from future iterations.
            # This is slow (N^2); a tree structure for xps would be better (N log N).
            target = random.uniform(0, total)
            current_total = 0
            for index, (x, p) in enumerate(xps):
                current_total += p
                if current_total > target:
                    break
            # If rounding kept the running total from exceeding target, the loop
            # ends on the last remaining item, which is the right fallback.
            xps.pop(index)
            result.append(x)
            total -= p
        return result
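The comment in the code above notes that the weighted, without-replacement branch is quadratic. One possible alternative, swapped in here rather than taken from the answer, is the Efraimidis-Spirakis key method: give each item a random key u^(1/w) and keep the items with the largest keys. The function name `weighted_sample_without_replacement` is hypothetical, and the sketch assumes all weights are strictly positive.

```python
import heapq
import random


def weighted_sample_without_replacement(xs, weights, k, rng=random):
    """Pick k distinct items from xs, favouring items with larger weights.

    Hypothetical helper using Efraimidis-Spirakis keys; assumes weights > 0.
    """
    xs = list(xs)
    weights = list(weights)
    if len(xs) != len(weights):
        raise ValueError("xs and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be strictly positive")
    if not 0 <= k <= len(xs):
        raise ValueError("k must be between 0 and len(xs)")
    # Each item gets key u ** (1 / w) with u uniform in [0, 1); heavier items
    # tend to get keys closer to 1, so they are more likely to rank near the top.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    # The k largest keys form the sample; nlargest runs in O(N log k).
    return [xs[i] for _, i in heapq.nlargest(k, keyed)]


random.seed(42)
picked = weighted_sample_without_replacement("abcde", [5, 1, 1, 1, 1], 3)
print(picked)
```

Because every item receives its key independently, nothing has to be removed from a list between draws, which is what makes the original loop slow.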