Draw a specified number of samples from a list, using all list elements

Published 2024-04-18 16:02:30


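I have a list of elements. I want to specify the number of draws (samples) to take from this list. However, I must make sure that

(i) the samples together include all of the original elements

(ii) the samples do not all have the same size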

One update to my original question

UPDATE (iii) the minimum sample size is 2

For example:

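lst = [1,2,3,4,5,6,7,8,9,10]  # renamed from `list` to avoid shadowing the builtin
draws = 4
samples = some_function(draws, lst)  # some_function is the function being asked for
set(x for row in samples for x in row) == set(lst) # must be true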

samples = [[1,2,3],[4,5],[6,7,8],[9,10]]  # 4 draws, together include all elements, two different sample sizes, minimum sample size >= 2

Question: is there a simple way to do this, e.g. using numpy.random?

Here is an attempt using np.random.permutation and np.random.choice. However, this approach does not always include all list elements in the final samples:

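import numpy as np

srch_list = list(range(100))
draws = 10
mid = round(len(srch_list)/draws)
n_leafs = range(mid-2,mid+3)  # candidate chunk sizes around the mean size

rnd_list = np.random.permutation(srch_list)
leafs = []
for i in range(draws):
    idx = np.random.choice(n_leafs)  # random size for this chunk
    leafs.append(rnd_list[:idx])
    rnd_list = rnd_list[idx:]  # whatever is left after the last draw is dropped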
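
The attempt above loses elements whenever the randomly chosen chunk sizes add up to less than len(srch_list). One possible way around this, sketched below under assumptions, is to fix the total first: give every chunk the minimum size and distribute the remaining elements at random. The function name random_partition and its parameters are hypothetical, and the multinomial split is just one choice among several.

```python
import numpy as np


def random_partition(items, draws, min_size=2, rng=None):
    """Split a shuffled copy of `items` into `draws` chunks.

    Hypothetical helper: every chunk has at least `min_size` elements,
    the chunks together contain every element exactly once, and the
    chunk sizes are not all equal.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(items)
    spare = n - draws * min_size  # elements left after the minimum is handed out
    if spare < 0:
        raise ValueError("not enough elements for this many draws")
    if draws > 1 and spare == 0:
        raise ValueError("all chunks would have the same size")

    # Distribute the spare elements at random; redraw if all sizes coincide.
    while True:
        extra = rng.multinomial(spare, np.ones(draws) / draws)
        sizes = min_size + extra
        if draws == 1 or len(set(sizes.tolist())) > 1:
            break

    shuffled = rng.permutation(items)
    cuts = np.cumsum(sizes)[:-1]  # interior cut positions
    return [chunk.tolist() for chunk in np.split(shuffled, cuts)]


rng = np.random.default_rng(0)
chunks = random_partition(list(range(1, 11)), draws=4, min_size=2, rng=rng)
print(chunks)
```

Because the sizes are built to sum to len(items), no element can be left over, which is the property the original loop does not guarantee.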



3 answers

One approach is:

import numpy as np

np.random.seed(1)

l = [1,2,3,4,5,6,7,8,9,10]

ids = np.concatenate(([0],
                     np.random.choice(range(1, len(l)-1), 3, replace=False),
                     [len(l)]))

ids = np.sort(ids)

chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

chunks
[[1, 2], [3], [4, 5, 6, 7, 8], [9, 10]]

Now, if you also need to shuffle the elements of the list, you can use numpy.random.shuffle:

np.random.shuffle(l)
chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

chunks
[[5, 9], [3], [10, 1, 6, 8, 7], [2, 4]]
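
Note that both outputs above contain a single-element chunk, which would not meet the minimum sample size of 2 from update (iii). A small checker can make such cases visible. The sketch below is only an illustration; the name check_samples and the returned dictionary keys are assumptions, not part of any library.

```python
def check_samples(samples, pool, min_size=2):
    """Check a candidate result against conditions (i)-(iii) of the question."""
    sizes = [len(s) for s in samples]
    return {
        # (i) the samples together contain every element of the pool
        "covers_all": {x for s in samples for x in s} == set(pool),
        # (ii) the samples do not all have the same size
        "sizes_differ": len(set(sizes)) > 1,
        # (iii) every sample has at least min_size elements
        "min_size_ok": min(sizes) >= min_size,
    }


pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
first_answer = [[1, 2], [3], [4, 5, 6, 7, 8], [9, 10]]
question_example = [[1, 2, 3], [4, 5], [6, 7, 8], [9, 10]]

report_first = check_samples(first_answer, pool)
report_example = check_samples(question_example, pool)
print(report_first)
print(report_example)
```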

Based on the first answer (by FBruzzesi), I came up with the following solution:

import numpy as np

def _sample_leaf_combinations(l: list, draws=10, minchunk=2):

    # Candidate cut indices. Only every minchunk-th index is kept (and the last
    # candidate is dropped) so that neighbouring cuts are at least minchunk apart.
    ldraw = list(range(minchunk,len(l)-1)[::minchunk])[:-1]
    if len(ldraw) <= draws -1:
        raise ValueError(f"Cannot make {draws} draws from list of {len(l)} with minchunk of {minchunk}. Consider lowering minchunk")


    ids = np.concatenate(([0],np.random.choice(ldraw, draws-1, replace=False),[len(l)]))
    ids = np.sort(ids)
    chunks = [l[i:j] for i,j in zip(ids[:-1], ids[1:])]

    return chunks
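
Thinning the candidate indices as above means every chunk size is a multiple of minchunk. An alternative, sketched here as an assumption rather than as the author's method, is to draw the cut points in a shortened index range and then shift the i-th cut by i*(minchunk-1), which keeps neighbouring cuts at least minchunk apart while allowing any chunk size. The name cut_points_with_min_gap is hypothetical, and the sketch does not enforce condition (ii) on its own.

```python
import numpy as np


def cut_points_with_min_gap(n, draws, minchunk=2, rng=None):
    """Return sorted cut indices 0 = c_0 < ... < c_draws = n with gaps >= minchunk."""
    rng = np.random.default_rng() if rng is None else rng
    # Length of the shortened range: each chunk gives up (minchunk - 1) slots.
    reduced = n - draws * (minchunk - 1)
    if reduced < draws:
        raise ValueError(
            f"Cannot make {draws} draws from {n} items with minchunk of {minchunk}"
        )
    # Distinct interior cut points in the shortened range (gaps >= 1).
    base = np.sort(rng.choice(np.arange(1, reduced), size=draws - 1, replace=False))
    # Shift the i-th cut by i * (minchunk - 1) to restore the minimum gap.
    cuts = base + np.arange(1, draws) * (minchunk - 1)
    return np.concatenate(([0], cuts, [n]))


l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ids = cut_points_with_min_gap(len(l), draws=4, minchunk=2, rng=np.random.default_rng(1))
chunks = [l[i:j] for i, j in zip(ids[:-1], ids[1:])]
print(chunks)
```

The returned indices can be used with the same slicing expression as in the answers above.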

Thanks for your help.

Here is another solution:

import numpy as np


def draw_samples(pool, nsamples, min_sample_size=1):
    # make sure pool is an array for the logic to work
    if not isinstance(pool, np.ndarray):
        pool = np.array(pool)

    # first determine the total number of elements to be drawn from pool
    min_total_n_elements = len(pool) if len(pool) > nsamples*min_sample_size \
        else nsamples*min_sample_size
    max_total_n_elements = min_total_n_elements + 5  # the sky is the limit
    total_n_elements = np.random.randint(
        min_total_n_elements, max_total_n_elements
    )
    additional_n_elements = total_n_elements - min_total_n_elements

    # then extend the pool the samples are going to be drawn from
    extended_pool = np.append(
        pool, np.random.choice(pool, size=additional_n_elements)
    ) if additional_n_elements else pool

    # assign each element in the pool to a sample
    assignment = np.array(list(np.arange(nsamples))*min_sample_size)
    if total_n_elements - len(assignment):
        assignment = np.append(
            assignment, np.random.choice(
                np.arange(nsamples), size=total_n_elements - len(assignment)
            )
        )
    np.random.shuffle(assignment)
    samples = [extended_pool[assignment == i] for i in range(nsamples)]

    return samples


lst = np.arange(10)
n_subsamples = 4
samples = draw_samples(lst, n_subsamples, min_sample_size=2)
print(set.union(*[set(sample) for sample in samples]) == set(lst))
