Randomly extracting x items from a list with Python

9 votes

3 answers

13924 views

Asked 2025-04-18 05:20

Suppose we have two lists, such as:

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

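I want the user to enter how many items to extract, expressed as a percentage of the overall list length, and then extract the items at the same randomly chosen indices from each list. For example, if I wanted 50% of the data, the output would be: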

newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']

I achieved this with the following code:

from random import randrange

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

LengthOfList = len(lstOne)
print LengthOfList

PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []

HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake

for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)

print RangeOfListIndices


newlstOne = []
newlstTwo = []

for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])

print newlstOne
print newlstTwo

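But I was wondering whether there is a more efficient way of doing this, since in my actual use case I will be subsampling from 145,000 items. Also, is randrange sufficiently free of bias at this scale?

Thanks!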

user input, performance optimization, list operations, data processing, random number generation, random sampling, data sampling

3 answers

What you're doing looks mostly right.

If you want to avoid selecting the same item more than once, you can try the following:

import random

a = len(lstOne)
k = int(HowManyIndicesToMake)   # number of items to extract, as computed in the question
choose_from = list(range(a))    # list of every index into lstOne
random.shuffle(choose_from)     # shuffle in place so the first k indices are a random subset
for i in choose_from[:k]:       # take k distinct indices, used for both lists
    newlstOne.append(lstOne[i]) # append the items at the same random positions
    newlstTwo.append(lstTwo[i]) # to the two new lists, in the same order
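
As a rough sketch, the shuffle-and-slice idea can be wrapped in a function for Python 3. The helper name `sample_by_shuffle`, the `percentage` argument, and the optional `rng` argument are assumptions made for illustration and do not appear in the original answer:

```python
import random

def sample_by_shuffle(list_one, list_two, percentage, rng=random):
    """Pick the same random positions from two equal-length lists, without repeats."""
    if len(list_one) != len(list_two):
        raise ValueError("both lists must have the same length")
    # Number of items to keep, rounded down (50% of 10 items gives 5).
    k = int(len(list_one) * percentage / 100)
    # range() must be turned into a list before it can be shuffled in place.
    indices = list(range(len(list_one)))
    rng.shuffle(indices)
    chosen = indices[:k]  # the first k shuffled indices are all distinct
    return [list_one[i] for i in chosen], [list_two[i] for i in chosen]
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which can help when debugging.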

Answered 2025-04-18 by Python大师


Just zip your two lists together, use random.sample to do the sampling, then zip again to transpose the result back into two lists.

import random

# list() is needed on Python 3, where zip() returns an iterator rather than a list
_zips = random.sample(list(zip(lstOne, lstTwo)), 5)

new_list_1, new_list_2 = zip(*_zips)

Demo:

list_1 = range(1,11)
list_2 = list('abcdefghij')

_zips = random.sample(list(zip(list_1, list_2)), 5)

new_list_1, new_list_2 = zip(*_zips)

new_list_1
Out[33]: (3, 1, 9, 8, 10)

new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
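
To connect this to the percentage requirement in the question, here is a minimal sketch for Python 3. The function name `sample_pairs` and its `percentage` parameter are hypothetical, introduced only for this example:

```python
import random

def sample_pairs(list_one, list_two, percentage):
    """Sample percentage% of the positions, keeping items from both lists paired."""
    # random.sample() needs a sequence, so the zip iterator is materialized first.
    pairs = list(zip(list_one, list_two))
    k = int(len(pairs) * percentage / 100)
    if k == 0:
        # zip(*[]) produces nothing, so the two-way unpacking below would fail.
        return [], []
    picked = random.sample(pairs, k)
    new_one, new_two = zip(*picked)  # transpose the pairs back into two sequences
    return list(new_one), list(new_two)
```

This builds one tuple per position before sampling, so it trades some memory for simplicity; sampling indices instead avoids creating the pairs.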

Answered 2025-04-18 by Python大师


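Q. I want the user to enter how many items to extract, as a percentage of the overall list length, and have the same indices randomly extracted from each list.

A. The most straightforward approach directly matches your specification: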

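import random

percentage = float(raw_input('What percentage? '))   # Python 2; use input() on Python 3
k = int(len(list1) * percentage // 100)              # sample() needs an integer count
indices = random.sample(xrange(len(list1)), k)       # Python 2; use range() on Python 3
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]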
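
For readers on Python 3, the same index-sampling idea might look like the sketch below, run against lists of the size mentioned in the question. The helper name `sample_indices`, the range check on the percentage, and the generated test data are assumptions added for illustration:

```python
import random

def sample_indices(n, percentage):
    """Return k distinct random indices into a list of length n, k being percentage% of n."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    k = int(n * percentage // 100)     # int() because sample() rejects a float count
    return random.sample(range(n), k)  # range() is a valid sequence for sample()

# Stand-in data with 145,000 items, mirroring the size in the question.
list1 = [str(i) for i in range(1, 145001)]
list2 = list(list1)

indices = sample_indices(len(list1), 50.0)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
```

Because the indices are drawn once and reused, both new lists stay aligned position by position.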

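Q. In my actual use case this is subsampling from 145,000 items. Also, is randrange sufficiently free of bias at this scale?

A. In both Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method, which makes multiple random choices until a bias-free result is found).

In Python 2, the random.sample() function is slightly biased, but only in the rounding of the last of 53 bits. In Python 3, random.sample() uses the internal _randbelow() method and is therefore free of bias.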
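
The "choose again until the result is unbiased" idea described above is a form of rejection sampling. The sketch below is a simplified illustration of that idea for Python 3, not CPython's actual source; the function name `randbelow_sketch` and its `rng` parameter are hypothetical:

```python
import random

def randbelow_sketch(n, rng=random):
    """Return a uniform integer in [0, n) by rejection sampling (simplified illustration)."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()      # enough bits to represent every value below n
    r = rng.getrandbits(k)  # uniform over [0, 2**k)
    while r >= n:
        # Out-of-range draws are discarded and redrawn rather than folded back
        # with a modulo, which would favor the smaller values.
        r = rng.getrandbits(k)
    return r
```

Since 2**k is less than twice n, each draw is accepted with probability above one half, so the loop is expected to finish in fewer than two iterations on average.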

Answered 2025-04-18 by Python大师

