Creating random groups from a list

2 answers

User

#1 · edited 2024-04-25 22:36:40

The first thing I would do is filter the data into two lists, one per gender:

males = [d for d in data if d.Gender == 'm']
females = [d for d in data if d.Gender == 'f']

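Next, shuffle both lists, so that you can pick "random" people simply by taking them from the front, without having to choose random indices: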

import random

random.shuffle(males)
random.shuffle(females)
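
If the groups need to be reproducible (for example, to regenerate the same assignment later), one option is to shuffle with a dedicated, seeded generator instead of the module-level function. A minimal sketch; the seed value and the sample names are made up for illustration:

```python
import random

# Hypothetical sample list; any list of records behaves the same way.
people = ["Ann", "Bob", "Cid", "Dee", "Eve"]

# A dedicated generator with a fixed seed produces the same order on every run.
rng = random.Random(42)
shuffled = people[:]      # copy, so the original order is preserved
rng.shuffle(shuffled)     # shuffles in place, just like random.shuffle
```

Dropping the seed (or using plain random.shuffle) gives a different order on each run.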

Then select the elements, keeping each group's gender ratio as close to the overall ratio as possible:

import math

# establish number of groups, and size of each group
GROUP_SIZE = 15
GROUP_NUM = math.ceil(len(data) / GROUP_SIZE)
# make an empty list of groups to add each group to
groups = []
while len(groups) < GROUP_NUM and (len(males) > 0 or len(females) > 0):
    # calculate the gender ratio of the people still unassigned, to balance this group
    remaining = len(males) + len(females)
    num_males = round(len(males) / remaining * GROUP_SIZE)
    num_females = GROUP_SIZE - num_males
    # select that many people from the previously-shuffled lists
    males_in_this_group = [males.pop(0) for n in range(num_males) if len(males) > 0]
    females_in_this_group = [females.pop(0) for n in range(num_females) if len(females) > 0]
    # put those two subsets together, shuffle to make it feel more random, and add this group
    this_group = males_in_this_group + females_in_this_group
    random.shuffle(this_group)
    groups.append(this_group)

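This keeps the gender ratio of each group as close as possible to that of the original sample. The last group will usually be smaller than the others, because it holds the leftovers from the other groups.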
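
The three steps above (filter, shuffle, draw by ratio) can be put together as one self-contained function. This is only a sketch under stated assumptions: the Person record type, the function name make_balanced_groups, the seed parameter, and the sample data are all hypothetical. The answer itself only assumes that each item has a .Gender attribute equal to 'm' or 'f'.

```python
import random
from collections import namedtuple

# Hypothetical record type; only the .Gender attribute is taken from the answer.
Person = namedtuple("Person", ["Name", "Gender"])

def make_balanced_groups(data, group_size=15, seed=None):
    """Split data into groups whose gender mix follows the unassigned pool."""
    rng = random.Random(seed)
    males = [d for d in data if d.Gender == "m"]
    females = [d for d in data if d.Gender == "f"]
    rng.shuffle(males)
    rng.shuffle(females)

    groups = []
    while males or females:
        remaining = len(males) + len(females)
        size = min(group_size, remaining)   # the last group may be smaller
        # share of males among the people who are still unassigned
        num_males = round(len(males) / remaining * size)
        num_females = size - num_males
        group = [males.pop() for _ in range(num_males)]
        group += [females.pop() for _ in range(num_females)]
        rng.shuffle(group)                  # mix genders inside the group
        groups.append(group)
    return groups

# Made-up sample: 40 males and 60 females.
sample = [Person(f"m{i}", "m") for i in range(40)]
sample += [Person(f"f{i}", "f") for i in range(60)]

groups = make_balanced_groups(sample, group_size=15, seed=1)
group_sizes = [len(g) for g in groups]
male_counts = [sum(p.Gender == "m" for p in g) for g in groups]
```

Recomputing the ratio from the people still unassigned, rather than once from the full sample, lets rounding errors correct themselves from group to group, so nobody is left out at the end.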

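User

#2 · edited 2024-04-25 22:36:40

The pandas approach below produces groups of 15 members each, with the remainder in the last group. The gender ratio in each group is only about as accurate as random sampling allows.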

import pandas as pd

df = pd.read_csv('1.csv', skipinitialspace=True) # 1.csv contains sample data from the question

# shuffle data / pandas way
df = df.sample(frac=1).reset_index(drop=True)

# group size
SIZE = 15

# create column with group number
df['group'] = df.index // SIZE

# list of groups, groups[0] is dataframe with the first group members
groups = [
    df[df['group'] == num]
    for num in range(df['group'].max() + 1)]
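
Because this approach relies on chance alone to balance the genders, it can be worth checking how each group actually turned out. A minimal sketch using pd.crosstab; the synthetic data stands in for 1.csv, and the 'Gender' column name and its 'm'/'f' values are assumptions borrowed from the first answer:

```python
import pandas as pd

# Made-up sample standing in for 1.csv; the column names are assumptions.
df = pd.DataFrame({
    "Name": [f"person_{i}" for i in range(40)],
    "Gender": ["m", "f"] * 20,
})

SIZE = 15
# Same shuffle and group numbering as above; random_state only makes the run repeatable.
df = df.sample(frac=1, random_state=0).reset_index(drop=True)
df["group"] = df.index // SIZE

# Rows: group number, columns: gender, cells: head count.
counts = pd.crosstab(df["group"], df["Gender"])
# The same table expressed as a share of each group's size.
shares = counts.div(counts.sum(axis=1), axis=0)
```

If the shares drift further from the overall ratio than is acceptable, reshuffling or drawing each gender separately (as in the first answer) are possible remedies.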

Saving the dataframes to files:

# one csv-file
df.to_csv('2.csv')

# many csv-files
for num, group_df in enumerate(groups, 1):
    group_df.to_csv('group_{}.csv'.format(num))
