Pandas: randomly swapping column values per row

df

   y  ch1_g1  ch2_g1  ch3_g1  ch1_g2  ch2_g2  ch3_g2
0  0      20      89      62      23       3      74
1  1      51      64      19       2      83       0
2  0      14      58       2      71      31      48
3  1      32      28       2      30      92      91
4  1      51      36      51      66      15      14
...

for index, row in df.iterrows():
    choice = np.random.choice([0, 1])
    if row['y'] != choice:
        df.loc[index, 'y'] = choice
        for column in df.columns[1:]:
            key = column.replace('g1', 'g2') if 'g1' in column else column.replace('g2', 'g1')
            df.loc[index, column] = row[key]

1 answer

User

#1 · Posted 2024-04-20 02:02:42

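Whether or not swapping the columns actually addresses the class-imbalance problem, I would swap the g1 and g2 columns for the entire dataset, then pick each row at random from either the original or the swapped dataset: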

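import numpy as np
import pandas as pd

# Step 1: swap the columns (y and the *_g2 columns first, then the *_g1 columns).
# Note: '[^(_g1)]$' is a character class; it keeps names whose last character
# is not one of '(', '_', 'g', '1', ')', which here means y and the *_g2 columns.
df1 = pd.concat((df.filter(regex='[^(_g1)]$'),
                 df.filter(regex='_g1$')),
                axis=1)

# Step 2: rename the columns so the g2 values sit under the g1 names and vice versa
df1.columns = df.columns

# Random choice: True keeps the original row, False takes the swapped row
np.random.seed(1)
is_original = np.random.choice([True, False], size=len(df))

# Concatenate the selected rows to make the new dataset
pd.concat((df[is_original], df1[~is_original]))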

Output:

   y  ch1_g1  ch2_g1  ch3_g1  ch1_g2  ch2_g2  ch3_g2
2  0      14      58       2      71      31      48
3  1      32      28       2      30      92      91
0  0      23       3      74      20      89      62
1  1       2      83       0      51      64      19
4  1      66      15      14      51      36      51

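Note that the rows with index 0, 1, and 4 have had their g1 and g2 values swapped, while rows 2 and 3 are unchanged. The y column is left as it was, and the rows come out in concatenation order rather than the original index order.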
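The same per-row swap can also be sketched without building a second DataFrame, by pairing columns by name and using `np.where` with a boolean mask. This is only an illustration, not the answerer's method: the helper name `swap_groups`, its suffix parameters, the use of `np.random.default_rng`, and the seed are all assumptions made for this example. The sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "y":      [0, 1, 0, 1, 1],
    "ch1_g1": [20, 51, 14, 32, 51],
    "ch2_g1": [89, 64, 58, 28, 36],
    "ch3_g1": [62, 19, 2, 2, 51],
    "ch1_g2": [23, 2, 71, 30, 66],
    "ch2_g2": [3, 83, 31, 92, 15],
    "ch3_g2": [74, 0, 48, 91, 14],
})


def swap_groups(frame, swap_mask, g1_suffix="_g1", g2_suffix="_g2"):
    """Hypothetical helper: return a copy of `frame` in which the g1 and g2
    values are exchanged on every row where `swap_mask` is True."""
    g1_cols = [c for c in frame.columns if c.endswith(g1_suffix)]
    # Pair each g1 column with its g2 counterpart by name (ch1_g1 -> ch1_g2).
    g2_cols = [c[: -len(g1_suffix)] + g2_suffix for c in g1_cols]

    # Column vector so the mask broadcasts across the paired columns.
    mask = np.asarray(swap_mask, dtype=bool)[:, None]
    g1_vals = frame[g1_cols].to_numpy()
    g2_vals = frame[g2_cols].to_numpy()

    out = frame.copy()
    out[g1_cols] = np.where(mask, g2_vals, g1_vals)
    out[g2_cols] = np.where(mask, g1_vals, g2_vals)
    return out


rng = np.random.default_rng(1)          # seed chosen arbitrarily
swap_mask = rng.random(len(df)) < 0.5   # True means "swap this row"
swapped = swap_groups(df, swap_mask)
```

Because the result is written back into a copy of the original frame, the rows keep their original order and the y column is untouched. If the label should flip together with the swap, as the loop in the question does, that would need one extra assignment on the masked rows.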
