How to subset a pandas DataFrame based on multiple conditions

Base  G   Pref  Sier  Val  Other  latest_class  d_id
0     2   0     0     12   0      Val           38
12    0   0     0     0    0      Base          39
0     0   12    0     0    0      Pref          40
0     0   0     12    0    0      Sier          41
0     0   0     12    0    0      Sier          42
12    0   0     0     0    0      Base          43
0     0   0     0     0    12     Other         45
0     0   0     0     0    12     Other         46
0     12  0     0     0    0      G             47
0     0   12    0     0    0      Pref          48
0     0   0     0     0    12     Other         51
0     0   8     5     0    0      Sier          53
0     0   0     0     12   0      Val           54
0     0   0     0     12   0      Val           55

i = np.arange(len(device_class))
j = (device_class.columns[:-1].values[:, None] == device_class.latest_class.values).argmax(0)
device_class_latest = device_class.iloc[np.flatnonzero(device_class.values[i, j] >= 3)]

1 answer

User

#1 · Posted 2024-06-16 09:41:13

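I'm not sure I've understood your data structure correctly. I'm assuming the values in the first six columns are the number of months someone spent in each class? If so, try the following solution: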

import pandas as pd

data = {
    'Base': [0, 12, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0],
    'G': [2, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0],
    'Pref': [0, 0, 12, 0, 0, 0, 0, 0, 0, 12, 0, 8, 0, 0],
    'Sier': [0, 0, 0, 12, 12, 0, 0, 0, 0, 0, 0, 5, 0, 0],
    'Val': [12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12],
    'Other': [0, 0, 0, 0, 0, 0, 12, 12, 0, 0, 12, 0, 0, 0],
    'latest_class': [
        'Val', 'Base', 'Pref', 'Sier', 'Sier', 'Base', 'Other', 'Other', 'G',
        'Pref', 'Other', 'Sier', 'Val', 'Val'
    ],
    'd_id': [38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 51, 53, 54, 55]
}

# Load data into DataFrame
df = pd.DataFrame(data)

# Remove records where latest class is Other
df = df[df['latest_class'] != 'Other']

# Filter out records with > 1 class
months_df = df.drop(['latest_class', 'd_id'], axis=1)
months_multiple = months_df[months_df > 0].count(axis=1)
months_1_only = months_multiple == 1
df = df.loc[months_1_only, :]

# Get records where months of latest_class >= 3
rows_to_keep = []
for index, row in df.iterrows():
    latest_class = row['latest_class']
    months_spent = row[latest_class]
    gte_3 = bool(months_spent >= 3)
    rows_to_keep.append(gte_3)
df = df.iloc[rows_to_keep, :]

# Get them back in the original order (if needed)
df = df[['Base', 'G', 'Pref', 'Sier', 'Val', 'Other', 'latest_class', 'd_id']]
print(df)

The output is what you wanted: only the rows at index 1, 2, 3, 4, 5, 8, 9, 12 and 13 remain.

Note that I've been deliberately verbose so that each step is clearly identified, but you can combine these lines to make a more concise script.
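As a rough sketch of what a more compact version could look like, the three filters can be combined into a single boolean mask. The small frame below is a hypothetical stand-in with the same layout as the data above (the d_id 60 row is made up for illustration), and the variable names months, latest_months and mask are my own:

```python
import pandas as pd

# Hypothetical sample with the same layout as the data above.
df = pd.DataFrame({
    'Base': [0, 12, 0, 0, 2],
    'Pref': [0, 0, 8, 0, 0],
    'Sier': [0, 0, 5, 0, 0],
    'Val': [12, 0, 0, 0, 0],
    'Other': [0, 0, 0, 12, 0],
    'latest_class': ['Val', 'Base', 'Sier', 'Other', 'Base'],
    'd_id': [38, 39, 53, 45, 60],
})

# Only the month-count columns.
months = df.drop(columns=['latest_class', 'd_id'])

# Months spent in each row's latest_class, looked up row by row.
latest_months = pd.Series(
    [months.at[idx, cls] for idx, cls in df['latest_class'].items()],
    index=df.index,
)

mask = (
    (df['latest_class'] != 'Other')     # latest class is not Other
    & ((months > 0).sum(axis=1) == 1)   # months recorded in exactly one class
    & (latest_months >= 3)              # at least 3 months in the latest class
)
result = df[mask]
print(result)
```

Building one mask and indexing once also avoids reassigning df after every step, which makes it easier to inspect which condition removed a given row.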

Also, the final filter could be defined as a function and applied with the pandas apply method instead of using iterrows.
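A minimal sketch of that apply-based variant, assuming the same column layout; the function name at_least_3_months and the tiny sample frame are hypothetical and only there to illustrate the idea:

```python
import pandas as pd

def at_least_3_months(row):
    """Return True when the row has >= 3 months in its latest_class column."""
    return row[row['latest_class']] >= 3

# Hypothetical sample; the d_id 60 row is made up to show a rejected record.
df = pd.DataFrame({
    'Base': [12, 0, 2],
    'Val': [0, 12, 0],
    'latest_class': ['Base', 'Val', 'Base'],
    'd_id': [39, 54, 60],
})

# axis=1 passes each row to the function as a Series.
keep = df.apply(at_least_3_months, axis=1)
filtered = df[keep]
print(filtered)
```

This replaces the explicit loop and the rows_to_keep list with a boolean Series, though apply with axis=1 still calls the function once per row, so it is tidier rather than meaningfully faster.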
