How do I relate the list items of two pandas DataFrame columns?

2 votes
2 answers
33 views
Asked 2025-04-14 16:55

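I have a DataFrame with two columns, each of which holds lists.

One column contains the categories of some items, and the other contains the points associated with those categories, matched by position.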

import pandas as pd

cat = [
    ['speed', 'health', 'strength', 'health'],
    ['strength', 'speed', 'speed'],
    ['strength', 'speed', 'health', 'speed']
]

pts = [
    [1, 2, 1.5, -1],
    [2, -1.5, 1.5],
    [-1, 2, 0, 1.5]
]

s_cat = pd.Series(cat, name='cat')
s_pts = pd.Series(pts, name='pts')

df = pd.concat([s_cat, s_pts], axis=1)

Output:

                                 cat              pts
0  [speed, health, strength, health]  [1, 2, 1.5, -1]
1           [strength, speed, speed]   [2, -1.5, 1.5]
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]

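I want to relate these lists so that I can compute the sum of points for each category and, if possible, also count the positive and negative points for each category.

The result I am hoping for looks roughly like this: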

                                 cat              pts  speed_sum  health_sum  strength_sum  speed_pos  speed_neg  health_pos  health_neg  strength_pos  strength_neg
0  [speed, health, strength, health]  [1, 2, 1.5, -1]        1.0         1.0           1.5          1          0           1           1             1             0
1           [strength, speed, speed]   [2, -1.5, 1.5]        0.0         NaN           2.0          1          1           0           0             1             0
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]        3.5         0.0          -1.0          2          0           0           0             0             1

2 Answers

0

Try this:

# Function to calculate sum of points for each category
def sum_points(row):
    speed_sum = 0
    health_sum = 0
    strength_sum = 0
    speed_pos = 0
    speed_neg = 0
    health_pos = 0
    health_neg = 0
    strength_pos = 0
    strength_neg = 0
    
    for i, category in enumerate(row['cat']):
        if category == 'speed':
            if row['pts'][i] > 0:
                speed_pos += 1
            elif row['pts'][i] < 0:
                speed_neg += 1
            speed_sum += row['pts'][i]
        elif category == 'health':
            if row['pts'][i] > 0:
                health_pos += 1
            elif row['pts'][i] < 0:
                health_neg += 1
            health_sum += row['pts'][i]
        elif category == 'strength':
            if row['pts'][i] > 0:
                strength_pos += 1
            elif row['pts'][i] < 0:
                strength_neg += 1
            strength_sum += row['pts'][i]
    
    return pd.Series([speed_sum, health_sum, strength_sum, speed_pos, speed_neg, health_pos, health_neg, strength_pos, strength_neg],
                     index=['speed_sum', 'health_sum', 'strength_sum', 'speed_pos', 'speed_neg', 'health_pos', 'health_neg', 'strength_pos', 'strength_neg'])

# Apply the function to each row
df[['speed_sum', 'health_sum', 'strength_sum', 'speed_pos', 'speed_neg', 'health_pos', 'health_neg', 'strength_pos', 'strength_neg']] = df.apply(sum_points, axis=1)

df
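
Note that the function above starts every sum at 0, so a row with no `health` entry gets `health_sum = 0` rather than the `NaN` shown in the desired output, and the three category names are hard-coded. A more general variant might look like the sketch below. It assumes the two lists in each row always have the same length; the helper name `summarize` is hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": [
        ["speed", "health", "strength", "health"],
        ["strength", "speed", "speed"],
        ["strength", "speed", "health", "speed"],
    ],
    "pts": [
        [1, 2, 1.5, -1],
        [2, -1.5, 1.5],
        [-1, 2, 0, 1.5],
    ],
})


def summarize(categories, points):
    """Per-category sum, positive count and negative count for one row."""
    stats = {}
    for category, value in zip(categories, points):
        for suffix, increment in (
            ("sum", value),
            ("pos", int(value > 0)),
            ("neg", int(value < 0)),
        ):
            key = f"{category}_{suffix}"
            stats[key] = stats.get(key, 0) + increment
    return stats


# One dict per row; a category that is absent from a row becomes NaN
stats = pd.DataFrame(
    [summarize(c, p) for c, p in zip(df["cat"], df["pts"])],
    index=df.index,
)

# Counts of an absent category are 0, while its sum stays NaN
count_cols = [c for c in stats.columns if c.endswith(("_pos", "_neg"))]
stats[count_cols] = stats[count_cols].fillna(0).astype(int)

out = df.join(stats)
print(out)
```

Because the column names are derived from the data, a new category in `cat` needs no code change.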
3

Try this:

def pos(vals):
    return (vals > 0).sum()


def neg(vals):
    return (vals < 0).sum()


tmp = df.explode(["cat", "pts"])
tmp = tmp.pivot_table(
    index=tmp.index,
    columns="cat",
    values="pts",
    aggfunc=["sum", pos, neg],
)
tmp.columns = [f"{b}_{a}" for a, b in tmp.columns]

out = pd.concat([df, tmp], axis=1)
print(out)

Output:

                                 cat              pts health_sum speed_sum strength_sum  health_pos  speed_pos  strength_pos  health_neg  speed_neg  strength_neg
0  [speed, health, strength, health]  [1, 2, 1.5, -1]          1         1          1.5         1.0        1.0           1.0         1.0        0.0           0.0
1           [strength, speed, speed]   [2, -1.5, 1.5]        NaN       0.0            2         NaN        1.0           1.0         NaN        1.0           0.0
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]          0       3.5           -1         0.0        2.0           0.0         0.0        0.0           1.0
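
In the output above the counts for a missing category come out as `NaN` and the `_sum` columns have mixed formatting, which suggests the exploded `pts` values keep object dtype. If the goal is to match the desired output more closely (numeric sums, `0` for missing counts, `NaN` only for missing sums), one possible variant is sketched below. It assumes pandas 1.3 or later for the multi-column `explode`; the names `long`, `row`, and `stats` are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": [
        ["speed", "health", "strength", "health"],
        ["strength", "speed", "speed"],
        ["strength", "speed", "health", "speed"],
    ],
    "pts": [
        [1, 2, 1.5, -1],
        [2, -1.5, 1.5],
        [-1, 2, 0, 1.5],
    ],
})

# Explode both list columns in lockstep and keep the original row label
long = df.explode(["cat", "pts"]).rename_axis("row").reset_index()
long["pts"] = long["pts"].astype(float)  # exploded values are object dtype

# Flag positive / negative points, then aggregate per (row, category)
stats = (
    long.assign(pos=long["pts"] > 0, neg=long["pts"] < 0)
    .groupby(["row", "cat"])
    .agg(sum=("pts", "sum"), pos=("pos", "sum"), neg=("neg", "sum"))
    .unstack("cat")
)

# Flatten the (statistic, category) column pairs into "category_statistic"
stats.columns = [f"{cat}_{stat}" for stat, cat in stats.columns]

# A category missing from a row has zero counts, but its sum stays NaN
count_cols = [c for c in stats.columns if not c.endswith("_sum")]
stats[count_cols] = stats[count_cols].fillna(0).astype(int)

out = df.join(stats)
print(out)
```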
