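How can I associate the list items of two pandas DataFrame columns?
I have a DataFrame with two columns, each made up of lists.
One column holds the categories of some items, and the other holds the scores associated with those categories.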
import pandas as pd
cat = [
    ['speed', 'health', 'strength', 'health'],
    ['strength', 'speed', 'speed'],
    ['strength', 'speed', 'health', 'speed']
]
pts = [
    [1, 2, 1.5, -1],
    [2, -1.5, 1.5],
    [-1, 2, 0, 1.5]
]
s_cat = pd.Series(cat, name='cat')
s_pts = pd.Series(pts, name='pts')
df = pd.concat([s_cat, s_pts], axis=1)
Output:
                                 cat              pts
0  [speed, health, strength, health]  [1, 2, 1.5, -1]
1           [strength, speed, speed]   [2, -1.5, 1.5]
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]
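I'd like to associate these lists so that I can compute the sum of the scores for each category and, if possible, also count the positive and negative scores per category.
The result I'm hoping for looks roughly like this: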
                                 cat              pts  speed_sum  health_sum  strength_sum  speed_pos  speed_neg  health_pos  health_neg  strength_pos  strength_neg
0  [speed, health, strength, health]  [1, 2, 1.5, -1]        1.0         1.0           1.5          1          0           1           1             1             0
1           [strength, speed, speed]   [2, -1.5, 1.5]        0.0         NaN           2.0          1          1           0           0             1             0
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]        3.5         0.0          -1.0          2          0           0           0             0             1
2 Answers
0
Try this:
# Function to calculate sum of points for each category
def sum_points(row):
    speed_sum = 0
    health_sum = 0
    strength_sum = 0
    speed_pos = 0
    speed_neg = 0
    health_pos = 0
    health_neg = 0
    strength_pos = 0
    strength_neg = 0
    for i, category in enumerate(row['cat']):
        if category == 'speed':
            if row['pts'][i] > 0:
                speed_pos += 1
            elif row['pts'][i] < 0:
                speed_neg += 1
            speed_sum += row['pts'][i]
        elif category == 'health':
            if row['pts'][i] > 0:
                health_pos += 1
            elif row['pts'][i] < 0:
                health_neg += 1
            health_sum += row['pts'][i]
        elif category == 'strength':
            if row['pts'][i] > 0:
                strength_pos += 1
            elif row['pts'][i] < 0:
                strength_neg += 1
            strength_sum += row['pts'][i]
    return pd.Series([speed_sum, health_sum, strength_sum, speed_pos, speed_neg, health_pos, health_neg, strength_pos, strength_neg],
                     index=['speed_sum', 'health_sum', 'strength_sum', 'speed_pos', 'speed_neg', 'health_pos', 'health_neg', 'strength_pos', 'strength_neg'])

# Apply the function to each row
df[['speed_sum', 'health_sum', 'strength_sum', 'speed_pos', 'speed_neg', 'health_pos', 'health_neg', 'strength_pos', 'strength_neg']] = df.apply(sum_points, axis=1)
df
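If the set of categories isn't known up front, the same row-wise idea can probably be written without hard-coding `speed`, `health` and `strength`. Below is a minimal sketch of that variation (not the code above, just the same loop keyed by category name). The helper name `summarize` is made up for illustration, and it works on plain lists so it can be tried without a DataFrame:

```python
def summarize(cats, pts):
    """Per-category sum and positive/negative counts for one row.

    `summarize` is a hypothetical helper name; `cats` and `pts` are the two
    equal-length lists taken from one row of the 'cat' and 'pts' columns.
    """
    stats = {}
    for cat, pt in zip(cats, pts):
        stats[f"{cat}_sum"] = stats.get(f"{cat}_sum", 0) + pt
        # A bool adds as 0 or 1, so this counts the matching scores.
        stats[f"{cat}_pos"] = stats.get(f"{cat}_pos", 0) + (pt > 0)
        stats[f"{cat}_neg"] = stats.get(f"{cat}_neg", 0) + (pt < 0)
    return stats


cat = [
    ['speed', 'health', 'strength', 'health'],
    ['strength', 'speed', 'speed'],
    ['strength', 'speed', 'health', 'speed'],
]
pts = [
    [1, 2, 1.5, -1],
    [2, -1.5, 1.5],
    [-1, 2, 0, 1.5],
]

# One dict per row; a category missing from a row simply has no keys.
rows = [summarize(c, p) for c, p in zip(cat, pts)]
for r in rows:
    print(r)
```

To attach the result to the DataFrame, something like `df.join(pd.DataFrame(rows, index=df.index))` should work; categories that don't occur in a row would then show up as NaN rather than 0, which differs from the function above.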
3
Try this:
def pos(vals):
    return (vals > 0).sum()

def neg(vals):
    return (vals < 0).sum()

tmp = df.explode(["cat", "pts"])
tmp = tmp.pivot_table(
    index=tmp.index,
    columns="cat",
    values="pts",
    aggfunc=["sum", pos, neg],
)
tmp.columns = [f"{b}_{a}" for a, b in tmp.columns]
out = pd.concat([df, tmp], axis=1)
print(out)
This prints:
                                 cat              pts  health_sum  speed_sum  strength_sum  health_pos  speed_pos  strength_pos  health_neg  speed_neg  strength_neg
0  [speed, health, strength, health]  [1, 2, 1.5, -1]           1          1           1.5         1.0        1.0           1.0         1.0        0.0           0.0
1           [strength, speed, speed]   [2, -1.5, 1.5]         NaN        0.0             2         NaN        1.0           1.0         NaN        1.0           0.0
2   [strength, speed, health, speed]  [-1, 2, 0, 1.5]           0        3.5            -1         0.0        2.0           0.0         0.0        0.0           1.0
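In the output above the sums keep object dtype and the counts for a missing category come out as NaN, whereas the question asks for float sums and 0 counts. A possible variation, sketched below, uses `groupby` plus `unstack` in place of `pivot_table`, casts the scores to float, and fills only the count columns with 0. It assumes pandas 1.3 or later (for multi-column `explode`); the names `long`, `wide`, `row` and `count_cols` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": [
        ["speed", "health", "strength", "health"],
        ["strength", "speed", "speed"],
        ["strength", "speed", "health", "speed"],
    ],
    "pts": [
        [1, 2, 1.5, -1],
        [2, -1.5, 1.5],
        [-1, 2, 0, 1.5],
    ],
})

# One (row, category, score) record per list element. Exploding two columns
# at once is assumed to be available (pandas 1.3 or later).
long = df.explode(["cat", "pts"]).rename_axis("row").reset_index()
long["sum"] = long["pts"].astype(float)  # explode leaves object dtype
long["pos"] = long["sum"] > 0
long["neg"] = long["sum"] < 0

# Summing booleans counts the True values, so one sum() covers all three stats.
stats = long.groupby(["row", "cat"])[["sum", "pos", "neg"]].sum()

# Move the categories into columns and flatten the (stat, category) labels.
wide = stats.unstack("cat")
wide.columns = [f"{c}_{stat}" for stat, c in wide.columns]

# A category absent from a row gets a count of 0; its sum stays NaN.
count_cols = [c for c in wide.columns if c.endswith(("_pos", "_neg"))]
wide = wide.fillna({c: 0 for c in count_cols}).astype({c: "int64" for c in count_cols})

out = df.join(wide)
print(out)
```

The column order still follows the statistic first and the category second, so reorder the columns afterwards if the exact layout from the question matters.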