Handling a PerformanceWarning when creating new columns with map

Posted 2024-06-17 12:22:15


Full error:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider using pd.concat instead. To get a de-fragmented frame, use newframe = frame.copy()
  payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())

import pandas as pd

lineups = range(1, 5)
prizes = {'Rank':[1, 2, 3], 'Payout':[100, 50, 25]}
prizes = pd.DataFrame(prizes)
payouts = pd.DataFrame(lineups, columns=['Lineup'])

ranking = {'Lineup':[1, 2, 3, 4], 1:[1, 2 , 3, 4], 2:[2, 1, 4, 3], 3:[4, 1, 2, 3], 4:[1, 3, 4, 2]}
ranking = pd.DataFrame(ranking)

for x in range(1, 5):  # ranking has rank columns 1 through 4
    payouts[x] = ranking[x].map(prizes.set_index('Rank')['Payout'].to_dict())

payouts = payouts.fillna(-20)

1 answer
User
#1 · Posted 2024-06-17 12:22:15

Instead of looping, we can build a mapper, apply Series.map to each rank column of ranking, and then join the result to payouts with pd.concat:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     ranking[range(1, 5)].apply(lambda s: s.map(mapper)).fillna(-20)],
    axis=1
)

Alternatively, we can use replace and then mask the positions where the rank exceeds the highest prize rank:

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat(
    [payouts,
     ranking[range(1, 5)].replace(mapper)
         .mask(ranking.gt(prizes['Rank'].max()), -20)],
    axis=1
)
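The reason the second variant needs mask at all is that replace leaves unmatched values untouched, whereas map turns them into NaN. A minimal sketch on a single Series (the names ranks, mapped, replaced and masked are illustrative only):

```python
import pandas as pd

mapper = {1: 100, 2: 50, 3: 25}
ranks = pd.Series([1, 2, 3, 4])

# map: rank 4 has no entry in mapper, so it becomes NaN (and the dtype turns float).
mapped = ranks.map(mapper)

# replace: rank 4 has no entry in mapper, so it simply stays 4.
replaced = ranks.replace(mapper)

# The condition is computed on the ORIGINAL ranks; after replace,
# a payout such as 100 would also be "greater than 3".
masked = replaced.mask(ranks.gt(3), -20)

print(mapped.tolist())
print(replaced.tolist())
print(masked.tolist())
```

This is why the answer's code builds the mask condition from ranking rather than from the already-replaced frame.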

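Both produce the same payouts values (the map/fillna version stores them as floats, so it prints 100.0 rather than 100):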

   Lineup    1    2    3    4
0       1  100   50  -20  100
1       2   50  100  100   25
2       3   25  -20   50  -20
3       4  -20   25   25   50

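*Note: in this example, ranking already contains everything needed to build the DataFrame, so payouts does not have to be initialized separately: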

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = payouts[cols].apply(lambda s: s.map(mapper)).fillna(-20)

mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = ranking.copy()  # Create copy of ranking
cols = list(range(1, 5))
payouts[cols] = (
    payouts[cols].replace(mapper).mask(ranking.gt(prizes['Rank'].max()), -20)
)
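A quick way to check that the two variants agree is to build both from the sample data and compare them after casting to a common dtype. This is only a sanity-check sketch. The names via_map, via_replace and same_values are made up here, and the mask condition is restricted to the rank columns for explicitness.

```python
import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2],
})
mapper = prizes.set_index('Rank')['Payout'].to_dict()
cols = list(range(1, 5))

# Variant 1: map each column, then fill the unmatched ranks.
via_map = ranking.copy()
via_map[cols] = via_map[cols].apply(lambda s: s.map(mapper)).fillna(-20)

# Variant 2: replace matched ranks, then mask ranks above the top prize rank.
via_replace = ranking.copy()
via_replace[cols] = (
    via_replace[cols]
    .replace(mapper)
    .mask(ranking[cols].gt(prizes['Rank'].max()), -20)
)

# Cast both to float so the comparison ignores the int/float dtype difference.
same_values = via_map.astype(float).equals(via_replace.astype(float))
print(same_values)
```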

DataFrames and imports:

import pandas as pd

prizes = pd.DataFrame({'Rank': [1, 2, 3], 'Payout': [100, 50, 25]})
payouts = pd.DataFrame({'Lineup': range(1, 5)})
ranking = pd.DataFrame({
    'Lineup': [1, 2, 3, 4],
    1: [1, 2, 3, 4],
    2: [2, 1, 4, 3],
    3: [4, 1, 2, 3],
    4: [1, 3, 4, 2]
})
