I want to create a list from values in a dataset based on a specific condition

Posted 2024-05-15 11:00:09


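I'm working with a dataset that contains information on every March Madness game since 1985. I want to find out which teams have won the championship, and how many times each of them has won.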

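I masked the main dataset and created a new one that contains only the championship games. Now I'm trying to write a loop that compares the scores of the two teams in each championship game, determines the winner, and appends that team to a list. The dataset looks like this: https://imgur.com/tXhPYSm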

tourney = pd.read_csv('ncaa.csv')

champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]

list_champs = []

for i in champions:
    if champions['Score'] > champions['Score.1']:
        list_champs.append(i['Team'])
    else:
        list_champs.append(i['Team.1'])

2 Answers

Why do you need to loop over the DataFrame at all?

Basic boolean filtering should work fine. Something like this:

champs1 = champions.loc[champions['Score'] > champions['Score.1'], 'Team']
champs2 = champions.loc[champions['Score'] < champions['Score.1'], 'Team.1']

list_champs = list(champs1) + list(champs2)
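
One thing to keep in mind with the filtering approach: concatenating the two filtered results groups the winners by the column they came from, so the list is no longer in the original row (year) order. If order matters, a row-wise selection with `numpy.where` may be a better fit. The sketch below assumes the column names from the question; the small `champions` frame is invented sample data, not real tournament results.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows using the question's column names;
# the team names and scores are invented for illustration only.
champions = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Team": ["Alpha", "Beta", "Gamma"],
    "Score": [70, 60, 81],
    "Team.1": ["Delta", "Alpha", "Beta"],
    "Score.1": [65, 72, 78],
})

# For each row, take 'Team' if its score is higher, otherwise 'Team.1'.
winners = np.where(
    champions["Score"] > champions["Score.1"],
    champions["Team"],
    champions["Team.1"],
)

# Convert the NumPy array to a plain Python list, one winner per row.
list_champs = winners.tolist()
print(list_champs)
```

Because the condition is evaluated for the whole column at once, no explicit Python loop is needed, and the result stays aligned with the rows of `champions`.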

The simplest change that makes your code work (though not the most efficient one):

import pandas as pd

tourney = pd.read_csv('ncaa.csv')

champions = tourney.loc[tourney['Region Name'] == "Championship", ['Year','Seed','Score','Team','Team.1','Score.1','Seed.1']]

list_champs = []

# iterrows() yields (index, row) pairs, so unpack and discard the index
for _, row in champions.iterrows():
    if row['Score'] > row['Score.1']:
        list_champs.append(row['Team'])
    else:
        list_champs.append(row['Team.1'])

Otherwise, you can simply do the following:

list_champs = champions.apply(lambda row: row['Team'] if row['Score'] > row['Score.1'] else row['Team.1'], axis=1).tolist()
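
The question also asks how many times each team has won. Once there is one winner per row, `Series.value_counts()` can tally them. This is a minimal sketch under the same assumptions as the question's column names; the `champions` frame here is made-up sample data standing in for the real one.

```python
import pandas as pd

# Hypothetical sample rows using the question's column names;
# the team names and scores are invented for illustration only.
champions = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "Team": ["Alpha", "Beta", "Gamma"],
    "Score": [70, 60, 81],
    "Team.1": ["Delta", "Alpha", "Beta"],
    "Score.1": [65, 72, 78],
})

# One winner per championship game, as a Series aligned with the rows.
winner = champions.apply(
    lambda row: row["Team"] if row["Score"] > row["Score.1"] else row["Team.1"],
    axis=1,
)

# Count how many titles each team has won, most frequent first.
wins_per_team = winner.value_counts()
print(wins_per_team)
```

If a plain dictionary is more convenient than a Series, `wins_per_team.to_dict()` should give a team-to-count mapping.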
