使用其中一个数据帧作为键将Python中的数据帧组合到字典

2024-04-29 05:18:20 发布

您现在位置:Python中文网/ 问答频道 /正文

我有3个数据帧,包含每日数据:唯一代码、名称、分数。第1行的第一个值叫做Rank,然后我有日期,Rank下的第一列包含秩号(第一列用作索引)。你知道吗

**df1** UNIQUE CODES

Rank 12/8/2017 12/9/2017 .... 1/3/2018
1     Code_1      Code_3       Code_4
2     Code_2      Code_1       Code_2
...
1000  Code_5      Code_6       Code_7

**df2** NAMES

Rank 12/8/2017 12/9/2017 .... 1/3/2018
1     Jon        Maria         Peter
2     Brian      Jon           Maria
...
1000  Chris      Tim           Charles

**df3** SCORES

Rank 12/8/2017 12/9/2017 .... 1/3/2018
1     10           20           30
2     15           10           40
...
1000  25           15           20

所需输出:

我想将这些数据帧组合到一个字典中,使用df1代码名作为键,所以它看起来像这样:

dictionary = {'Code_1':[Jon, 20] , 'Code_2':[Brian, 15]}

由于有重复的竞争对手,我将需要在所有的数据系列总和他们的得分。因此,在上述示例中,Jon的分数_1将包含2017年8月12日和2017年9月12日的分数。你知道吗

有1000行26列+索引,所以需要一种方法来捕获它们。我认为嵌套循环可以在这里工作,但是没有足够的经验来构建一个可以工作的循环。你知道吗

最后,我想把这本词典按最高分排序。请对此提出任何解决方案或更直接的方法来组合这些数据并获得分数排名。你知道吗

我附上了数据帧的图片,包括名字、代码和分数。你知道吗

names

codes

scores

我在下面的3个数据帧上使用了建议的解决方案。请注意,hashtags代表代码,players代表名字,奖杯代表分数:

# reshape to get dates into rows
hashtags_reshaped = pd.melt(hashtags, id_vars = ['Rank'], 
                   value_vars = hashtags.columns, 
                   var_name = 'Date', 
                   value_name = 'Code').drop('Rank', axis = 1)

# reshape to get dates into rows
players_reshaped = pd.melt(players, id_vars = ['Rank'], 
                   value_vars = hashtags.columns, 
                   var_name = 'Date', 
                   value_name = 'Name').drop('Rank', axis = 1)

# reshape to get the dates into rows
trophies_reshaped = pd.melt(trophies, id_vars = ['Rank'], 
                   value_vars = hashtags.columns, 
                   var_name = 'Date', 
                   value_name = 'Score').drop('Rank', axis = 1)

# merge the three together. 
# This _assumes_ that the dfs are all in the same order and that all the data matches up.
merged_df = pd.DataFrame([hashtags_reshaped['Date'], 
hashtags_reshaped['Code'], players_reshaped['Name'], 
trophies_reshaped['Score']]).T
print(merged_df)

# group by code, name, and date; sum the scores together if multiple exist for a given code-name-date grouping
grouped_df = merged_df.groupby(['Code', 'Name', 'Date']).sum().sort_values('Score', ascending = False)
print(grouped_df)

summed_df = merged_df.drop('Date', axis = 1) \
.groupby(['Code', 'Name']).sum() \
.sort_values('Score', ascending = False).reset_index()
summed_df['li'] = list(zip(summed_df.Name, summed_df.Score))
print(summed_df)

但我得到了一个奇怪的结果:总分应该是几百或几千(平均分是200-300,平均参与频率是4-6倍)。我得到的分数差得很远,但他们的匹配码和名字是正确的。你知道吗

汇总的数据框:

0       (MandiBralaX, 996871590076253)  
1              (Arso_C, 9955130513430)  
2               (ThatRainbowGuy, 9946)  
3                         (fabi, 9940)  
4                      (Dogão, 991917)  
5                      (Hierbo, 99168)  
6               (Clyde, 9916156180128)  
7      (.A.R.M.I.N., 9916014310187143)  
8                (keftedokofths, 9900)  
9                   (⚽AngelSosa⚽, 990)  
10                       (Totoo98, 99)  

分组数据框:

          Code             Name             Score  \
0       #JL2J02LY      MandiBralaX   996871590076253   
1       #80JQ90VC           Arso_C     9955130513430   
2       #9GGC2CUQ   ThatRainbowGuy              9946   
3       #8LL989QV             fabi              9940   
4        #9PPC89L            Dogão            991917   
5      #2JPLQ8JP8           Hierbo             99168

Tags: the数据namedfdatevaluecodevars
1条回答
网友
1楼 · 发布于 2024-04-29 05:18:20

这会让你有更多的路要走。我没有按照您指定的那样在末尾创建字典;虽然您可能需要这种格式,但最终会得到嵌套的字典或列表,因为每个代码都有一个名称,但可能有许多日期和分数与之关联。你想要怎样的录音单、录音等?你知道吗

下面的代码返回一个分组的数据帧;您可以将其直接输出到dict(如图所示),但是您可能需要详细指定格式,尤其是在需要有序字典的情况下。(字典本来就不是有序的;如果您真的需要一个有序的字典,您必须from collections import OrderedDict并查看文档。你知道吗

import pandas as pd

#create the dfs; note that 'Code' is set up as a string
df1 = pd.DataFrame({'Rank': [1, 2], '12/8/2017': ['1', '2'], '12/9/2017': ['3', '1']})
df1.set_index('Rank', inplace = True)

# reshape to get dates into rows
df1_reshaped = pd.melt(df1, id_vars = ['Rank'], 
                       value_vars = df1.columns, 
                       var_name = 'Date', 
                       value_name = 'Code').drop('Rank', axis = 1)
#print(df1_reshaped)

# create the second df
df2 = pd.DataFrame({'Rank': [1, 2], '12/8/2017': ['Name_1', 'Name_2'], '12/9/2017': ['Name_3', 'Name_1']})
df2.set_index('Rank', inplace = True)

# reshape to get dates into rows
df2_reshaped = pd.melt(df2, id_vars = ['Rank'], 
                       value_vars = df1.columns, 
                       var_name = 'Date', 
                       value_name = 'Name').drop('Rank', axis = 1)
#print(df2_reshaped)

# create the third df
df3 = pd.DataFrame({'Rank': [1, 2], '12/8/2017': ['10', '20'], '12/9/2017': ['30', '10']})
df3.set_index('Rank', inplace = True)

# reshape to get the dates into rows
df3_reshaped = pd.melt(df3, id_vars = ['Rank'], 
                       value_vars = df1.columns, 
                       var_name = 'Date', 
                       value_name = 'Score').drop('Rank', axis = 1)
#print(df3_reshaped)

# merge the three together. 
# This _assumes_ that the dfs are all in the same order and that all the data matches up.
merged_df = pd.DataFrame([df1_reshaped['Date'], df1_reshaped['Code'], df2_reshaped['Name'], df3_reshaped['Score']]).T
print(merged_df)

# group by code, name, and date; sum the scores together if multiple exist for a given code-name-date grouping
grouped_df = merged_df.groupby(['Code', 'Name', 'Date']).sum().sort_values('Score', ascending = False)
print(grouped_df)

summed_df = merged_df.drop('Date', axis = 1) \
    .groupby(['Code', 'Name']).sum() \
    .sort_values('Score', ascending = False).reset_index()
summed_df['li'] = list(zip(summed_df.Name, summed_df.Score))
print(summed_df)

未排序的词典:

d = dict(zip(summed_df.Code, summed_df.li))
print(d)

当然,您可以直接进行订购,并且应该:

from collections import OrderedDict
d2 = OrderedDict(zip(summed_df.Code, summed_df.li))
print(d2)

summed_df

  Code    Name  Score            li
0    3  Name_3     30  (Name_3, 30)
1    1  Name_1     20  (Name_1, 20)
2    2  Name_2     20  (Name_2, 20)

d

{'3': ('Name_3', 30), '1': ('Name_1', 20), '2': ('Name_2', 20)}

d2,排序:

OrderedDict([('3', ('Name_3', 30)), ('1', ('Name_1', 20)), ('2', ('Name_2', 20))])

它以元组的形式返回你的(名字,分数),不是列表,而是。。。应该有更多的路要走。你知道吗

相关问题 更多 >