Python Pandas: How do I create a sorted DataFrame based on how many similar values there are in a column?

Published 2024-06-09 02:00:29


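I have just started learning to program, so I took an online Intro to Python workshop at General Assembly.

We were given this task to complete. I have done parts 1, 2, 3, and 4.

I am having trouble with part 5. Honestly, I don't even know what to search for.

I have tried the pandas documentation, GeeksforGeeks, and Data To Fish. However, I don't think they help with what I actually need to do.

This is my solution.

This is the link to my question on Reddit.

Thanks.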


2 answers

Your first step should be to group the students by subject:

>>> agg_df = data.groupby("subjects", as_index=False)["name"].agg(lambda x: x.tolist())
>>> print(agg_df)
    subjects                     name
0    biology                [michael]
1  chemistry  [vincent, allen, sarah]
2      maths                [rebecca]
3    physics            [todd, jamie]
4      stats                [georgia]

You can then loop over the rows, build a sentence for each one, and append it to a list:

ls1 = []

for i, row in agg_df.iterrows():
    str1 = " and ".join(row['name']) + (" are" if len(row['name']) > 1 else " is") + " taking {0}".format(row['subjects'])
    ls1.append(str1)

print(". ".join(ls1))
# michael is taking biology. vincent and allen and sarah are taking chemistry. rebecca is taking maths. todd and jamie are taking physics. georgia is taking stats
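
The loop above can also be sketched as a self-contained script. This is only an illustration under a few assumptions: the DataFrame is rebuilt inline from the rows printed in the other answer so it runs without downloading the CSV, and `describe` is a hypothetical helper name that does not appear in the original answer.

```python
import pandas as pd

# Inline copy of the two relevant columns, so the sketch runs without the CSV.
data = pd.DataFrame({
    "name": ["todd", "jamie", "rebecca", "michael",
             "vincent", "georgia", "allen", "sarah"],
    "subjects": ["physics", "physics", "maths", "biology",
                 "chemistry", "stats", "chemistry", "chemistry"],
})


def describe(names):
    """Hypothetical helper: join the names and pick 'is' or 'are'."""
    verb = "are" if len(names) > 1 else "is"
    return " and ".join(names) + " " + verb + " taking"


# One list of names per subject; groups come back sorted by subject name.
grouped = data.groupby("subjects")["name"].agg(list)

# Iterate over (subject, names) pairs instead of using iterrows().
sentences = [f"{describe(names)} {subject}" for subject, names in grouped.items()]
print(". ".join(sentences))
```

Iterating over `grouped.items()` avoids `iterrows()`, which is fine for eight rows but tends to be slow on larger frames.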

Use:

import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/kasun-maldeni/intro-to-python/master/data.csv")

print(data)
      name     location   subjects
0     todd    melbourne    physics
1    jamie      toronto    physics
2  rebecca  Los Angeles      maths
3  michael       Sydney    biology
4  vincent      toronto  chemistry
5  georgia    Melbourne      stats
6    allen      toronto  chemistry
7    sarah     auckland  chemistry

First, you can capitalize the first letter of each word with Series.str.title, writing the result back with DataFrame.assign:

data = data.assign(name= data['name'].str.title(), subjects= data['subjects'].str.title())
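
As a small illustration of that step, the sketch below uses a made-up two-row frame (the names `df` and `out` are assumptions for this example only). It shows that `str.title` capitalizes each word and that `assign` returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Tiny stand-in frame; not the workshop data.
df = pd.DataFrame({"name": ["todd", "jamie"],
                   "location": ["melbourne", "los angeles"]})

# assign() returns a new frame; df itself is left untouched.
out = df.assign(name=df["name"].str.title(),
                location=df["location"].str.title())

print(out["name"].tolist())      # ['Todd', 'Jamie']
print(out["location"].tolist())  # ['Melbourne', 'Los Angeles']
print(df["name"].tolist())       # ['todd', 'jamie']
```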

Then aggregate with join. If you don't need to distinguish between singular and plural, use Series.str.cat:

s = data.groupby('subjects')['name'].agg(" and ".join)
out1 = s.str.cat(s.index, sep=' are taking ').tolist()
print(out1)
['Michael are taking Biology', 'Vincent and Allen and Sarah are taking Chemistry',
 'Rebecca are taking Maths', 'Todd and Jamie are taking Physics', 'Georgia are taking Stats']

Another solution is to aggregate the group sizes as well, by passing both join and size to GroupBy.agg:

data = data.assign(name= data['name'].str.title(), subjects= data['subjects'].str.title())
df1 = data.groupby('subjects')['name'].agg([" and ".join, 'size'])
print (df1)
                                  join  size
subjects                                    
Biology                        Michael     1
Chemistry  Vincent and Allen and Sarah     3
Maths                          Rebecca     1
Physics                 Todd and Jamie     2
Stats                          Georgia     1

You can then build an array of separators by testing a condition with numpy.where, and concatenate the pieces with +:

import numpy as np

sep = np.where(df1['size'] == 1, ' is taking ', ' are taking ')
out2 = (df1['join'] + sep + df1.index).tolist()
print (out2)
['Michael is taking Biology', 'Vincent and Allen and Sarah are taking Chemistry', 
 'Rebecca is taking Maths', 'Todd and Jamie are taking Physics', 'Georgia is taking Stats']
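
The question title also mentions sorting by how many rows share a value. The task text itself is not quoted in the question, so the following is only a sketch under the assumption that the sentences should be ordered from the most popular subject to the least. The column names `names`, `count`, and `verb` are chosen for this example and are not from the answer.

```python
import numpy as np
import pandas as pd

# Inline, already title-cased copy of the relevant columns.
data = pd.DataFrame({
    "name": ["Todd", "Jamie", "Rebecca", "Michael",
             "Vincent", "Georgia", "Allen", "Sarah"],
    "subjects": ["Physics", "Physics", "Maths", "Biology",
                 "Chemistry", "Stats", "Chemistry", "Chemistry"],
})

# Named aggregation: joined names plus the number of students per subject.
df1 = (data.groupby("subjects")["name"]
           .agg(names=" and ".join, count="size")
           .reset_index())

# Largest groups first; ties are broken alphabetically by subject.
df1 = df1.sort_values(["count", "subjects"], ascending=[False, True])

# Pick the verb per row, then concatenate the three string columns.
df1["verb"] = np.where(df1["count"] == 1, " is taking ", " are taking ")
sentences = (df1["names"] + df1["verb"] + df1["subjects"]).tolist()
print(sentences)
```

Storing the `np.where` result as a column keeps every operand of `+` a pandas Series, so the concatenation is aligned on the same index after sorting.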
