Python Pandas: How to create a sorted DataFrame based on how many similar values are in a column?

2 answers

User

Answer 1 · edited 2024-06-09 02:00:29

Your first step should be to group the students by subject:

>>> agg_df = data.groupby("subjects", as_index=False)["name"].agg(lambda x: x.tolist())
>>> print(agg_df)
    subjects                     name
0    biology                [michael]
1  chemistry  [vincent, allen, sarah]
2      maths                [rebecca]
3    physics            [todd, jamie]
4      stats                [georgia]

Then you can loop over the rows, build one sentence per subject, and append each sentence to a list:

ls1 = []

for i, row in agg_df.iterrows():
    # "is" for a single student, "are" for several
    verb = " are" if len(row['name']) > 1 else " is"
    str1 = " and ".join(row['name']) + verb + " taking {0}".format(row['subjects'])
    ls1.append(str1)

print(". ".join(ls1))
# michael is taking biology. vincent and allen and sarah are taking chemistry. rebecca is taking maths. todd and jamie are taking physics. georgia is taking stats
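For readers who do not have the question's original data, here is a self-contained sketch of the same approach. It assumes the eight rows printed in the second answer are the question's data, and it swaps iterrows for a list comprehension over itertuples, which avoids building a Series object for every row:

```python
import pandas as pd

# Assumed sample data: the rows printed in the second answer (location column omitted)
data = pd.DataFrame({
    "name": ["todd", "jamie", "rebecca", "michael", "vincent", "georgia", "allen", "sarah"],
    "subjects": ["physics", "physics", "maths", "biology", "chemistry", "stats", "chemistry", "chemistry"],
})

# One row per subject, with that subject's students collected into a Python list
agg_df = data.groupby("subjects", as_index=False)["name"].agg(list)

# Build one sentence per subject; "is" for one student, "are" for several
sentences = [
    " and ".join(row.name) + (" are" if len(row.name) > 1 else " is") + f" taking {row.subjects}"
    for row in agg_df.itertuples(index=False)
]
text = ". ".join(sentences)
print(text)
```

The groupby sorts subjects alphabetically, so the sentences come out in the order biology, chemistry, maths, physics, stats.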

User

Answer 2 · edited 2024-06-09 02:00:29

Use:

import numpy as np  # needed for np.where further down
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/kasun-maldeni/intro-to-python/master/data.csv")

print(data)
      name     location   subjects
0     todd    melbourne    physics
1    jamie      toronto    physics
2  rebecca  Los Angeles      maths
3  michael       Sydney    biology
4  vincent      toronto  chemistry
5  georgia    Melbourne      stats
6    allen      toronto  chemistry
7    sarah     auckland  chemistry

To make the first letter of each value uppercase, use Series.str.title:

data = data.assign(name= data['name'].str.title(), subjects= data['subjects'].str.title())

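Then aggregate the names with join. If there is no need to distinguish between singular and plural, attach the subjects with Series.str.cat: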

s = data.groupby('subjects')['name'].agg(" and ".join)
out1 = s.str.cat(s.index, sep=' are taking ').tolist()
print (out1)
['Michael are taking Biology', 'Vincent and Allen and Sarah are taking Chemistry',
 'Rebecca are taking Maths', 'Todd and Jamie are taking Physics', 'Georgia are taking Stats']

Another solution is to join the names and count them with size in the same GroupBy.agg call:

data = data.assign(name= data['name'].str.title(), subjects= data['subjects'].str.title())
df1 = data.groupby('subjects')['name'].agg([" and ".join, 'size'])
print (df1)
                                  join  size
subjects                                    
Biology                        Michael     1
Chemistry  Vincent and Allen and Sarah     3
Maths                          Rebecca     1
Physics                 Todd and Jamie     2
Stats                          Georgia     1
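Neither answer actually sorts the result, although the question title asks for ordering by how many similar values a column has. Assuming that means listing the subjects with the most students first, a minimal sketch is to sort df1 on its size column. It assumes the same eight sample rows as above, and breaks ties alphabetically by the subjects index level so the order does not depend on sort stability:

```python
import pandas as pd

# Assumed sample data: the rows printed in the second answer (location column omitted)
data = pd.DataFrame({
    "name": ["todd", "jamie", "rebecca", "michael", "vincent", "georgia", "allen", "sarah"],
    "subjects": ["physics", "physics", "maths", "biology", "chemistry", "stats", "chemistry", "chemistry"],
})
data = data.assign(name=data["name"].str.title(), subjects=data["subjects"].str.title())

# Joined names and group size per subject, as in the answer
df1 = data.groupby("subjects")["name"].agg([" and ".join, "size"])

# Most students first; among equal counts, alphabetical by subject (an index level name)
df_sorted = df1.sort_values(["size", "subjects"], ascending=[False, True])
print(df_sorted)
```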

Then build an array of separators with numpy.where, choosing the singular form when size equals 1, and concatenate everything with +:

sep = np.where(df1['size'] == 1, ' is taking ', ' are taking ')
out2 = (df1['join'] + sep + df1.index).tolist()
print (out2)
['Michael is taking Biology', 'Vincent and Allen and Sarah are taking Chemistry', 
 'Rebecca is taking Maths', 'Todd and Jamie are taking Physics', 'Georgia is taking Stats']
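If you would rather not import NumPy, the separator can also be chosen in pandas alone. This sketch maps the boolean "more than one student" test to the two separators with Series.map; it assumes the same eight sample rows as above and should give the same list as out2:

```python
import pandas as pd

# Assumed sample data: the rows printed in the second answer (location column omitted)
data = pd.DataFrame({
    "name": ["todd", "jamie", "rebecca", "michael", "vincent", "georgia", "allen", "sarah"],
    "subjects": ["physics", "physics", "maths", "biology", "chemistry", "stats", "chemistry", "chemistry"],
})
data = data.assign(name=data["name"].str.title(), subjects=data["subjects"].str.title())

df1 = data.groupby("subjects")["name"].agg([" and ".join, "size"])

# True (several students) -> plural verb, False (one student) -> singular verb
sep = df1["size"].gt(1).map({True: " are taking ", False: " is taking "})

# sep shares df1's index, so the concatenation lines up row by row
out2 = (df1["join"] + sep + df1.index).tolist()
print(out2)
```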
