使用Pandas制作一份attendace学生报告

3条回答

网友

1楼 · 编辑于 2024-06-01 04:47:00

你可以用它来做

df1= pd.read_csv("12-9-2020.csv")
df1

Name            StudentID
Robert C        102
Jessica Myla    103
Nana D          105

df2= pd.read_csv("12-10-2020.csv")
# get ids from second table
ids_of_presenties = df2['StudentID'].values.tolist()

# in first dataframe create a date column
current_date = '12/9/2020'
df1[current_date] = 0
df1[df1['StudentID'].isin(ids_of_presenties)][current_date] = 1 # mark the attendance of the only student present.

没有数据帧。希望是好的

网友

2楼 · 编辑于 2024-06-01 04:47:00

以下是两个数据帧的解决方案：

df1.set_index('StudentID', inplace=True)
df1.loc[:, '12-9-2020.csv'] = 1

df2.set_index('StudentID', inplace=True)
df2.loc[:, '12-10-2020.csv'] = 1
df1 = df1.join(df2, how='outer', rsuffix='_')
df1['Name'] = df1['Name'].combine_first(df1['Name_'])
df1.drop('Name_', axis=1, inplace=True)

df1.fillna(0).reset_index()

对于更多数据帧，根据需要重复第3-7行

网友

3楼 · 编辑于 2024-06-01 04:47:00

以下是完整的解决方案：

df-从桌面上的特定文件夹将所有文件动态读入一个数据帧（并使用assign创建Date列，并从文件名派生日期）。您必须在files变量中指定此项
df_attendance-使用.groupby，通过StudentID返回1或0，并将Date放在列上
df_names-通过选择每个ID的最长名称作为Name来规范化Name数据
df_attendance-join两个数据帧df_attendance和df_names

import glob
files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.date]).size()
                   .unstack(1).fillna(0).astype(int))
df_names = (df.groupby(['StudentID'])['Name'].apply(list)
              .apply(lambda x: [y for y in x if len(y) == 
                                max([len(z) for z in x])][0])).to_frame() # this reads.. per student choose the name that is the longest name
df_Result = df_names.join(df_attendance).reset_index()
df_Result
Out[1]: 
   StudentID          Name  2020-12-09  2020-12-10
0        102      Robert C           1           0
1        103  Jessica Myla           1           1
2        104   Harris Kurt           0           1
3        105   Nana Duncan           1           1

如果要更改“日期”列中日期的格式，则可以运行以下非常类似的代码：

files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files]).sort_values(['StudentID', 'Date'])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.strftime('%m/%d/%y')], sort=False).size()
                   .unstack(1).fillna(0).astype(int))
df_names = (df.groupby(['StudentID'])['Name'].apply(list)
              .apply(lambda x: [y for y in x if len(y) == 
                                max([len(z) for z in x])][0])).to_frame()
df_Result = df_names.join(df_attendance).reset_index()
df_Result

Out[2]: 
   StudentID          Name  12/09/20  12/10/20
0        102      Robert C         1         0
1        103  Jessica Myla         1         1
2        104   Harris Kurt         0         1
3        105   Nana Duncan         1         1

根据你的评论，你可以这样做请注意，对于直接匹配，最好使用StudentID而不是Name，因此如果使用该，请将“StudentID”替换为“Name”：

df_list = pd.read_csv(file) # pass the filepath of you file and must have column "Name:
df_list = df_list[~(df_list['Name'].isin(df_Result['Name']))] #find students not in the file
df_Result = pd.concat([df_Result, df_list[['Name']]]).fillna(0)
df_Result

相关问题更多 >

编程相关推荐

热门问题

热门文章

使用Pandas制作一份attendace学生报告

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >