使用Pandas制作一份attendace学生报告

2024-06-01 04:47:00 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一些来自谷歌的考勤报告表格。数据如下所示

df1= pd.read_csv("12-9-2020.csv")
df1

Name            StudentID
Robert C        102
Jessica Myla    103
Nana D          105

df2= pd.read_csv("12-10-2020.csv")
df2

Name            StudentID
J Myla          103
Harris Kurt     104
Nana Duncan     105

我有很多表格,我想做一份出勤报告。基本的考勤报告如下所示:

df_Basic
Name             StudentID  12/9/2020   12/10/2020
Robert Case      102        0           0
Jessica Myla     103        0           0
Harris Kurt      104        0           0
Nana Duncan      105        0           0

我想将数据表df1, df2输入到编制考勤报告中。如果学生上课,则必须为1,学生姓名的拼写将与出勤报告格式相匹配

预期结果如下所示:

df_Result
Name          StudentID 12/9/2020   12/10/2020
Robert Case   102       1           0
Jessica Myla  103       1           1
Harris Kurt   104       0           1
Nana Duncan   105       1           1

谢谢你帮助我


Tags: csvname报告robert表格pddf1df2
3条回答

你可以用它来做

df1= pd.read_csv("12-9-2020.csv")
df1

Name            StudentID
Robert C        102
Jessica Myla    103
Nana D          105

df2= pd.read_csv("12-10-2020.csv")
# get ids from second table
ids_of_presenties = df2['StudentID'].values.tolist()

# in first dataframe create a date column
current_date = '12/9/2020'
df1[current_date] = 0
df1[df1['StudentID'].isin(ids_of_presenties)][current_date] = 1 # mark the attendance of the only student present.

没有数据帧。希望是好的

以下是两个数据帧的解决方案:

df1.set_index('StudentID', inplace=True)
df1.loc[:, '12-9-2020.csv'] = 1

df2.set_index('StudentID', inplace=True)
df2.loc[:, '12-10-2020.csv'] = 1
df1 = df1.join(df2, how='outer', rsuffix='_')
df1['Name'] = df1['Name'].combine_first(df1['Name_'])
df1.drop('Name_', axis=1, inplace=True)

df1.fillna(0).reset_index()

对于更多数据帧,根据需要重复第3-7行

以下是完整的解决方案:

  1. df-从桌面上的特定文件夹将所有文件动态读入一个数据帧(并使用assign创建Date列,并从文件名派生日期)。您必须在files变量中指定此项
  2. df_attendance-使用.groupby,通过StudentID返回10,并将Date放在列上
  3. df_names-通过选择每个ID的最长名称作为Name来规范化Name数据
  4. df_attendance-join两个数据帧df_attendancedf_names

import glob
files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.date]).size()
                   .unstack(1).fillna(0).astype(int))
df_names = (df.groupby(['StudentID'])['Name'].apply(list)
              .apply(lambda x: [y for y in x if len(y) == 
                                max([len(z) for z in x])][0])).to_frame() # this reads.. per student choose the name that is the longest name
df_Result = df_names.join(df_attendance).reset_index()
df_Result
Out[1]: 
   StudentID          Name  2020-12-09  2020-12-10
0        102      Robert C           1           0
1        103  Jessica Myla           1           1
2        104   Harris Kurt           0           1
3        105   Nana Duncan           1           1


如果要更改“日期”列中日期的格式,则可以运行以下非常类似的代码:

files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files]).sort_values(['StudentID', 'Date'])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.strftime('%m/%d/%y')], sort=False).size()
                   .unstack(1).fillna(0).astype(int))
df_names = (df.groupby(['StudentID'])['Name'].apply(list)
              .apply(lambda x: [y for y in x if len(y) == 
                                max([len(z) for z in x])][0])).to_frame()
df_Result = df_names.join(df_attendance).reset_index()
df_Result

Out[2]: 
   StudentID          Name  12/09/20  12/10/20
0        102      Robert C         1         0
1        103  Jessica Myla         1         1
2        104   Harris Kurt         0         1
3        105   Nana Duncan         1         1

根据你的评论,你可以这样做请注意,对于直接匹配,最好使用StudentID而不是Name,因此如果使用该,请将“StudentID”替换为“Name”:

df_list = pd.read_csv(file) # pass the filepath of you file and must have column "Name:
df_list = df_list[~(df_list['Name'].isin(df_Result['Name']))] #find students not in the file
df_Result = pd.concat([df_Result, df_list[['Name']]]).fillna(0)
df_Result

相关问题 更多 >