Python DataFrames: mapping two DataFrames on the days of a month?

Published 2024-04-24 00:35:04


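I have two DataFrames. month_data has one row for every day from the start to the end of the month.

student_df contains only the present records for each student.

I am trying to map the two DataFrames onto each other so that, for each student, the remaining days of the month are marked as absent.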

month_data

        day_of_month
    0    2021-01-01
    1    2021-01-02
    2    2021-01-03
    3    2021-01-04
    4    2021-01-05
    5    2021-01-06
    6    2021-01-07
    .        .
    .        .
    .        .
    30    2021-01-31

month_data = pd.DataFrame({'day_of_month': pd.date_range('2021-01-01', '2021-01-31')})

student_df

            student_data      attendence     day_of_month
    0       marcus              present       2021-01-01
    1       marcus              present       2021-01-03
    2       paul                present       2021-01-04

final_df (expected result)

            student_data      attendence     day_of_month
    0       marcus              present       2021-01-01
    1       marcus              absent        2021-01-02
    2       marcus              present       2021-01-03
    3       marcus              absent        2021-01-04
    4       marcus              absent        2021-01-05
    .         .                   .               .
    .         .                   .               .
    30      marcus              absent        2021-01-31

    31      paul                absent        2021-01-01
    32      paul                absent        2021-01-02
    33      paul                absent        2021-01-03
    34      paul                present       2021-01-04
    .         .                   .               .
    .         .                   .               .
    61      paul                absent        2021-01-31

3 answers

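You can create a new DataFrame that contains every date paired with every student's name, then left-merge the attendance records onto it:

Input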

#df1
  day_of_month
0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05
5   2021-01-06
6   2021-01-07

#df2
  student_data attendence day_of_month
0       marcus    present   2021-01-01
1       marcus    present   2021-01-03
2         paul    present   2021-01-04

Code

students = df2['student_data'].unique().tolist()
df = pd.concat([df1.assign(student_data = name) for name in students])
df = df.merge(df2, on=['day_of_month', 'student_data'], how='left')
df['attendence'] = df['attendence'].fillna('absent')

Output

   day_of_month student_data attendence
0    2021-01-01       marcus    present
1    2021-01-02       marcus     absent
2    2021-01-03       marcus    present
3    2021-01-04       marcus     absent
4    2021-01-05       marcus     absent
5    2021-01-06       marcus     absent
6    2021-01-07       marcus     absent
7    2021-01-01         paul     absent
8    2021-01-02         paul     absent
9    2021-01-03         paul     absent
10   2021-01-04         paul    present
11   2021-01-05         paul     absent
12   2021-01-06         paul     absent
13   2021-01-07         paul     absent
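For reference, the same idea can be written as a self-contained sketch that builds the student/day grid with a cross join instead of `pd.concat`. This is a variant, not the answer's original code: it assumes pandas 1.2 or later (where `how='cross'` is available), and the names `grid` and `result` are illustrative only. The sample data mirrors the df1/df2 input above.

```python
import pandas as pd

# Sample inputs mirroring df1/df2 above (first week of the month only).
df1 = pd.DataFrame({'day_of_month': pd.date_range('2021-01-01', '2021-01-07')})
df2 = pd.DataFrame({
    'student_data': ['marcus', 'marcus', 'paul'],
    'attendence': ['present', 'present', 'present'],
    'day_of_month': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-04']),
})

# One row per (student, day): cross join of the unique students with all days.
students = df2[['student_data']].drop_duplicates()
grid = students.merge(df1, how='cross')

# Left-join the recorded attendance; days without a record become NaN,
# which is then filled with 'absent'.
result = grid.merge(df2, on=['student_data', 'day_of_month'], how='left')
result['attendence'] = result['attendence'].fillna('absent')

print(result)
```

The cross join avoids the Python-level loop over student names, which may matter when there are many students.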

I suggest the following:

import pandas as pd

# create the two input dataframes and a copy that will hold the result
month_data = pd.DataFrame({'day_of_month': pd.date_range('2021-01-01', '2021-01-31')})
student_df = pd.DataFrame({'day_of_month': pd.date_range('2021-01-15', '2021-01-31')})
final_df = month_data.copy()  # copy, so that month_data itself is not modified

# assign 'present' to every row whose date also appears in student_df
final_df.loc[final_df['day_of_month'].isin(student_df['day_of_month']), 'attendence'] = 'present'

# the previous line leaves NaN in all other rows; replace those NaNs with 'absent'
final_df.loc[final_df['attendence'] != 'present', 'attendence'] = 'absent'
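The snippet above works on a single list of dates and does not distinguish between students. One possible way to extend the same `isin` idea to several students is to test (student, day) pairs rather than days alone. The sketch below is an assumption about how that could look, not part of the original answer. It uses the sample records from the question, and the names `grid` and `present_pairs` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample attendance records taken from the question.
student_df = pd.DataFrame({
    'student_data': ['marcus', 'marcus', 'paul'],
    'attendence': ['present', 'present', 'present'],
    'day_of_month': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-04']),
})
days = pd.date_range('2021-01-01', '2021-01-31')

# Every (student, day) combination for the month.
grid = pd.MultiIndex.from_product(
    [student_df['student_data'].unique(), days],
    names=['student_data', 'day_of_month'],
)

# The (student, day) pairs that were actually recorded.
present_pairs = pd.MultiIndex.from_frame(student_df[['student_data', 'day_of_month']])

# Mark a pair 'present' if it appears in the records, otherwise 'absent'.
final_df = grid.to_frame(index=False)
final_df['attendence'] = np.where(grid.isin(present_pairs), 'present', 'absent')

print(final_df.head())
```

Because the label is decided in one pass with `np.where`, no intermediate NaN values need to be cleaned up afterwards.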

An option with a MultiIndex (here df is the student attendance DataFrame):

mix = pd.MultiIndex.from_product([
    df['student_data'].unique(),
    pd.date_range('2021-01-01', '2021-01-31')],
    names=['student_data', 'day_of_month'])

(df
     .set_index(['student_data', 'day_of_month'])
     .reindex(mix)
     .fillna('absent')
     .reset_index())

Output:

   student_data day_of_month attendence
0        marcus   2021-01-01    present
1        marcus   2021-01-02     absent
2        marcus   2021-01-03    present
3        marcus   2021-01-04     absent
4        marcus   2021-01-05     absent
..          ...          ...        ...
57         paul   2021-01-27     absent
58         paul   2021-01-28     absent
59         paul   2021-01-29     absent
60         paul   2021-01-30     absent
61         paul   2021-01-31     absent
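The reindex approach relies on the `day_of_month` values in `df` having the same type as the dates in the new index. As a hedged, self-contained sketch, assuming the column was loaded as strings (the question does not say), converting it with `pd.to_datetime` first is the safer route. The variable names `out` and `counts` are illustrative only.

```python
import pandas as pd

# Sample records from the question; dates are deliberately given as strings
# (an assumption about how the data might have been loaded).
df = pd.DataFrame({
    'student_data': ['marcus', 'marcus', 'paul'],
    'attendence': ['present', 'present', 'present'],
    'day_of_month': ['2021-01-01', '2021-01-03', '2021-01-04'],
})

# Convert to datetime so the labels can line up with pd.date_range below.
df['day_of_month'] = pd.to_datetime(df['day_of_month'])

# Full (student, day) index for January 2021.
mix = pd.MultiIndex.from_product(
    [df['student_data'].unique(), pd.date_range('2021-01-01', '2021-01-31')],
    names=['student_data', 'day_of_month'],
)

out = (df
       .set_index(['student_data', 'day_of_month'])
       .reindex(mix)          # adds the missing (student, day) rows as NaN
       .fillna('absent')
       .reset_index())

# Quick sanity check: attendance counts per student.
counts = out.groupby('student_data')['attendence'].value_counts()
print(counts)
```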
