转换数据帧以跟踪更改

2024-04-23 21:58:23 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一些学生的资料和他们选的科目。你知道吗

id     name   date from  date to    Subjectname  note
1188    Cera  01-08-2016 30-09-2016 math         approved
1188    Cera  01-10-2016            elec    
1199    ron   01-06-2017            english      app-true
1288    Snow  01-01-2017            tally   
1433    sansa 25-01-2016 14-07-2016 tally   
1433    sansa 15-07-2016 16-01-2017 tally        relected
1844    amy   01-10-2016 10-11-2017 adv 
1522    stark 01-01-2016            phy 
1722    sid   01-06-2017 31-03-2018 history 
1722    sid   01-04-2018            history      as per request
1844    amy   01-01-2016 30-09-2016 science 
2100    arya  01-08-2016 30-09-2016 english 
2100    arya  01-10-2016 31-05-2017 math         taken
2100    arya  01-06-2017            english 

我要找的是:

id      name    from        to          subject from subject to
1188    Cera    01-08-2016  01-10-2016  math         elec
1199    ron     01-06-2017              english 
1288    Snow    01-01-2017              tally   
1433    sansa   25-01-2016  16-01-2017  tally        tally
1522    stark   01-01-2016              phy 
1722    sid     01-06-2017  01-04-2018  history      history
1844    amy     01-01-2016  10-11-2017  science      adv
2100    arya    01-08-2016  31-05-2017  english      math
2100    arya    01-06-2017              math         english

“from”列具有与名称对应的最小日期值。 列“to”具有与名称对应的最大日期值。 列“subject from”具有与列“from”和“name”对应的“Subjectname”值。 列“subject to”具有与列“to”和“name”对应的“Subjectname”值。你知道吗

我需要跟踪学生所做的交易和他们更改的主题名称(主题从和主题到)。 请让我知道如何做到这一点。你知道吗

或者,请让我知道,如果有一个简单的方法来获得一个输出,其中包含每个学生的交易细节和他们改变的主题。你知道吗


Tags: tonamefrom主题englishmathhistory学生
2条回答

我的df从您的前3行,它应该可以演示如何做到这一点。 测向:

     id  name  date_from     date_to subject_name      note
0  1188  Cera 2016-01-08  30-09-2016         math  approved
1  1188  Cera 2016-01-10                     elec
2  1199   ron 2017-01-06                  english  app-true

只需在这里粘贴代码。你知道吗

# make date from and date to to one column to get max and min date
df1 = df[['id', 'name', 'date_from', 'subject_name', 'note']]
df2 = df[['id', 'name', 'date_to', 'subject_name', 'note']]
df3 = pd.concat([df1,df2])

df1.columns = ['id', 'name', 'date', 'subject_name', 'note']
df2.columns = ['id', 'name', 'date', 'subject_name', 'note']
df3 = pd.concat([df1,df2])
df3['date'] = pd.to_datetime(df3['date'])
df3 = df3.dropna()
df3:
     id  name       date subject_name      note
0  1188  Cera 2016-01-08         math  approved
1  1188  Cera 2016-01-10         elec
2  1199   ron 2017-01-06      english  app-true
0  1188  Cera 2016-09-30         math  approved
#here you get from and to date for each name
df4 = df3.groupby('name').agg({'date':[max,min]})
df4.columns = ['to','from']
df4 = df4.reset_index()
df4:
   name         to       from
0  Cera 2016-09-30 2016-01-08
1   ron 2017-01-06 2017-01-06
# match "name" and "to" in df4 with "name" and "date" in df3, you got the earliest subject and latest 
df_sub_from = pd.merge(df4,df3,how='left',left_on=['name','to'],right_on=['name','date'])
df_sub_from
df_sub_to = pd.merge(df4,df3,how='left',left_on=['name','to'],right_on=['name','date'])
df_sub_from = pd.merge(df4,df3,how='left',left_on=['name','from'],right_on=['name','date'])
#remove unneed columns
df_sub_from = df_sub_from[['id','name','from','to','subject_name']]
df_sub_to = df_sub_to[['id','name','from','to','subject_name']]
# merge together and rename nicely
df_final = pd.merge(df_sub_from,df_sub_to,left_on=['id','name','from','to'],right_on=['id','name','from','to'])
df_final.columns = ['id','name','from','to','subject_from','subject_to']

在这里:

     id  name       from         to subject_from subject_to
0  1188  Cera 2016-01-08 2016-09-30         math       math
1  1199   ron 2017-01-06 2017-01-06      english    english

按列Subjectname使用^{}^{},因此可以使用^{}^{}对于主题,按每组的最小和最大日期时间:

df['date from'] = pd.to_datetime(df['date from'])
df['date to'] = pd.to_datetime(df['date to'])

d = {'date from':['min', 'idxmin'], 'date to':['max', 'idxmax']}
df1 = df.set_index('Subjectname').groupby(['id','name']).agg(d)
df1.columns = df1.columns.map('_'.join)
d1 = {'date from_min':'from','date to_max':'to',
      'date from_idxmin':'subject from','date to_idxmax':'subject to'}

cols = ['from','to','subject from','subject to']
df1 = df1.rename(columns=d1).reindex(columns=cols).reset_index()
print (df1)
     id   name       from         to subject from subject to
0  1188   Cera 2016-01-08 2016-09-30         math       math
1  1199    ron 2017-01-06        NaT      english        NaN
2  1288   Snow 2017-01-01        NaT        tally        NaN
3  1433  sansa 2016-01-25 2017-01-16        tally      tally
4  1522  stark 2016-01-01        NaT          phy        NaN
5  1722    sid 2017-01-06 2018-03-31      history    history
6  1844    amy 2016-01-01 2017-10-11      science        adv
7  2100   arya 2016-01-08 2017-05-31      english       math

相关问题 更多 >