I have the following code:
import pandas as pd
import numpy as np
import csv
location = r'C:\Users\tmaina\Desktop\scf\output.csv'
df = pd.read_csv(location,sep=r'\s*,\s*',engine='python')
for i, row in df.iterrows():
    if row['COUPON_NUMBER'] == 1:
        df.OND_ORIGIN = df.DEP_FROM
        if df.loc[i+1,'PLDATE'] == row['PLDATE'] & row['TICKET_NUMBER'] ==df.loc[i+1,'TICKET_NUMBER'] &row['COUPON_NUMBER'] == 2:
            df.OND_DEST = df.loc[i+1,'ARR_TO']
        else:
            df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 2 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER'] & row['PLDATE'] ==df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN==df.loc[i-1,'DEP_FROM']
        df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 3 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER'] & row['PLDATE'] !=df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN = df.DEP_FROM
        if df.loc[i+1,'PLDATE'] == row['PLDATE'] & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER']:
            df.OND_DEST = df.loc[i+1,'ARR_TO']
        else:
            df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 4 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER']& row['PLDATE'] ==df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN = df.loc[i-1,'DEP_FROM']
        df.OND_DEST = df.ARR_TO
df.to_csv('out.csv', sep=',',index = False)
The output for the following columns is:
COUPON_NUMBER TICKET_NUMBER DEP_FROM ARR_TO OND_ORIGIN OND_DEST PLDATE STOPOVER
1 1054737998 HRE NBO HRE NBO 20170419 O
2 1054737998 NBO KGL NBO KGL 20170419 X
3 1054737998 KGL NBO KGL NBO 20170519 O
4 1054737998 NBO HRE NBO HRE 20170419 X
The desired output is:
COUPON_NUMBER TICKET_NUMBER DEP_FROM ARR_TO OND_ORIGIN OND_DEST PLDATE STOPOVER
1 1054737998 HRE NBO HRE KGL 20170419 O
2 1054737998 NBO KGL HRE KGL 20170419 X
3 1054737998 KGL NBO KGL HRE 20170519 O
4 1054737998 NBO HRE KGL HRE 20170419 X
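The logic is this: for a given coupon_number belonging to a particular ticket, we check the pldate. If several coupons fall in the same month, their ond_origin and ond_dest should be equal. The ond_dest is determined by checking whether there is a stopover in a particular city. If there is, the arr_to becomes the ond_dest, and the ond_origin becomes the first dep_from where there is no stopover.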
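Instead of iterating over every row, you can do this with groupby. To get the first and the last value of each group, you can use groupby as well. If PLDATE is a datetime column, you can group it by month with Grouper, passing freq='1M'. Grouper is only needed if you want to group per month. If you want to group per date, you can simply do df.groupby(['TICKET_NUMBER', 'PLDATE']).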
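As a rough illustration of the grouping idea above, here is a minimal sketch rather than a definitive solution. It makes several assumptions that are not in the original post: the sample frame is typed in by hand from the question, coupon 4 is assumed to be dated 20170519 (the question's table shows 20170419 for that row, which would place it in the April group), the STOPOVER column is ignored, and the per-group first and last values are taken with transform('first') and transform('last'). The month key is built with dt.to_period('M') instead of Grouper, a swapped-in choice that avoids the month frequency alias, whose spelling differs between pandas versions.

```python
import pandas as pd

# Sample rows typed in from the question.
# ASSUMPTION: coupon 4 is dated 20170519 (the question shows 20170419),
# so that it lands in the same month as coupon 3.
df = pd.DataFrame({
    "COUPON_NUMBER": [1, 2, 3, 4],
    "TICKET_NUMBER": [1054737998] * 4,
    "DEP_FROM": ["HRE", "NBO", "KGL", "NBO"],
    "ARR_TO": ["NBO", "KGL", "NBO", "HRE"],
    "PLDATE": ["20170419", "20170419", "20170519", "20170519"],
})

# Parse PLDATE so that a calendar month can be derived from it.
df["PLDATE"] = pd.to_datetime(df["PLDATE"], format="%Y%m%d")

# Make sure coupons are in travel order within each ticket.
df = df.sort_values(["TICKET_NUMBER", "COUPON_NUMBER"])

# One group per ticket and calendar month.
month = df["PLDATE"].dt.to_period("M")
groups = df.groupby(["TICKET_NUMBER", month])

# Broadcast each group's first departure and last arrival back to its rows.
df["OND_ORIGIN"] = groups["DEP_FROM"].transform("first")
df["OND_DEST"] = groups["ARR_TO"].transform("last")

print(df[["COUPON_NUMBER", "DEP_FROM", "ARR_TO", "OND_ORIGIN", "OND_DEST"]])
```

Under those assumptions, coupons 1 and 2 get HRE and KGL, and coupons 3 and 4 get KGL and HRE. For per-date grouping, the month key can be replaced by the PLDATE column itself.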