How to iterate over the rows of a DataFrame

Posted 2024-04-28 21:31:02


I have the following code:

import pandas as pd
import numpy as np
import csv


location = r'C:\Users\tmaina\Desktop\scf\output.csv'
df = pd.read_csv(location,sep='\s*,\s*',engine='python')
for i, row in df.iterrows():
    if row['COUPON_NUMBER'] == 1:
        df.OND_ORIGIN = df.DEP_FROM 
        if  df.loc[i+1,'PLDATE'] == row['PLDATE'] & row['TICKET_NUMBER'] ==df.loc[i+1,'TICKET_NUMBER'] &row['COUPON_NUMBER'] == 2:
            df.OND_DEST = df.loc[i+1,'ARR_TO']
        else:
            df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 2 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER'] & row['PLDATE'] ==df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN==df.loc[i-1,'DEP_FROM']
        df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 3 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER'] & row['PLDATE'] !=df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN = df.DEP_FROM
        if  df.loc[i+1,'PLDATE'] == row['PLDATE'] & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER']:
            df.OND_DEST = df.loc[i+1,'ARR_TO']
        else:
            df.OND_DEST = df.ARR_TO
    elif row['COUPON_NUMBER'] == 4 & row['TICKET_NUMBER'] ==df.loc[i-1,'TICKET_NUMBER']& row['PLDATE'] ==df.loc[i-1,'PLDATE']:
        df.OND_ORIGIN = df.loc[i-1,'DEP_FROM']
        df.OND_DEST = df.ARR_TO

df.to_csv('out.csv', sep=',',index = False)

It produces the following output:

COUPON_NUMBER TICKET_NUMBER DEP_FROM    ARR_TO  OND_ORIGIN  OND_DEST  PLDATE   STOPOVER
    1          1054737998    HRE             NBO    HRE     NBO       20170419  O
    2          1054737998    NBO             KGL    NBO     KGL       20170419  X   
    3          1054737998    KGL             NBO    KGL     NBO       20170519  O   
    4          1054737998    NBO             HRE    NBO     HRE       20170419  X

The desired output is:

COUPON_NUMBER TICKET_NUMBER DEP_FROM    ARR_TO  OND_ORIGIN  OND_DEST  PLDATE   STOPOVER
    1          1054737998    HRE         NBO    HRE         KGL       20170419  O
    2          1054737998    NBO         KGL    HRE         KGL       20170419  X   
    3          1054737998    KGL         NBO    KGL         HRE       20170519  O   
    4          1054737998    NBO         HRE    KGL         HRE       20170419  X

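The logic is as follows. For each COUPON_NUMBER belonging to a particular ticket, we check PLDATE: if several coupons fall within the same month, they should all share the same OND_ORIGIN and OND_DEST. OND_DEST is determined by checking whether there is a stopover at a given city. If there is, ARR_TO becomes OND_DEST, and OND_ORIGIN becomes the first DEP_FROM where there is no stopover.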


1 answer
User
#1 · posted 2024-04-28 21:31:02

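Instead of iterating over every row, you can do this with groupby and transform. To get the first and the last value of each group, use the 'first' and 'last' aggregations.

If PLDATE is a datetime column, you can do this: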

# 'first' and 'last' are passed as strings; pandas 2.2+ prefers freq='ME' over 'M'
df['OND_ORIGIN'] = df.groupby(['TICKET_NUMBER', pd.Grouper(key='PLDATE', freq='1M')])['DEP_FROM'].transform('first')
df['OND_DEST'] = df.groupby(['TICKET_NUMBER', pd.Grouper(key='PLDATE', freq='1M')])['ARR_TO'].transform('last')

The Grouper is only needed if you want to group by month. If you want one group per date, you can simply use df.groupby(['TICKET_NUMBER', 'PLDATE']).
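For reference, here is a minimal end-to-end sketch of the groupby/transform idea on the four sample rows from the question. It rests on a few assumptions that are not in the original post: PLDATE is parsed from YYYYMMDD integers with pd.to_datetime, coupon 4 is given the date 20170519 (the question's tables show 20170419, which looks like a typo given the desired output), and the month key is built with dt.to_period('M') rather than pd.Grouper, simply to sidestep the 'M' versus 'ME' frequency alias difference between pandas versions.

```python
import pandas as pd

# Sample rows from the question.
# ASSUMPTION: coupon 4 uses PLDATE 20170519, not the 20170419 shown in the
# question's tables, because the desired output pairs it with coupon 3.
df = pd.DataFrame({
    'COUPON_NUMBER': [1, 2, 3, 4],
    'TICKET_NUMBER': [1054737998] * 4,
    'DEP_FROM': ['HRE', 'NBO', 'KGL', 'NBO'],
    'ARR_TO': ['NBO', 'KGL', 'NBO', 'HRE'],
    'PLDATE': [20170419, 20170419, 20170519, 20170519],
})

# Parse the YYYYMMDD integers into real datetimes.
df['PLDATE'] = pd.to_datetime(df['PLDATE'].astype(str), format='%Y%m%d')

# 'first' and 'last' depend on row order, so sort by ticket and coupon first.
df = df.sort_values(['TICKET_NUMBER', 'COUPON_NUMBER'])

# One group per ticket and calendar month.
month = df['PLDATE'].dt.to_period('M')
grouped = df.groupby(['TICKET_NUMBER', month])

# Broadcast each group's first departure and last arrival to all of its rows.
df['OND_ORIGIN'] = grouped['DEP_FROM'].transform('first')
df['OND_DEST'] = grouped['ARR_TO'].transform('last')

print(df[['COUPON_NUMBER', 'DEP_FROM', 'ARR_TO', 'OND_ORIGIN', 'OND_DEST']])
```

With these assumed dates, coupons 1 and 2 end up as HRE to KGL and coupons 3 and 4 as KGL to HRE, which matches the desired output in the question. The sketch does not look at the STOPOVER column; it relies only on the ticket and month grouping.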
