ID    ArCityArCountry        DptCityDptCountry      DateDpt     DateAr
1922  ParisFrance            NewYorkUnitedState     2008-03-10  2001-02-02
1002  LosAngelesUnitedState  CaliforniaUnitedState  2008-03-10  2008-12-01
1901  ParisFrance            LagosNigeria           2001-03-05  2001-02-02
1922  ParisFrance            NewYorkUnitedState     2011-02-03  2008-12-01
1002  ParisFrance            CaliforniaUnitedState  2003-03-04  2002-03-04
1099  ParisFrance            BeijingChina           2011-02-03  2009-02-04
1901  LosAngelesUnitedState  ParisFrance            2001-03-05  2001-02-02
I want to group them by ArCityArCountry (i.e. ParisFrance, LosAngelesUnitedState), then by DptCityDptCountry in the same way, and then take the dates (i.e. DateAr and DateDpt) into account.
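For example:
ParisFrance
[it should list the ID, DateDpt, DateAr and everything else related to ParisFrance, without writing ParisFrance again on every entry]
LosAngelesUnitedState
[it should list the ID, DateDpt, DateAr and everything else related to LosAngelesUnitedState, without writing LosAngelesUnitedState again on every entry]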
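One way to get this per-city listing is pandas groupby on ArCityArCountry, printing each city once as a heading with its remaining columns underneath. The sketch below embeds the sample table via io.StringIO so it runs on its own, and it reads "then DptCityDptCountry, then the dates" as a sort order inside each group; that interpretation is an assumption, not something the question spells out.

```python
import io

import pandas as pd

# The sample table from the question, whitespace-separated.
DATA = """\
ID ArCityArCountry DptCityDptCountry DateDpt DateAr
1922 ParisFrance NewYorkUnitedState 2008-03-10 2001-02-02
1002 LosAngelesUnitedState CaliforniaUnitedState 2008-03-10 2008-12-01
1901 ParisFrance LagosNigeria 2001-03-05 2001-02-02
1922 ParisFrance NewYorkUnitedState 2011-02-03 2008-12-01
1002 ParisFrance CaliforniaUnitedState 2003-03-04 2002-03-04
1099 ParisFrance BeijingChina 2011-02-03 2009-02-04
1901 LosAngelesUnitedState ParisFrance 2001-03-05 2001-02-02
"""

df = pd.read_csv(io.StringIO(DATA), sep=r"\s+", parse_dates=["DateDpt", "DateAr"])

# Assumed ordering: by departure place, then arrival date, then departure date.
df = df.sort_values(["ArCityArCountry", "DptCityDptCountry", "DateAr", "DateDpt"])

groups = {}
for city, group in df.groupby("ArCityArCountry"):
    # Drop the grouping column so the city appears only once, as a heading.
    groups[city] = group.drop(columns="ArCityArCountry")
    print(city)
    print(groups[city].to_string(index=False))
    print()
```

Because groupby keeps the existing row order inside each group, sorting before grouping is enough to control the order of the listed rows.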
import csv

import pandas as pd

# Merge city and country into a single column for arrival and for departure
with open("testfile.csv", newline="") as f:
    data = [[row[0], row[1] + row[2], row[3] + row[4], row[5], row[6]]
            for row in csv.reader(f)]
print(data)

# Write the merged rows (header included) to a new file
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(data)

df = pd.read_csv("data.csv")
# Convert both date columns once; no loop over the columns is needed
df["DateDpt"] = pd.to_datetime(df["DateDpt"], format="%Y-%m-%d")
df["DateAr"] = pd.to_datetime(df["DateAr"], format="%Y-%m-%d")
print(df)

# Parentheses let the method chain span several lines
result = (df[df.DateAr <= df.DateDpt]
          .sort_values(["ID", "DateAr", "DateDpt"])  # ascending on all three keys
          .groupby(["DptCityDptCountry", "ArCityArCountry"])
          .first()
          .reset_index())
print(result)
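The temporary data.csv round trip is not strictly needed; pandas can read the raw file and join the columns itself. This sketch assumes testfile.csv has seven comma-separated columns named ID, ArCity, ArCountry, DptCity, DptCountry, DateDpt and DateAr (inferred from how the code joins row[1] + row[2] and row[3] + row[4]). The three sample rows are a hypothetical stand-in split from the table at the top; with the real file, pass "testfile.csv" instead of io.StringIO(RAW).

```python
import io

import pandas as pd

# Hypothetical stand-in for testfile.csv (column names are an assumption).
RAW = """\
ID,ArCity,ArCountry,DptCity,DptCountry,DateDpt,DateAr
1922,Paris,France,NewYork,UnitedState,2008-03-10,2001-02-02
1002,LosAngeles,UnitedState,California,UnitedState,2008-03-10,2008-12-01
1901,Paris,France,Lagos,Nigeria,2001-03-05,2001-02-02
"""

raw = pd.read_csv(io.StringIO(RAW), parse_dates=["DateDpt", "DateAr"])

# Concatenate city and country in pandas instead of rewriting the file.
df = pd.DataFrame({
    "ID": raw["ID"],
    "ArCityArCountry": raw["ArCity"] + raw["ArCountry"],
    "DptCityDptCountry": raw["DptCity"] + raw["DptCountry"],
    "DateDpt": raw["DateDpt"],
    "DateAr": raw["DateAr"],
})

# Same filter as the question: keep rows whose arrival is not after departure.
kept = df[df["DateAr"] <= df["DateDpt"]]
print(kept)
```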
Expected output:
ParisFrance
[1922, NewYorkUnitedState, 2008-03-10, 2001-02-02], [1901, LagosNigeria, 2001-03-05, 2001-02-02], [1922, NewYorkUnitedState, 2011-02-03, 2008-12-01]
LosAngelesUnitedState
[1901, ParisFrance, 2001-03-05, 2001-02-02]
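To print each arrival place once followed by bracketed rows, as in the expected output, the groups can be converted to plain lists. This sketch embeds the sample table via io.StringIO, keeps the dates as ISO strings (which compare chronologically as text), and applies the same DateAr <= DateDpt filter as the question's code. It reproduces the LosAngelesUnitedState line exactly; for ParisFrance it also lists the 1002 and 1099 rows, because the question does not say which rule should exclude them.

```python
import io

import pandas as pd

# The sample table from the question, whitespace-separated.
DATA = """\
ID ArCityArCountry DptCityDptCountry DateDpt DateAr
1922 ParisFrance NewYorkUnitedState 2008-03-10 2001-02-02
1002 LosAngelesUnitedState CaliforniaUnitedState 2008-03-10 2008-12-01
1901 ParisFrance LagosNigeria 2001-03-05 2001-02-02
1922 ParisFrance NewYorkUnitedState 2011-02-03 2008-12-01
1002 ParisFrance CaliforniaUnitedState 2003-03-04 2002-03-04
1099 ParisFrance BeijingChina 2011-02-03 2009-02-04
1901 LosAngelesUnitedState ParisFrance 2001-03-05 2001-02-02
"""

# Dates stay as "YYYY-MM-DD" strings so they print exactly as written.
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")
df = df[df["DateAr"] <= df["DateDpt"]]

# {arrival place: [[ID, departure place, DateDpt, DateAr], ...]}
grouped = {
    city: [
        [int(r.ID), r.DptCityDptCountry, r.DateDpt, r.DateAr]
        for r in group.itertuples(index=False)
    ]
    for city, group in df.groupby("ArCityArCountry")
}

for city, rows in grouped.items():
    print(city)
    print(", ".join("[" + ", ".join(str(v) for v in row) + "]" for row in rows))
```

Only the final print loop controls the layout, so it is the place to adjust if a different output format is wanted.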
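Sounds like you're looking for something like this:
This gets you close to the indicated format; of course, you can tweak the print() call further.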