From start date to end date

Posted 2024-03-28 17:32:20


ID    ArCityArCountry         DptCityDptCountry      DateDpt    DateAr
1922  ParisFrance             NewYorkUnitedState     2008-03-10 2001-02-02
1002  LosAngelesUnitedState   California UnitedState 2008-03-10 2008-12-01
1901  ParisFrance             LagosNigeria           2001-03-05 2001-02-02
1922  ParisFrance             NewYorkUnitedState     2011-02-03 2008-12-01
1002  ParisFrance             CaliforniaUnitedState  2003-03-04 2002-03-04
1099  ParisFrance             BeijingChina           2011-02-03 2009-02-04
1901  LosAngelesUnitedState   ParisFrance            2001-03-05 2001-02-02

I want to group them by arrival location (i.e. ParisFrance, LosAngelesUnitedState), then by DptCityDptCountry in the same way, and then take the dates (DateAr and DateDpt) into account.

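For example:
ParisFrance: list the ID, DateDpt, DateAr and everything else related to ParisFrance, without writing ParisFrance again on every entry.
LosAngelesUnitedState: list the ID, DateDpt, DateAr and everything else related to LosAngelesUnitedState, likewise without repeating LosAngelesUnitedState.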
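Read literally, the grouping described above is hierarchical: arrival location first, then departure location, then the trips in date order, with each key written only once. A minimal sketch of that reading, assuming the merged columns shown in the table and ISO-formatted date strings (which sort correctly as plain text); `nest_trips` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Sample rows from the table above, with city and country already merged
df = pd.DataFrame(
    [
        [1922, "ParisFrance", "NewYorkUnitedState", "2008-03-10", "2001-02-02"],
        [1002, "LosAngelesUnitedState", "California UnitedState", "2008-03-10", "2008-12-01"],
        [1901, "ParisFrance", "LagosNigeria", "2001-03-05", "2001-02-02"],
        [1922, "ParisFrance", "NewYorkUnitedState", "2011-02-03", "2008-12-01"],
        [1002, "ParisFrance", "CaliforniaUnitedState", "2003-03-04", "2002-03-04"],
        [1099, "ParisFrance", "BeijingChina", "2011-02-03", "2009-02-04"],
        [1901, "LosAngelesUnitedState", "ParisFrance", "2001-03-05", "2001-02-02"],
    ],
    columns=["ID", "ArCityArCountry", "DptCityDptCountry", "DateDpt", "DateAr"],
)


def nest_trips(frame):
    """Hypothetical helper: {arrival: {departure: [(ID, DateDpt, DateAr), ...]}}.

    Inner lists are ordered by DateAr, then DateDpt (ISO date strings sort correctly).
    """
    ordered = frame.sort_values(["DateAr", "DateDpt"])
    nested = {}
    # groupby keeps the sorted row order inside each group
    for (arrival, departure), group in ordered.groupby(["ArCityArCountry", "DptCityDptCountry"]):
        trips = [
            (int(trip_id), dpt, ar)
            for trip_id, dpt, ar in group[["ID", "DateDpt", "DateAr"]].to_numpy()
        ]
        nested.setdefault(arrival, {})[departure] = trips
    return nested


for arrival, by_departure in nest_trips(df).items():
    print(arrival)  # each arrival location is printed only once
    for departure, trips in by_departure.items():
        print("  ", departure, trips)
```

Dropping the inner dictionary level (grouping by `ArCityArCountry` alone) gives the flat per-city lists shown in the expected output further down.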

import csv

import pandas as pd

# Merge city + country into single columns; the header row goes through
# the same merge because csv.reader yields it like any other row
with open("testfile.csv", newline="") as f:
    data = [[row[0], row[1] + row[2], row[3] + row[4], row[5], row[6]]
            for row in csv.reader(f)]
print(data)

with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(data)

df = pd.read_csv("data.csv")
# Parse both date columns once; no loop over the columns is needed
df["DateDpt"] = pd.to_datetime(df["DateDpt"], format="%Y-%m-%d")
df["DateAr"] = pd.to_datetime(df["DateAr"], format="%Y-%m-%d")
print(df)

# Keep trips whose arrival date is not after the departure date, then sort
# (sort_values replaces the old DataFrame.sort; one ascending flag per column)
result = (
    df[df.DateAr <= df.DateDpt]
    .sort_values(["ID", "DateAr", "DateDpt"], ascending=[True, True, True])
    .groupby(["DptCityDptCountry", "ArCityArCountry"])
    .first()
    .reset_index()
)
print(result)

Expected output:

ParisFrance 
  [1922, NewYorkUnitedState, 2008-03-10, 2001-02-02], [1901, LagosNigeria, 2001-03-05, 2001-02-02], [1922, NewYorkUnitedState, 2011-02-03, 2008-12-01]

LosAngelesUnitedState
  [1901,ParisFrance,2001-03-05, 2001-02-02]
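The `df.DateAr <= df.DateDpt` filter in the question's code may explain part of this expected output: on the sample rows it drops exactly one trip, ID 1002 into LosAngelesUnitedState (DateAr 2008-12-01 is after DateDpt 2008-03-10), which leaves only ID 1901 under LosAngelesUnitedState. It does not by itself account for ParisFrance listing three trips rather than five, since all five ParisFrance rows pass the filter. A minimal sketch of the filter plus a sort, assuming the sample rows below (departure column omitted for brevity; sorting by arrival location first is a choice made for this sketch):

```python
import pandas as pd

# Sample rows from the question (DptCityDptCountry omitted for brevity)
df = pd.DataFrame(
    {
        "ID": [1922, 1002, 1901, 1922, 1002, 1099, 1901],
        "ArCityArCountry": ["ParisFrance", "LosAngelesUnitedState", "ParisFrance",
                            "ParisFrance", "ParisFrance", "ParisFrance",
                            "LosAngelesUnitedState"],
        "DateDpt": ["2008-03-10", "2008-03-10", "2001-03-05", "2011-02-03",
                    "2003-03-04", "2011-02-03", "2001-03-05"],
        "DateAr": ["2001-02-02", "2008-12-01", "2001-02-02", "2008-12-01",
                   "2002-03-04", "2009-02-04", "2001-02-02"],
    }
)
df["DateDpt"] = pd.to_datetime(df["DateDpt"], format="%Y-%m-%d")
df["DateAr"] = pd.to_datetime(df["DateAr"], format="%Y-%m-%d")

# Keep only rows where the arrival date is not after the departure date,
# then sort; ascending takes one boolean per sort column
kept = (
    df[df["DateAr"] <= df["DateDpt"]]
    .sort_values(["ArCityArCountry", "DateAr", "DateDpt"], ascending=[True, True, True])
)
print(kept)
```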

1 answer
Anonymous user
Answer #1 · posted 2024-03-28 17:32:20

It sounds like you are looking for something like this:

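import pandas as pd

# df is the DataFrame read from data.csv in the question
df['DateAr'] = pd.to_datetime(df['DateAr'])
df['DateDpt'] = pd.to_datetime(df['DateDpt'])

# Group by arrival location; each group holds the trips into that city
arrival_groups = df.groupby('ArCityArCountry')

for city, trips in arrival_groups:
    print(city)
    # to_records() includes the row index, so each list starts with the row number
    print([list(r) for r in trips.loc[:, ['ID', 'DptCityDptCountry', 'DateDpt', 'DateAr']].to_records()])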

This gets you close to the format you described; the print() calls can of course be adjusted further. Output:
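One way the printing could be adjusted, sketched under the assumption that the goal is the question's format (no row index, dates as YYYY-MM-DD): iterate with `itertuples(index=False)` instead of `to_records()` and format each timestamp with `strftime`. The sample rows are inlined so the block runs on its own, and `format_groups` is a hypothetical helper name:

```python
import pandas as pd

# Sample rows from the question, with the dates parsed as in the answer
rows = [
    [1922, "ParisFrance", "NewYorkUnitedState", "2008-03-10", "2001-02-02"],
    [1002, "LosAngelesUnitedState", "California UnitedState", "2008-03-10", "2008-12-01"],
    [1901, "ParisFrance", "LagosNigeria", "2001-03-05", "2001-02-02"],
    [1922, "ParisFrance", "NewYorkUnitedState", "2011-02-03", "2008-12-01"],
    [1002, "ParisFrance", "CaliforniaUnitedState", "2003-03-04", "2002-03-04"],
    [1099, "ParisFrance", "BeijingChina", "2011-02-03", "2009-02-04"],
    [1901, "LosAngelesUnitedState", "ParisFrance", "2001-03-05", "2001-02-02"],
]
df = pd.DataFrame(rows, columns=["ID", "ArCityArCountry", "DptCityDptCountry", "DateDpt", "DateAr"])
df["DateDpt"] = pd.to_datetime(df["DateDpt"], format="%Y-%m-%d")
df["DateAr"] = pd.to_datetime(df["DateAr"], format="%Y-%m-%d")


def format_groups(frame):
    """Hypothetical helper: {arrival: [[ID, departure, 'YYYY-MM-DD', 'YYYY-MM-DD'], ...]}."""
    result = {}
    for city, trips in frame.groupby("ArCityArCountry"):
        # itertuples(index=False) leaves out the row index that to_records() adds
        result[city] = [
            [int(t.ID), t.DptCityDptCountry,
             t.DateDpt.strftime("%Y-%m-%d"), t.DateAr.strftime("%Y-%m-%d")]
            for t in trips.itertuples(index=False)
        ]
    return result


for city, records in format_groups(df).items():
    print(city)
    print("  " + ", ".join(str(r) for r in records))
```

Formatting the dates as strings also sidesteps the timezone-shifted `numpy.datetime64` reprs visible in the output below.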

LosAngelesUnitedState
[[1, 1002, 'California UnitedState', numpy.datetime64('2008-03-09T18:00:00.000000000-0600'), numpy.datetime64('2008-11-30T18:00:00.000000000-0600')], [6, 1901, 'ParisFrance', numpy.datetime64('2001-03-04T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')]]
ParisFrance
[[0, 1922, 'NewYorkUnitedState', numpy.datetime64('2008-03-09T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')], [2, 1901, 'LagosNigeria', numpy.datetime64('2001-03-04T18:00:00.000000000-0600'), numpy.datetime64('2001-02-01T18:00:00.000000000-0600')], [3, 1922, 'NewYorkUnitedState', numpy.datetime64('2011-02-02T18:00:00.000000000-0600'), numpy.datetime64('2008-11-30T18:00:00.000000000-0600')], [4, 1002, 'CaliforniaUnitedState', numpy.datetime64('2003-03-03T18:00:00.000000000-0600'), numpy.datetime64('2002-03-03T18:00:00.000000000-0600')], [5, 1099, 'BeijingChina', numpy.datetime64('2011-02-02T18:00:00.000000000-0600'), numpy.datetime64('2009-02-03T18:00:00.000000000-0600')]]
