How do I populate new columns from multiple DataFrames?

Posted 2024-04-24 16:28:23


I have a pandas DataFrame with two columns: "IMO" and "LOAD_DATE". Many IMOs have several load dates.

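I want to build another DataFrame that has every date as its index and one new column per IMO. Each column should hold "0" for a day without loading and "1" for a load day.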

Input file:

    | VESSEL_IMO |    Date 
  1 |    9821    |   16-12-16
  2 |    9821    |   20-12-16
  3 |    9822    |   16-12-16
  4 |    9822    |   17-12-16
  5 |    9823    |   16-12-16
  6 |    9823    |   18-12-16
  7 |    9999    |   15-12-16
  8 |    9999    |   18-12-16
  9 |    9999    |   21-12-16

Here is a sample of my code so far, which currently returns:

Indexing error: index out of bounds

df = pd.DataFrame({'Date' : calendrier})

for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0    
    al_datt = xl[xl['AS_VESSEL_IMO'] == namm]['AS_LOAD_DATE']
    df.ix[df['Date'].isin(al_datt), df[namm]] = 1

Desired output:

    Date   | 9821 | 9822 | 9823 |...| 9999 
  15-12-16 |   0  |   0  |   0  |...|   1 
  16-12-16 |   1  |   1  |   1  |...|   0 
  17-12-16 |   0  |   1  |   0  |...|   0 
  18-12-16 |   0  |   0  |   1  |...|   1 
  19-12-16 |   0  |   0  |   0  |...|   0 
  20-12-16 |   1  |   0  |   0  |...|   0 
  21-12-16 |   0  |   0  |   0  |...|   1 

1 answer
#1 · Posted 2024-04-24 16:28:23

Sample:

import pandas as pd

# full calendar to reindex against; ISO date string avoids day/month ambiguity
df1 = pd.DataFrame({'Date' : pd.date_range('2016-12-16', periods=10)})
print (df1)
        Date
0 2016-12-16
1 2016-12-17
2 2016-12-18
3 2016-12-19
4 2016-12-20
5 2016-12-21
6 2016-12-22
7 2016-12-23
8 2016-12-24
9 2016-12-25
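
The code that follows operates on a frame named df that is not constructed anywhere in this answer. A minimal sketch, assuming df is simply the question's input table with the columns VESSEL_IMO and Date (the values are copied from the table in the question; how the real file is loaded is not shown there):

```python
import pandas as pd

# Assumed reconstruction of the question's input table.
# Dates are kept as day-month-year strings, exactly as listed in the question.
df = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822, 9822, 9823, 9823, 9999, 9999, 9999],
    'Date': ['16-12-16', '20-12-16', '16-12-16', '17-12-16', '16-12-16',
             '18-12-16', '15-12-16', '18-12-16', '21-12-16'],
})
print(df)
```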

I think you need set_index followed by unstack. If the rows contain duplicates, use groupby with the max aggregation instead:

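df['a'] = 1  # marker column: 1 means the vessel loaded on that date
# the input dates are day-month-year, so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
df = df.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)

# if rows are duplicated you get "ValueError: Index contains duplicate entries, cannot reshape"; use instead:
#df = df.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)
print (df)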
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-15     0     0     0     1
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
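
The commented-out groupby line matters only when the same (Date, VESSEL_IMO) pair occurs more than once. A small sketch of that case, using a hypothetical three-row frame named dup that is not part of the question's data:

```python
import pandas as pd

# Hypothetical input in which vessel 9821 is listed twice on the same day.
dup = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822],
    'Date': ['16-12-16', '16-12-16', '17-12-16'],
})
dup['a'] = 1
dup['Date'] = pd.to_datetime(dup['Date'], format='%d-%m-%y')

# set_index + unstack cannot reshape when an index pair repeats.
try:
    dup.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)
    raised = False
except ValueError:
    raised = True

# Aggregating with max first collapses the repeats, so unstack succeeds.
wide = dup.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)
print(raised)
print(wide)
```

Using max keeps the flag at 1 however many times a pair repeats, which matches the 0/1 output the question asks for.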

Finally, reindex against the full calendar:

df = df.reindex(df1.Date, fill_value=0)
print (df)
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-19     0     0     0     0
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
2016-12-22     0     0     0     0
2016-12-23     0     0     0     0
2016-12-24     0     0     0     0
2016-12-25     0     0     0     0
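
Note that the sample calendar above starts on 2016-12-16, so the 2016-12-15 row is dropped by the reindex. The two steps can be put together as one self-contained sketch; here the calendar is assumed to span the first to the last load date, which is a choice made for this illustration and not something the question specifies:

```python
import pandas as pd

# Assumed reconstruction of the question's input table.
df = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822, 9822, 9823, 9823, 9999, 9999, 9999],
    'Date': ['16-12-16', '20-12-16', '16-12-16', '17-12-16', '16-12-16',
             '18-12-16', '15-12-16', '18-12-16', '21-12-16'],
})
df['a'] = 1
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')

# Step 1: one row per date that has a load, one column per vessel.
wide = df.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)

# Step 2: add the missing days. The calendar here runs from the earliest
# to the latest load date (an assumption for this sketch).
calendar = pd.date_range(wide.index.min(), wide.index.max(), freq='D', name='Date')
wide = wide.reindex(calendar, fill_value=0)
print(wide)
```

With this calendar the result covers 2016-12-15 through 2016-12-21, the same range as the desired output in the question.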

Another solution uses pivot (or pivot_table when there are duplicates):

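df['a'] = 1
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')  # day-month-year input
# pivot leaves NaN for missing pairs, so the result is float until cast back to int
df = df.pivot(index='Date', columns='VESSEL_IMO', values='a').fillna(0)
# if the (Date, VESSEL_IMO) pairs contain duplicates, use pivot_table instead:
#df = df.pivot_table(index='Date', columns='VESSEL_IMO', values='a', fill_value=0, aggfunc='max')
print (df)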
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-15   0.0   0.0   0.0   1.0
2016-12-16   1.0   1.0   1.0   0.0
2016-12-17   0.0   1.0   0.0   0.0
2016-12-18   0.0   0.0   1.0   1.0
2016-12-20   1.0   0.0   0.0   0.0
2016-12-21   0.0   0.0   0.0   1.0

df = df.reindex(df1.Date, fill_value=0).astype(int)

VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-19     0     0     0     0
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
2016-12-22     0     0     0     0
2016-12-23     0     0     0     0
2016-12-24     0     0     0     0
2016-12-25     0     0     0     0
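
As for the loop in the question, one plausible reading of the error (offered as an interpretation, not a confirmed diagnosis) is that the column indexer is df[namm], a whole column of zeros, where the column label namm was intended. In addition, .ix was removed in pandas 1.0, so .loc is the replacement. A sketch of the loop with both changes, using small hypothetical stand-ins for the question's xl frame and calendrier list:

```python
import pandas as pd

# Hypothetical stand-ins for the question's xl and calendrier objects.
xl = pd.DataFrame({
    'AS_VESSEL_IMO': [9821, 9821, 9999],
    'AS_LOAD_DATE': pd.to_datetime(['2016-12-16', '2016-12-20', '2016-12-15']),
})
calendrier = pd.date_range('2016-12-15', '2016-12-21', freq='D')

df = pd.DataFrame({'Date': calendrier})
for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0
    al_datt = xl.loc[xl['AS_VESSEL_IMO'] == namm, 'AS_LOAD_DATE']
    # Select rows with the boolean mask and the column by its label.
    df.loc[df['Date'].isin(al_datt), namm] = 1
print(df)
```

The loop gives the same 0/1 layout, but the reshaping approaches above avoid iterating over vessels.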
