Aligning rows by date in a pandas DataFrame

3 answers

User

Answer 1 · edited 2024-06-16 09:46:34

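I'm assuming you only want to keep the values from ['Date2', 'Log2'] and ['Date3', 'Log3'] where their dates match a date in Date1.

You can read the different column pairs into separate dataframes and combine them with merge, then filter to keep only the rows where the Date1 column is not null.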

df
>>>
        Date1  Log1       Date2  Log2       Date3  Log3
0  01.01.2000  1000  02.01.2000  2000  01.01.2000  3000
1  02.01.2000  1050  03.01.2000  1950  02.01.2000  3020
2  03.01.2000  1100  04.01.2000  2000  03.01.2000  3000

# split the frame into one (DateN, LogN) pair per frame
df1 = df[['Date1', 'Log1']]
df2 = df[['Date2', 'Log2']]
df3 = df[['Date3', 'Log3']]

# outer-merge on the dates, then keep only the rows that have a Date1
df_out = df1.merge(df2, how='outer', left_on='Date1', right_on='Date2')
df_out = df_out.merge(df3, how='outer', left_on='Date1', right_on='Date3')
df_out = df_out[df_out['Date1'].notnull()]

df_out
>>>
        Date1    Log1       Date2    Log2       Date3    Log3
0  01.01.2000  1000.0         NaN     NaN  01.01.2000  3000.0
1  02.01.2000  1050.0  02.01.2000  2000.0  02.01.2000  3020.0
2  03.01.2000  1100.0  03.01.2000  1950.0  03.01.2000  3000.0
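A self-contained sketch of the same merge idea, rebuilding the sample data from the question. It swaps the outer merge plus notnull() filter for how='left', which keeps exactly the rows keyed by Date1; one side effect is that Log1 stays an integer column because it never picks up NaN.

```python
import pandas as pd

# Sample data from the question (dates kept as dd.mm.yyyy strings)
df = pd.DataFrame({
    'Date1': ['01.01.2000', '02.01.2000', '03.01.2000'],
    'Log1': [1000, 1050, 1100],
    'Date2': ['02.01.2000', '03.01.2000', '04.01.2000'],
    'Log2': [2000, 1950, 2000],
    'Date3': ['01.01.2000', '02.01.2000', '03.01.2000'],
    'Log3': [3000, 3020, 3000],
})

# A left merge keeps only keys that come from Date1 (in Date1's order),
# so no notnull() filter is needed afterwards.
df_out = (
    df[['Date1', 'Log1']]
    .merge(df[['Date2', 'Log2']], how='left', left_on='Date1', right_on='Date2')
    .merge(df[['Date3', 'Log3']], how='left', left_on='Date1', right_on='Date3')
)
print(df_out)
```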

User

Answer 2 · edited 2024-06-16 09:46:34

A dictionary representing your data; this is just a convenient way to load the sample data into a dataframe.

d = {'Date1': {0: '01.01.2000', 1: '02.01.2000', 2: '03.01.2000'}, 'Date3': {0: '01.01.2000', 1: '02.01.2000', 2: '03.01.2000'}, 'Date2': {0: '02.01.2000', 1: '03.01.2000', 2: '04.01.2000'}, 'Log2': {0: 2000, 1: 1950, 2: 2000}, 'Log3': {0: 3000, 1: 3020, 2: 3000}, 'Log1': {0: 1000, 1: 1050, 2: 1100}}
df = pd.DataFrame(d)
df = df[['Date1', 'Log1', 'Date2', 'Log2', 'Date3', 'Log3']]  # put the columns in order
df.index.names = ['Index']

print(df)

Starting dataframe:

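            Date1  Log1       Date2  Log2       Date3  Log3
Index                                                      
0      01.01.2000  1000  02.01.2000  2000  01.01.2000  3000
1      02.01.2000  1050  03.01.2000  1950  02.01.2000  3020
2      03.01.2000  1100  04.01.2000  2000  03.01.2000  3000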

This is crude, but it works:

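list_dfs = []
for i in range(1, 4):
    # pick the DateN/LogN pair for this N and give it common column names
    column_subset = [col for col in df.columns if str(i) in col]
    df_subset_columns = df[column_subset].copy()  # copy avoids SettingWithCopyWarning
    df_subset_columns.columns = ['Date', 'Log']
    df_subset_columns['id'] = i
    list_dfs.append(df_subset_columns)

# stack all pairs into one long frame
df = pd.concat(list_dfs, axis=0, ignore_index=True)

# one row per date, one column per id
df = df.set_index(['Date', 'id'])
df = df.unstack('id')
df.columns = df.columns.droplevel(0)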

At this point, I think this is the logic you're after:

id             1     2     3
Date                        
01.01.2000 1,000   nan 3,000
02.01.2000 1,050 2,000 3,020
03.01.2000 1,100 1,950 3,000
04.01.2000   nan 2,000   nan
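The same reshape can be sketched more compactly with pandas' built-in pd.wide_to_long followed by pivot. This is a swapped-in technique, not the answer's own loop, and it assumes the columns follow the DateN/LogN naming and that the row index is named Index as above.

```python
import pandas as pd

d = {'Date1': ['01.01.2000', '02.01.2000', '03.01.2000'],
     'Log1': [1000, 1050, 1100],
     'Date2': ['02.01.2000', '03.01.2000', '04.01.2000'],
     'Log2': [2000, 1950, 2000],
     'Date3': ['01.01.2000', '02.01.2000', '03.01.2000'],
     'Log3': [3000, 3020, 3000]}
df = pd.DataFrame(d)
df.index.name = 'Index'

# Stack the numbered (DateN, LogN) pairs into long format:
# one row per (original row, N) with plain 'Date' and 'Log' columns.
long = pd.wide_to_long(df.reset_index(), stubnames=['Date', 'Log'],
                       i='Index', j='id').reset_index()

# Pivot back out: one row per date, one column per source N (NaN where missing).
wide = long.pivot(index='Date', columns='id', values='Log')
print(wide)
```

wide_to_long parses the numeric suffix into the id column, so the resulting columns are the integers 1, 2, 3, matching the unstacked table above.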

But to get back to the desired output:

list_dfs = []
for i in range(1, 4):
    df_s = df[i].to_frame()  # Log values for id i, indexed by Date
    df_s.columns = ['Log' + str(i)]
    print(df_s)
    list_dfs.append(df_s.reset_index())

print(pd.concat(list_dfs, axis=1))

User

Answer 3 · edited 2024-06-16 09:46:34

A solution using a list comprehension and reindex, then concat to put all the data together at the end:

dates = [col for col in df.columns if 'Date' in col]
logs = [col for col in df.columns if 'Log' in col]

print ([df[[col[0], col[1]]].set_index(col[0], drop=False)
                            .reindex(df.Date1) for col in zip(dates, logs)])

[                 Date1  Log1
Date1                       
01.01.2000  01.01.2000  1000
02.01.2000  02.01.2000  1050
03.01.2000  03.01.2000  1100,                  Date2    Log2
Date1                         
01.01.2000         NaN     NaN
02.01.2000  02.01.2000  2000.0
03.01.2000  03.01.2000  1950.0,                  Date3  Log3
Date1                       
01.01.2000  01.01.2000  3000
02.01.2000  02.01.2000  3020
03.01.2000  03.01.2000  3000]

df1 = pd.concat([df[[col[0], col[1]]]
        .set_index(col[0], drop=False)
        .reindex(df.Date1) for col in zip(dates, logs)], axis=1)

df1.reset_index(inplace=True, drop=True)

print (df1)
        Date1  Log1       Date2    Log2       Date3  Log3
0  01.01.2000  1000         NaN     NaN  01.01.2000  3000
1  02.01.2000  1050  02.01.2000  2000.0  02.01.2000  3020
2  03.01.2000  1100  03.01.2000  1950.0  03.01.2000  3000
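If only the Log values need to line up with Date1 (the DateN columns themselves are not needed), a lookup with Series.map is another option. This is a different technique from the reindex/concat above, sketched under the assumption that each DateN column holds unique dates, since map with a Series requires a unique index.

```python
import pandas as pd

df = pd.DataFrame({
    'Date1': ['01.01.2000', '02.01.2000', '03.01.2000'],
    'Log1': [1000, 1050, 1100],
    'Date2': ['02.01.2000', '03.01.2000', '04.01.2000'],
    'Log2': [2000, 1950, 2000],
    'Date3': ['01.01.2000', '02.01.2000', '03.01.2000'],
    'Log3': [3000, 3020, 3000],
})

out = df[['Date1', 'Log1']].copy()
for n in (2, 3):
    # date -> log lookup built from the DateN/LogN pair
    lookup = df.set_index(f'Date{n}')[f'Log{n}']
    # dates in Date1 that are absent from DateN become NaN
    out[f'Log{n}'] = df['Date1'].map(lookup)
print(out)
```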
