Change values in a DataFrame from another DataFrame

Posted 2024-05-16 18:41:32


Consider the following DataFrame, named "df1".

   Index   date            colum1          column2
      0       20200705        a              1.1%
      1       20200706        b              78%
      2       20200707        f              10%
      3       20200707        g              59%
      4       20200708        a              69%
      5       20200708        g              12%

Consider the following DataFrame, named "df2".

     Index   date            colum1          column2
      0       20200707        q              11%
      1       20200707        w              54%
      2       20200708        e              64%
      3       20200708        r              11%

I want to update "df1" from "df2", using the date column as the condition. The DataFrame below is the output I want.

  Index   date            colum1          column2
      0       20200705        a              1.1%
      1       20200706        b              78%
      2       20200707        q              11%
      3       20200707        w              54%
      4       20200708        e              64%
      5       20200708        r              11%

3 answers

Use:

import pandas as pd

df1a = df1[~df1['date'].isin(df2['date'])].copy()   # rows of df1 whose date is not in df2
df2a = df2[~df2['date'].isin(df1a['date'])].copy()  # rows of df2 whose date is not in df1a
df3 = pd.concat([df1a, df2a], ignore_index=True)    # DataFrame.append was removed in pandas 2.0
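
For reference, here is a self-contained sketch of this approach on the question's sample data. The construction of df1 and df2 is an assumption reconstructed from the tables in the question (all values stored as strings). pd.concat stands in for DataFrame.append, which was removed in pandas 2.0. The stable sort is an extra step not in the answer: it keeps the result in date order even if the df2 dates do not all come after the remaining df1 dates.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed dtypes: all strings).
df1 = pd.DataFrame({
    'date': ['20200705', '20200706', '20200707', '20200707', '20200708', '20200708'],
    'colum1': ['a', 'b', 'f', 'g', 'a', 'g'],
    'column2': ['1.1%', '78%', '10%', '59%', '69%', '12%'],
})
df2 = pd.DataFrame({
    'date': ['20200707', '20200707', '20200708', '20200708'],
    'colum1': ['q', 'w', 'e', 'r'],
    'column2': ['11%', '54%', '64%', '11%'],
})

# Keep the df1 rows whose date does not occur in df2.
kept = df1[~df1['date'].isin(df2['date'])]

# Stack the df2 rows underneath, then sort by date. A stable sort keeps
# rows that share a date in their original relative order.
df3 = (
    pd.concat([kept, df2])
    .sort_values('date', kind='stable', ignore_index=True)
)
print(df3)
```

Note that this replaces every df1 row for a date with all df2 rows for that date, so the number of rows per date may change if the two frames disagree.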

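To perform the update, aligning rows by date alone is not enough, because each date can occur more than once and so does not identify a single row. The alignment key should be extended with a "date index": a consecutive number for each row within its date.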

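To compute this "date index" for both source DataFrames, run (here df is the first DataFrame, called "df1" in the question):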

df['dateInd'] = df.groupby('date').cumcount()
df2['dateInd'] = df2.groupby('date').cumcount()
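
To make the "date index" concrete, the following small sketch shows what groupby(...).cumcount() produces. The frame is a cut-down copy of df2 from the question, rebuilt here as an assumption so the snippet runs on its own.

```python
import pandas as pd

# Cut-down copy of df2 from the question: two rows per date.
df2 = pd.DataFrame({
    'date': ['20200707', '20200707', '20200708', '20200708'],
    'colum1': ['q', 'w', 'e', 'r'],
})

# cumcount() numbers the rows within each group, starting at 0,
# in the order in which they appear in the frame.
df2['dateInd'] = df2.groupby('date').cumcount()
print(df2['dateInd'].tolist())  # [0, 1, 0, 1]
```

Together, the pair (date, dateInd) is unique for every row, which is what makes it usable as an alignment key.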

Then perform the actual update and drop the helper column added above:

df.set_index(['date', 'dateInd'], inplace=True)
df.update(df2.set_index(['date', 'dateInd']))
df.reset_index(level=1, drop=True, inplace=True)
df.reset_index(inplace=True)

The result is:

       date colum1 column2
0  20200705      a    1.1%
1  20200706      b     78%
2  20200707      q     11%
3  20200707      w     54%
4  20200708      e     64%
5  20200708      r     11%
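
The steps of this answer can be collected into one function, as in the sketch below. The function name update_by_date_position and the helper column name _pos are hypothetical and chosen for illustration only. The sample frames are reconstructed from the question, with all values assumed to be strings.

```python
import pandas as pd

def update_by_date_position(target, source, key='date'):
    """Return a copy of `target` in which each row is overwritten by the
    `source` row with the same `key` value and the same position within
    that value. (Hypothetical helper, not part of pandas.)"""
    t = target.copy()
    s = source.copy()
    # Number the rows within each key value: 0, 1, 2, ...
    t['_pos'] = t.groupby(key).cumcount()
    s['_pos'] = s.groupby(key).cumcount()
    t = t.set_index([key, '_pos'])
    # update() aligns on the index and overwrites with non-NA values.
    t.update(s.set_index([key, '_pos']))
    # Drop the helper level, then turn `key` back into a column.
    return t.reset_index(level='_pos', drop=True).reset_index()

df1 = pd.DataFrame({
    'date': ['20200705', '20200706', '20200707', '20200707', '20200708', '20200708'],
    'colum1': ['a', 'b', 'f', 'g', 'a', 'g'],
    'column2': ['1.1%', '78%', '10%', '59%', '69%', '12%'],
})
df2 = pd.DataFrame({
    'date': ['20200707', '20200707', '20200708', '20200708'],
    'colum1': ['q', 'w', 'e', 'r'],
    'column2': ['11%', '54%', '64%', '11%'],
})

result = update_by_date_position(df1, df2)
print(result)
```

One limitation to keep in mind: update() only overwrites positions that already exist in the target. If the source has more rows for a date than the target does, the extra rows are ignored rather than added.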

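The code below assumes the DataFrames are small, because each list comprehension rebuilds and scans the list of df2 dates for every row of df1. For large data you will need to look for a different approach.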

import pandas as pd

df1 = pd.DataFrame(data={
    'date':['2020070{}'.format(i) for i in [5,6,7,7,8,8]],
    'column1':['a', 'b','f','g','a','g'],
    'column2':['{}%'.format(i) for i in [1.1,78,10,59,69,12]]
})

df2 = pd.DataFrame(data={
    'date':['2020070{}'.format(i) for i in [7,7,8,8]],
    'column1':['q','w','e','r'],
    'column2':['{}%'.format(i) for i in [11,54,64, 11]]
})

dates_not_in_2 = [x for x in df1['date'] if x not in df2['date'].tolist()]
dates_common = [x for x in df1['date'] if x in df2['date'].tolist()]

combined = pd.concat([df1.loc[df1.date.isin(dates_not_in_2)], df2.loc[df2.date.isin(dates_common)]], axis=0).reset_index(drop=True)
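
For larger frames, one possible variant replaces the list comprehensions with boolean masks from Series.isin, which test membership for a whole column at once. This is a sketch rather than part of the answer above. It rebuilds the same df1 and df2 so it can run on its own, and it keeps the same selection rule: df1 rows whose date is absent from df2, followed by df2 rows whose date is present in df1.

```python
import pandas as pd

df1 = pd.DataFrame(data={
    'date': ['2020070{}'.format(i) for i in [5, 6, 7, 7, 8, 8]],
    'column1': ['a', 'b', 'f', 'g', 'a', 'g'],
    'column2': ['{}%'.format(i) for i in [1.1, 78, 10, 59, 69, 12]],
})
df2 = pd.DataFrame(data={
    'date': ['2020070{}'.format(i) for i in [7, 7, 8, 8]],
    'column1': ['q', 'w', 'e', 'r'],
    'column2': ['{}%'.format(i) for i in [11, 54, 64, 11]],
})

# True for df1 rows whose date also appears in df2.
df1_in_df2 = df1['date'].isin(df2['date'])
# True for df2 rows whose date also appears in df1.
df2_in_df1 = df2['date'].isin(df1['date'])

combined = pd.concat(
    [df1.loc[~df1_in_df2], df2.loc[df2_in_df1]],
    axis=0,
).reset_index(drop=True)
print(combined)
```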
