如何查找不同时出现在两个数据帧上的数据(特别是列名)?

2024-04-25 13:20:23 发布

您现在位置:Python中文网/ 问答频道 /正文

我有2个数据帧(足够大)。它们有列:Chu name,time,values。 现在,df(第一数据帧)有160个信道,而df1(第二数据帧)有161个信道。如何识别超出的通道名称(df没有此通道,但df1有)。有时两个数据帧之间的通道名称不匹配。最后,我想要两个数据帧(单独的)具有相同的频道名称和编号。我想为两个数据帧制作160个通道。我不想加入他们的行列。你知道吗

                        ch_name                time     value  interval
0     ADCS.Attitude_-_Omega_Y_deg_s 2019-01-05 00:00:00 -0.059533      60.0
1     ADCS.Attitude_-_Omega_Y_deg_s 2019-01-05 00:01:00 -0.063133      60.0
2     ADCS.Attitude_-_Omega_Y_deg_s 2019-01-05 00:02:00 -0.057533      60.0
3     ADCS.Attitude_-_Omega_Y_deg_s 2019-01-05 00:03:00 -0.053867      60.0
4     ADCS.Attitude_-_Omega_Y_deg_s 2019-01-05 00:04:00 -0.067067      60.0
...                             ...                 ...       ...       ...
1435             ADCS.RW3_MsmtValid 2019-01-05 23:55:00  1.000000      60.0
1436             ADCS.RW3_MsmtValid 2019-01-05 23:56:00  1.000000      60.0
1437             ADCS.RW3_MsmtValid 2019-01-05 23:57:00  1.000000      60.0
1438             ADCS.RW3_MsmtValid 2019-01-05 23:58:00  1.000000      60.0
1439             ADCS.RW3_MsmtValid 2019-01-05 23:59:00  1.000000      60.0

Tags: 数据name名称dftimedf1values信道
3条回答
Test data:

df1=pd.DataFrame({"ch_name":["ch1","ch1","ch2","ch3"],"time":pd.date_range("2019-10-12",freq="min",periods=4),"value":[1,2,3,4],"interval":[60,60,60,60]})                                                                                
df2=pd.DataFrame({"ch_name":["ch3","ch1","ch5","ch3"],"time":pd.date_range("2019-10-11",freq="min",periods=4),"value":[10,20,30,40],"interval":[70,70,70,70]})                                                                            
df1

  ch_name                time  value  interval
0     ch1 2019-10-12 00:00:00      1        60
1     ch1 2019-10-12 00:01:00      2        60
2     ch2 2019-10-12 00:02:00      3        60
3     ch3 2019-10-12 00:03:00      4        60

df2

  ch_name                time  value  interval
0     ch3 2019-10-11 00:00:00     10        70
1     ch1 2019-10-11 00:01:00     20        70
2     ch5 2019-10-11 00:02:00     30        70
3     ch3 2019-10-11 00:03:00     40        70

Create a set/array for the common column names:

#ss=set(df1.ch_name) & set(df2.ch_name)
ss= np.intersect1d(df1.ch_name,df2.ch_name)

Create the new data frames:

new_df1=df1[df1.ch_name.isin(ss)]

  ch_name                time  value  interval
0     ch1 2019-10-12 00:00:00      1        60
1     ch1 2019-10-12 00:01:00      2        60
3     ch3 2019-10-12 00:03:00      4        60

new_df3=df2[df2.ch_name.isin(ss)]

 ch_name                time  value  interval
0     ch3 2019-10-11 00:00:00     10        70
1     ch1 2019-10-11 00:01:00     20        70
3     ch3 2019-10-11 00:03:00     40        70

你试过合并吗?你知道吗

我创造了一些虚假的数据来告诉你它是如何工作的。你知道吗

        data1 = [['ch1','12:55',512],['ch2','12:57',516], ['ch3','12:57',420],
    ['ch4','11:48',433],['ch5','10:42',467]]

        data2 = [['ch1','12:55',511],['ch2','12:54',519],['ch3','12:51',720],
    ['ch5','10:45',497]]

        import pandas as pd

        df1 = pd.DataFrame(data1,columns=('ch_name','time','value'))
        df2 = pd.DataFrame(data2,columns=('ch_name','time','value'))

        dfResult = pd.merge(df1, df2, on='ch_name', how='inner')

        print(dfResult)

          ch_name time_x  value_x   time_y  value_y
     0     ch1    12:55      512     12:55      511
     1     ch2    12:57      516     12:54      519
     2     ch3    12:57      420     12:51      720
     3     ch5    10:42      467     10:45      497

这里的关键是参数how='inner'和on='ch\u name'。how='inner'表示希望列同时出现在dataframes中,on=column\u name表示定义要匹配的列的方式。你知道吗

希望对你有帮助。你知道吗

您可以通过设置逻辑来实现这一点:

df = pd.DataFrame(columns = ['a1', 'ba1', 'b1', 'c1'])
df1 = pd.DataFrame(columns = ['a1', 'gg', 'b1', 'c2'])

df
   a1   ba1 b1  c1

df1
    a1  gg  b1  c2

要从df中删除df1中的列,请执行以下操作:

df.drop(columns = set(df.columns) - set(df1.columns), inplace=True)
df
   a1   b1

相关问题 更多 >