Merging data on a datetime column (POSIXct format)

Posted 2024-05-15 11:53:28


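I want to merge two data frames on a datetime column. The Date_Time columns contain both shared and differing values, but I cannot merge them so that every unique datetime row ends up in the result, with NA in the columns that are not common to both. Instead, I get NAs in the Date_Time column of the second data frame. I have tried this in both R and Python.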

Python code:

df=pd.merge(df_met, df_so2, how='left', on='Date_Time')

In R, the column was converted with as.POSIXct:

df_2<-join(so2, met_km, type="inner")
df3 <- merge(so2, met_km, all = TRUE)
df_4 <- merge(so2, met_km, by.x = "Date_Time", by.y = "Date_Time")

SO2 data (df_so2):

 X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1    2  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2    2  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3    2  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4    2  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5    2  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2

Meteorological data (df_met):

 X        Date_Time  air_temp_set_1  dew_point_temperature_set_1
 1  2015-01-01 1:00            35.6                         35.6
 2  2015-01-01 2:00            35.6                         35.6
 3  2015-01-01 3:00            35.6                         35.6
 4  2015-01-01 4:00            33.8                         33.8
 5  2015-01-01 5:00            33.2                         33.2
 6  2015-01-01 6:00            33.8                         33.8
 7  2015-01-01 7:00            33.8                         33.8

Expected output:

 X  POC    Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1  1.0  2 WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2  2.0  2 WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3  NaN      NaN  2015-01-01 1:00               NaN                 NaN  NaN
 4  NaN      NaN  2015-01-01 2:00               NaN                 NaN  NaN

3 answers
  • Whoever is reading this, please do not downvote. I am working through the OP's error with them, and then we will delete this answer.

df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

I got:

 POC   Datum        Date_Time           Date_GMT   Sample.Measurement   MDL   air_temp_set_1   dew_point_temperature_set_1   relative_humidity_set_1   wind_speed_set_1   cloud_layer_1_code_set_1   wind_direction_set_1   pressure_set_1d   weather_cond_code_set_1   visibility_set_1  wind_cardinal_direction_set_1d  weather_condition_set_1d
    2  WGS84   2015-01-01 3:00  01/01/2015 09:00                   2.3   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 09:00                   0.6   2.0             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 12:00                   7.4   0.2             35.6                          35.6                     100.0                0.0                       14.0                    0.0         29.943333                       9.0               0.25                              N                       Fog
    1  WGS84   2015-01-01 3:00  01/01/2015 10:00                   1.0   0.2             35.6                           NaN                       NaN                NaN                        NaN                    NaN               NaN                       NaN                NaN                             NaN                      NaN

Notes:

  • Check df_met.info() and df_so2.info() and verify that Date_Time is non-null datetime64[ns]
    • If it is not, try the following:
    • df_so2.Date_Time = pd.to_datetime(df_so2.Date_Time)
    • df_met.Date_Time = pd.to_datetime(df_met.Date_Time)
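
A minimal sketch of why the dtype check above matters. The rows are made up in the style of the question's data, and the zero-padded hour in df_met is an assumption chosen for illustration, not something confirmed in the question. While the keys are plain strings they are compared character by character, so "3:00" and "03:00" never match; once both columns are parsed with pd.to_datetime they do.

```python
import pandas as pd

# Hypothetical sample rows; the zero-padded hours in df_met are an assumption
df_so2 = pd.DataFrame({
    "Date_Time": ["2015-01-01 3:00", "2015-01-01 4:00"],
    "Sample.Measurement": [2.3, 2.5],
})
df_met = pd.DataFrame({
    "Date_Time": ["2015-01-01 03:00", "2015-01-01 04:00"],
    "air_temp_set_1": [35.6, 33.8],
})

# As text the keys differ, so an inner merge finds no common rows
as_text = pd.merge(df_so2, df_met, on="Date_Time", how="inner")

# Parse both key columns to real datetimes, then merge again
df_so2["Date_Time"] = pd.to_datetime(df_so2["Date_Time"])
df_met["Date_Time"] = pd.to_datetime(df_met["Date_Time"])
as_datetime = pd.merge(df_so2, df_met, on="Date_Time", how="inner")

print(len(as_text), len(as_datetime))
```

If the two row counts differ in your own data, inconsistent string formatting of the key column is a likely cause.
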
merge(df_so2, df_met, by = "Date_Time", all = TRUE)

        Date_Time X.x POC Datum         Date_GMT Sample.Measurement MDL X.y air_temp_set_1 dew_point_temperature_set_1
1 2015-01-01 1:00  NA  NA  <NA>             <NA>                 NA  NA   1           35.6                        35.6
2 2015-01-01 2:00  NA  NA  <NA>             <NA>                 NA  NA   2           35.6                        35.6
3 2015-01-01 3:00   1   2 WGS84 01/01/2015 09:00                2.3 0.2   3           35.6                        35.6
4 2015-01-01 4:00   2   2 WGS84 01/01/2015 10:00                2.5 0.2   4           33.8                        33.8
5 2015-01-01 5:00   3   2 WGS84 01/01/2015 11:00                2.1 0.2   5           33.2                        33.2
6 2015-01-01 6:00   4   2 WGS84 01/01/2015 12:00                2.3 0.2   6           33.8                        33.8
7 2015-01-01 7:00   5   2 WGS84 01/01/2015 13:00                1.1 0.2   7           33.8                        33.8

An outer merge should give you all the rows:

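  • outer: use the union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • Based on your comment, you want all the dates, not just the ones shown in Expected Output
  • If you want to sort by date, add the parameter sort=True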
df_exp = pd.merge(df_so2, df_met, on='Date_Time', how='outer')

 X_x  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL  X_y  air_temp_set_1  dew_point_temperature_set_1
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2    3            35.6                         35.6
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2    4            33.8                         33.8
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2    5            33.2                         33.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2    6            33.8                         33.8
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2    7            33.8                         33.8
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN    1            35.6                         35.6
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN    2            35.6                         35.6
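
The outer merge with sorting can be sketched as follows, on a reduced subset of the question's rows and columns. The indicator=True argument is an addition for illustration and is not part of the answer's code; it adds a _merge column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

# Reduced subset of the question's sample data
df_so2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(
        ["2015-01-01 3:00", "2015-01-01 4:00", "2015-01-01 5:00"]
    ),
    "Sample.Measurement": [2.3, 2.5, 2.1],
})
df_met = pd.DataFrame({
    "Date_Time": pd.to_datetime(
        ["2015-01-01 1:00", "2015-01-01 2:00", "2015-01-01 3:00", "2015-01-01 4:00"]
    ),
    "air_temp_set_1": [35.6, 35.6, 35.6, 33.8],
})

# how="outer" keeps every key from both frames; sort=True orders rows by key;
# indicator=True adds a "_merge" column naming the source of each row
df_exp = pd.merge(
    df_so2, df_met, on="Date_Time", how="outer", sort=True, indicator=True
)
print(df_exp)
```

Rows marked right_only exist only in df_met, and rows marked left_only exist only in df_so2, which makes it easy to see which timestamps failed to match.
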

Without the columns from df_met:
df_exp.drop(columns=['X_y', 'air_temp_set_1', 'dew_point_temperature_set_1'], inplace=True)
df_exp.rename(columns={'X_x': 'X'}, inplace=True)

   X  POC  Datum        Date_Time          Date_GMT  Sample.Measurement  MDL
 1.0  2.0  WGS84  2015-01-01 3:00  01/01/2015 09:00                 2.3  0.2
 2.0  2.0  WGS84  2015-01-01 4:00  01/01/2015 10:00                 2.5  0.2
 3.0  2.0  WGS84  2015-01-01 5:00  01/01/2015 11:00                 2.1  0.2
 4.0  2.0  WGS84  2015-01-01 6:00  01/01/2015 12:00                 2.3  0.2
 5.0  2.0  WGS84  2015-01-01 7:00  01/01/2015 13:00                 1.1  0.2
 NaN  NaN    NaN  2015-01-01 1:00               NaN                 NaN  NaN
 NaN  NaN    NaN  2015-01-01 2:00               NaN                 NaN  NaN
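
As an alternative to dropping the df_met columns after the merge, one option is to merge against only the key column of df_met. This is a sketch on a reduced subset of the question's data, not the answer's own method; the drop_duplicates() call is a precaution added here so that repeated timestamps in df_met cannot multiply rows.

```python
import pandas as pd

# Reduced subset of the question's sample data
df_so2 = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2015-01-01 3:00", "2015-01-01 4:00"]),
    "POC": [2, 2],
    "Sample.Measurement": [2.3, 2.5],
})
df_met = pd.DataFrame({
    "Date_Time": pd.to_datetime(
        ["2015-01-01 1:00", "2015-01-01 2:00", "2015-01-01 3:00", "2015-01-01 4:00"]
    ),
    "air_temp_set_1": [35.6, 35.6, 35.6, 33.8],
})

# Only the timestamps from df_met take part in the merge, so the result
# keeps df_so2's columns and gains NaN rows for the met-only timestamps
met_keys = df_met[["Date_Time"]].drop_duplicates()
df_exp = pd.merge(df_so2, met_keys, on="Date_Time", how="outer")
print(df_exp)
```

Because no extra columns come in from df_met, no suffixes such as X_x and X_y are created and no rename is needed afterwards.
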
