在时间序列中合并两个列名不同的数据框

0 投票
1 回答
28 浏览
提问于 2025-04-12 16:12

我有两个数据框,一个叫做sensor,另一个叫做train。

sensor数据框里存的是时间序列的数据,索引是ts_sensor这一列。在sensor数据框中,列的名字代表了传感器的名称,每个单元格里填的都是在ts_sensor时刻传感器测得的值。而train数据框里有一列叫id_sensor,里面填的是传感器的名称(和sensor数据框里的名称一样),还有一列叫boot_threshold,每个单元格里填的是对应传感器的阈值。

以下是关于这两个数据框的信息。

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id_train             264 non-null    object 
 1   asset_id             264 non-null    object 
 2   train_type           264 non-null    object 
 3   equipment_type       264 non-null    object 
 4   id_sensor            264 non-null    object 
 5   unit                 262 non-null    object 
 6   is_operating         262 non-null    float64
 7   will_be_available    262 non-null    float64
 8   pi_acquisition_date  0 non-null      float64
 9   pi_description       264 non-null    object 
 10  sensor_description   264 non-null    object 
 11  tag_id               262 non-null    float64
 12  priority             262 non-null    float64
 13  for_ml               264 non-null    int64  
 14  onoff_parameter      264 non-null    int64  
 15  boot_threshold       264 non-null    int64  
 16  efficiency_index     264 non-null    int64  
dtypes: float64(5), int64(4), object(8)
memory usage: 35.2+ KB

sensor.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1729930 entries, 0 to 1729929
Columns: 135 entries, level_0 to Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A
dtypes: float64(132), int64(2), object(1)
memory usage: 1.7+ GB

sensor.columns
Index(['level_0', 'index', 'sensor_ts',
       'Australia.Blacktip.ICSS.360140413-100.PV',
       'Australia.Blacktip.ICSS.360140415-100.PV',
       'Australia.Blacktip.ICSS.360140417-100.PV',
       'Australia.Blacktip.ICSS.360140421-100.PV',
       'Australia.Blacktip.ICSS.360140423-100.PV',
       'Australia.Blacktip.ICSS.360140425-100.PV',
       'Australia.Blacktip.ICSS.360140433-100.PV',
       ...
       'Australia.Blacktip.ICSS.3601PDI144.DACA.PV',
       'Australia.Blacktip.ICSS.3601PIT111.PV',
       'Australia.Blacktip.ICSS.3601PIT141.PV',
       'Australia.Blacktip.ICSS.3601TIT042.PV',
       'Australia.Blacktip.ICSS.3601TIT052.PV',
       'Australia.Blacktip.ICSS.3601TIT111.PV',
       'Australia.Blacktip.ICSS.3601TIT122.PV',
       'Australia.Blacktip.ICSS.3601TIT141.PV',
       'Australia.Blacktip.ICSS.3601TIT152.PV',
       'Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A'],
      dtype='object', length=135)

我想得到一个新的数据框,这个数据框要保留时间序列,并且只包含那些当前ts_sensor值高于对应boot_threshold值的传感器。

谢谢大家!

1 个回答

0

你是在找这个吗?

import pandas as pd
import numpy as np   
# Simulated sensor data (a small part of the actual data for demonstration)
sensor_data = {
    'sensor_ts': pd.date_range(start='2023-01-01', periods=4, freq='D'),
    'Australia.Blacktip.ICSS.360140413-100.PV': [100, 200, 300, 400],
    'Australia.Blacktip.ICSS.360140415-100.PV': [100, 220, 330, 440],
}

sensor = pd.DataFrame(sensor_data)

# Simulated train data 
train_data = {
    'id_sensor': ['Australia.Blacktip.ICSS.360140413-100.PV', 'Australia.Blacktip.ICSS.360140415-100.PV'],
    'boot_threshold': [250, 300],
}

train = pd.DataFrame(train_data)

# Reshape sensor DataFrame from wide to long format
sensor_long = pd.melt(sensor, id_vars=['sensor_ts'], var_name='id_sensor', value_name='value')

# Merge the reshaped sensor data with the train data
merged_data = pd.merge(sensor_long, train, on='id_sensor')

# Filter to include only rows where sensor value is greater than the boot_threshold
filtered_data = merged_data[merged_data['value'] > merged_data['boot_threshold']]

# Displaying the filtered data for verification
filtered_data

输出结果:

sensor_ts   id_sensor   value                      boot_threshold
2   2023-01-03  Australia.Blacktip.ICSS.360140413-100.PV    300 250
3   2023-01-04  Australia.Blacktip.ICSS.360140413-100.PV    400 250
6   2023-01-03  Australia.Blacktip.ICSS.360140415-100.PV    330 300
7   2023-01-04  Australia.Blacktip.ICSS.360140415-100.PV    440 300

撰写回答