我需要关联三个数据库,telemetry3h、telemetry24h和error_coun
我不知道如何将pandas的df.corr()
函数与问题联系起来,我无法编写这部分代码
我需要得到这个结果,但我找不到一个有效的函数,有人有什么建议吗
最后是通过代码中已有的函数打印的两行数据
import pandas as pd
telemetry = pd.read_csv('D:/IC/teste_disseracao/banco_de_dados_PdM/PdM_telemetry.csv')
telemetry['datetime'] = pd.to_datetime(telemetry['datetime'], format="%Y-%m-%d %H:%M:%S")
#valores da telemetria no com intervalo de 3 em 3 horas (media)
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
temp.append(pd.pivot_table(telemetry,
index='datetime',
columns='machineID',
values=col).rolling(window=3).mean().resample('3H',
closed='left',
label='right').mean().unstack())
telemetrynew_media = pd.concat(temp, axis=1)
telemetrynew_media.columns = [i + 'mean_3h' for i in fields]
telemetrynew_media.reset_index(inplace=True)
#valores da telemetria no com intervalo de 3 em 3 horas (desvio padrao)
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
temp.append(pd.pivot_table(telemetry,
index='datetime',
columns='machineID',
values=col).rolling(window=3).std().resample('3H',
closed='left',
label='right').std().unstack())
telemetrynew_dp = pd.concat(temp, axis=1)
telemetrynew_dp.columns = [i + 'sd_3h' for i in fields]
telemetrynew_dp.reset_index(inplace=True)
telemetry3h = telemetrynew_media
#valores da telemetria no com intervalo de 24 em 24 horas (media)
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
temp.append(pd.pivot_table(telemetry,
index='datetime',
columns='machineID',
values=col).rolling(window=24).mean().resample('3H',
closed='left',
label='right').mean().unstack())
telemetrynew_media = pd.concat(temp, axis=1)
telemetrynew_media.columns = [i + 'mean_24h' for i in fields]
telemetrynew_media.reset_index(inplace=True)
telemetrynew_media = telemetrynew_media.loc[-telemetrynew_media['voltmean_24h'].isnull()]
#valores da telemetria no com intervalo de 24 em 24 horas (desvio padrao)
temp = []
fields = ['volt', 'rotate', 'pressure', 'vibration']
for col in fields:
temp.append(pd.pivot_table(telemetry,
index='datetime',
columns='machineID',
values=col).rolling(window=24).std().resample('3H',
closed='left',
label='right').std().unstack())
telemetrynew_dp = pd.concat(temp, axis=1)
telemetrynew_dp.columns = [i + 'sd_24h' for i in fields]
telemetrynew_dp.reset_index(inplace=True)
telemetrynew_dp = telemetrynew_dp.loc[-telemetrynew_dp['voltsd_24h'].isnull()]
telemetry24h = telemetrynew_media
errors = pd.read_csv('D:/IC/teste_disseracao/banco_de_dados_PdM/PdM_errors.csv')
errors['datetime'] = pd.to_datetime(errors['datetime'], format="%Y-%m-%d %H:%M:%S")
#uma coluna para cada tipo de erro
error_count = pd.get_dummies(errors)
error_count.columns = ['datetime', 'machineID','volt', 'rotate', 'pressure', 'vibration']
#combinar erros para uma determinada máquina em uma determinada hora
error_count = error_count.groupby(['machineID', 'datetime']).sum().reset_index()
machineID datetime voltmean_24h rotatemean_24h pressuremean_24h vibrationmean_24h
1 2015-01-02 06:00:00 169.73380889577277 445.17986460181123 96.79711296200088 40.38515998915177
1 2015-01-02 09:00:00 170.7007271750606 445.2941115744456 97.23217240183708 39.64764312098307
machineID datetime voltmean_3h rotatemean_3h pressuremean_3h vibrationmean_3h
1 2015-09-02 00:00:00 165.24762959802376 445.7493967947973 90.03620169154418 63.43759320078462
1 2015-09-01 21:00:00 169.3392785585131 447.60737912233566 98.0559196736471 63.0969156902752
datetime machineID errorID_error1 errorID_error2 errorID_error3 errorID_error4 errorID_error5
2015-01-03 07:00:00 1 1 0 0 0 0
2015-01-27 04:00:00 1 1 0 0 0 0
我认为您正在寻找的是一种连接列的方法,然后找到数据帧中每个列之间的相关性,对吗
我像这样加载了您的样本:
然后,您可以为索引列指定一个名称,删除不想关联的列,然后连接索引上的其余列。像这样:
现在,剩下要做的就是调用correlation方法
我不知道你的数据,但我看到你有一些时间戳。也许你必须想出一个更聪明的索引机制(或者更好的一个键),它能考虑到你的记录是在什么时候被获取的。例如,您可能希望一行只对应一天
相关问题 更多 >
编程相关推荐