How to refer to DataFrame columns using an alias/lookup table

Posted 2024-06-11 14:56:45


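I would like to use a lookup table or alias table to rename columns so that the aliases are what appear in figures and similar output.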

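  1. I have a DataFrame loaded from a large CSV file
  2. I drop the first few rows and then correct the column names. (I know this is messy; suggestions are welcome.)
  3. However, I don't want the (somewhat programmatic and ugly) column names to appear in plots etc.
  4. So I have a text file that gives each column name, its alias and (where needed) its units
  5. I then want to replace the column names with the corresponding aliases
  6. Alternatively, the original column names can stay, as long as I can show the aliases when plotting.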

My raw CSV looks like this:

"TOA5","HE101_RV50_GAF","CR","7225","CR.Std.07","CPU:aa.CR6"
"TIMESTAMP","RECORD","SensorRelEventMin(1)","SensorRelEventMin(2)","SensorRelEventMin(3)","SensorRelEventMin(4)"
"TS","RN","","","",""
"","","Smp","Smp","Smp","Smp"
"2019-08-30 00:22:22.9",14546,-0.4051819,-0.2565842,-9.702911,-0.5374413
"2019-08-30 00:27:34.24",14547,-1.118546,-0.9480438,-5.356552,-1.204945
"2019-08-30 00:29:47.86",14548,-0.765564,-0.5029907,-7.062241,-0.8703575
"2019-08-30 00:35:36.76",14549,-0.7200012,-0.6559029,-6.257889,-0.6656723
"2019-08-30 00:42:28.56",14550,-0.6325226,-0.4022942,-4.179138,-0.4609756
"2019-08-30 00:48:55.32",14551,-0.4613953,-0.2666397,-4.391235,-0.4144287
"2019-08-30 00:52:15.74",14552,-0.4507446,-0.3086662,-1.869171,-0.5024986
"2019-08-30 01:02:15.04",14553,-0.5307922,-0.3815041,-5.40918,-0.3242683
"2019-08-30 01:09:18.38",14554,-0.6351166,-0.5765362,-2.261734,-0.4456367
"2019-08-30 01:11:07.38",14555,-0.2823181,-0.2864227,-0.2417603,-0.3462906
"2019-08-30 01:13:07.6",14556,-0.3824463,-0.3220673,-7.051376,-0.4786491

I have written an alias table that looks like this:

"EntryName","AliasedName","Units"
"TIMESTAMP","time","s"
"RECORD","record number",""
"SensorRelEventMin(1)","1st Sensor Name","uS"
"SensorRelEventMin(2)","2nd Sensor Name","uS"
"SensorRelEventMin(3)","3rd Sensor Name","uS"
"SensorRelEventMin(4)","4th Sensor Name","uS"

I would like the df to look like this:

"time","record number","1st Sensor Name","2nd Sensor Name","3rd Sensor Name","4th Sensor Name"
"2019-08-30 00:22:22.9",14546,-0.4051819,-0.2565842,-9.702911,-0.5374413
"2019-08-30 00:27:34.24",14547,-1.118546,-0.9480438,-5.356552,-1.204945
"2019-08-30 00:29:47.86",14548,-0.765564,-0.5029907,-7.062241,-0.8703575
...

My loading code is:

import pandas as pd

# load data into df (skip the first 3 header rows, then set the column names by hand)
df = pd.read_csv(filename, skiprows=3, na_values='NAN')
df.columns = ["TIMESTAMP", "RECORD", "SensorRelEventMin(1)", "SensorRelEventMin(2)", "SensorRelEventMin(3)", "SensorRelEventMin(4)"]
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])
# read alias table
aliasTable = pd.read_csv('aliasTable.txt')

In pseudocode, I would like to do something like this:

df.rename({aliasTable["EntryName"]:aliasTable["AliasedName"]})

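Alternatively, if it makes more sense to keep the column names, a simple way of substituting the alias into any figure titles would also be fine. I know this is a very vague request, but I have reached the limit of my Python ability.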


Tags: csv, name, df, read, time, sensor, record, timestamp
1 answer
User
Answer 1 · Posted 2024-06-11 14:56:45

If I understand the question correctly, we need a simple way to rename the DataFrame columns. If that is the case, the following approach may help:

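  1. Create a dictionary that maps the old column names to the new column names
  2. Use the rename method on the DataFrame to change the column names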
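
Applied to the alias table from the question, the two steps might look like the sketch below. Part of the alias table and a two-row stand-in for the logger data are inlined so that the snippet runs on its own; the names `alias_table`, `col_dict` and `renamed` are illustrative and do not come from the original post.

```python
import io
import pandas as pd

# Part of the alias table from the question, inlined so the sketch is self-contained.
alias_csv = io.StringIO('''"EntryName","AliasedName","Units"
"TIMESTAMP","time","s"
"RECORD","record number",""
"SensorRelEventMin(1)","1st Sensor Name","uS"
"SensorRelEventMin(2)","2nd Sensor Name","uS"
''')
alias_table = pd.read_csv(alias_csv)

# Step 1: build the {old name: alias} dictionary from two columns of the table.
col_dict = dict(zip(alias_table["EntryName"], alias_table["AliasedName"]))

# A small stand-in for the logger data (timestamps kept as plain strings here).
df = pd.DataFrame({
    "TIMESTAMP": ["2019-08-30 00:22:22.9", "2019-08-30 00:27:34.24"],
    "RECORD": [14546, 14547],
    "SensorRelEventMin(1)": [-0.4051819, -1.118546],
    "SensorRelEventMin(2)": [-0.2565842, -0.9480438],
})

# Step 2: rename returns a new DataFrame; columns missing from col_dict are left unchanged.
renamed = df.rename(columns=col_dict)
print(list(renamed.columns))
```

Because `rename` silently ignores keys that do not match any column, a typo in the alias file would leave that column with its original name rather than raise an error.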

For example, with a small generic DataFrame:

import pandas as pd
import numpy as np

# define a dictionary to replace column names in keys with column names in values
col_dict = {
    0: "col1",
    1: "col2"   
}

# create a dataframe. 0 and 1 are the default column names
df = pd.DataFrame(np.random.rand(4,2))

# print df
df

    0           1
0   0.433529    0.812580
1   0.116504    0.801236
2   0.236852    0.336812
3   0.415137    0.708668

# apply rename function over df 
df.rename(columns=col_dict, inplace=True)

# print df
df

    col1        col2
0   0.433529    0.812580
1   0.116504    0.801236
2   0.236852    0.336812
3   0.415137    0.708668
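
The question also allows for keeping the original column names and only showing the aliases in figures. A minimal sketch of that variant, assuming the alias file has the three columns shown in the question, is to build a label lookup that includes the units. The helper `make_label` and the `labels` dictionary are hypothetical names introduced here for illustration.

```python
import io
import pandas as pd

# Part of the alias table from the question, inlined so the sketch is self-contained.
alias_csv = io.StringIO('''"EntryName","AliasedName","Units"
"TIMESTAMP","time","s"
"RECORD","record number",""
"SensorRelEventMin(1)","1st Sensor Name","uS"
''')
# keep_default_na=False keeps an empty Units cell as "" instead of NaN.
alias_table = pd.read_csv(alias_csv, keep_default_na=False)

def make_label(row):
    """Return 'alias [units]', or just the alias when no unit is given."""
    if row["Units"]:
        return f"{row['AliasedName']} [{row['Units']}]"
    return row["AliasedName"]

# Map each original column name to its display label.
labels = {row["EntryName"]: make_label(row) for _, row in alias_table.iterrows()}

print(labels["SensorRelEventMin(1)"])  # 1st Sensor Name [uS]
print(labels["RECORD"])                # record number
```

The DataFrame keeps its original column names for indexing, and a label is looked up only at display time, for instance by passing `labels[col]` to matplotlib's `ax.set_ylabel` when plotting column `col`.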

Hope this helps.
