Python: iterating over multiple DataFrames

Posted 2024-06-16 10:22:49

I'm trying to rename the columns in several DataFrames and convert those columns to integers. Here is my code:

def clean_col(df,col_name):
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]:'Date', df.columns[1]: col_name},inplace=True)
    df[col_name]=df[col_name].apply(lambda x: int(x))

I have a dictionary that maps each DataFrame to its new column name:

d = {
    all_df: "all",
    coal_df: "coal",
    liquids_df: "liquids",
    coke_df: "coke",
    natural_gas_df: "natural_gas",
    nuclear_df: "nuclear",
    hydro_electricity_df: "hydro",
    wind_df: "wind",
    utility_solar_df: "utility_solar",
    geothermal_df: "geo_thermal",
    wood_biomass_df: "biomass_wood",
    biomass_other_df: "biomass_other",
    other_df: "other",
    solar_all_df: "all_solar",
}
for i, (key, value) in enumerate(d.items()):
    clean_col(key, value)

This is the error I get:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

Any help would be appreciated.


3 answers

I simply created two separate lists, then iterated over the list of DataFrames and the list of new column names together:

def clean_col(df, col_name):
    # Move the index into a column, then name the first two columns
    df.reset_index(inplace=True)
    df.rename(columns={df.columns[0]: 'Date', df.columns[1]: col_name}, inplace=True)
    df[col_name] = df[col_name].astype(int)

# Two parallel lists: the i-th DataFrame gets the i-th column name
list_df = [all_df, coal_df, liquids_df, coke_df, natural_gas_df, nuclear_df,
           hydro_electricity_df, wind_df, utility_solar_df, geothermal_df,
           wood_biomass_df, biomass_other_df, other_df, solar_all_df]
list_col = ['total', 'coal', 'liquids', 'coke', 'natural_gas', 'nuclear', 'hydro',
            'wind', 'utility_solar', 'geo_thermal', 'biomass_wood',
            'biomass_other', 'other', 'all_solar']
for df, col in zip(list_df, list_col):
    clean_col(df, col)

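You were on the right track using a dictionary to map old column names to new ones. Loop over a list of DataFrames, and inside that loop iterate over the dictionary of new column names; that does the job: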

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})
all_dfs = [df1, df2]

# display() exists only in Jupyter/IPython; print() works in any script
print(df1)
print(df2)

d = {
    "A": "aaaaa",
    "D": "ddddd",
}
for df in all_dfs:
    for col in d:
        if col in df.columns:
            df.rename(columns={col: d[col]}, inplace=True)

print(df1)
print(df2)
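The inner loop and membership check can be dropped: DataFrame.rename accepts the whole mapping, and keys that are not columns of a given frame are ignored by default (errors="ignore"). A minimal sketch using the same df1, df2 and d as above:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 3], "D": [4, 5, 6], "F": [4, 5, 6]})
all_dfs = [df1, df2]

d = {"A": "aaaaa", "D": "ddddd"}

# rename() skips mapping keys that a frame does not have
# (errors="ignore" is the default), so no "if col in df.columns" is needed
for df in all_dfs:
    df.rename(columns=d, inplace=True)

print(df1.columns.tolist())
print(df2.columns.tolist())
```

inplace=True is what makes the change visible through df1 and df2; without it, rename() returns a new DataFrame that would have to be stored back.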

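Use globals() (or locals()): key the dictionary by the DataFrames' variable names as strings, which are hashable, and look each DataFrame up by name: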

import pandas as pd
import io

data1 = '''id,name
1,A
2,B
3,C
4,D
'''
data2 = '''id,name
1,W
2,X
3,Y
4,Z
'''

df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))


def clean_function(dfname, col_name):
    df = globals()[dfname]   # also see locals()
    df.rename(columns={df.columns[0]:'NewID', df.columns[1]: col_name},inplace=True)
    return df

mydict = { 'df1': 'NewName', 'df2': 'AnotherName'}

for k,v in mydict.items():
    df = clean_function(k,v)
    print(df)

Output:

   NewID NewName
0      1       A
1      2       B
2      3       C
3      4       D
   NewID AnotherName
0      1           W
1      2           X
2      3           Y
3      4           Z
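Looking DataFrames up through globals() works, but it ties the code to variable names in the module scope. A minimal sketch of a common alternative (an assumption of this edit, not part of the answer above): keep the DataFrames in an ordinary dict keyed by a string label from the start, then rename through that dict.

```python
import io

import pandas as pd

data1 = "id,name\n1,A\n2,B\n3,C\n4,D\n"
data2 = "id,name\n1,W\n2,X\n3,Y\n4,Z\n"

# String labels are hashable, so they can be dict keys; the frames are values
frames = {
    "df1": pd.read_csv(io.StringIO(data1)),
    "df2": pd.read_csv(io.StringIO(data2)),
}
new_names = {"df1": "NewName", "df2": "AnotherName"}

for key, col_name in new_names.items():
    df = frames[key]
    # rename() without inplace returns a new DataFrame; store it back
    frames[key] = df.rename(columns={df.columns[0]: "NewID", df.columns[1]: col_name})

for key, df in frames.items():
    print(key)
    print(df)
```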
