KeyError: "['Building Age', 'Floor', 'Number of Floors'] not in index"
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import category_encoders as ce
# Read the data
transactions_master_df = pd.read_csv('my_data.csv')
# Calculate the average house price for each district
avg_price_per_district = transactions_master_df.groupby('District')['Price'].mean().reset_index()
avg_price_per_district.rename(columns={'Price': 'AvgPrice'}, inplace=True)
# Print the average price for each district alongside the district column
print(avg_price_per_district)
# Merge the average price information with the original DataFrame
transactions_master_df = pd.merge(transactions_master_df, avg_price_per_district, on='District', how='left')
# Binary encode the 'District' feature
encoder = ce.BinaryEncoder(cols=['District'], base=6)
transactions_encoded = encoder.fit_transform(transactions_master_df)
# List the additional features to include alongside the encoded columns
additional_features = ['Building Age', 'Floor', 'Number of Floors', 'Elevator',
                       'number of bathrooms', 'Otopark', 'steeped alley',
                       'material used and luxuriness', 'view',
                       'prestige of that district and its vicinity']
# Check if additional features are present in the transactions_encoded DataFrame
for feature in additional_features:
    if feature not in transactions_encoded.columns:
        print(f"Warning: {feature} column not found in transactions_encoded DataFrame.")
# Concatenate additional features to the encoded DataFrame
final_features = pd.concat([transactions_encoded[['District_0', 'District_1', 'District_2', 'SquareMeter']],
                            transactions_encoded[additional_features]], axis=1)
# Ensure 'final_features' contains the necessary columns for training
print(final_features.head())
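Hello,
In this code I am building a model for my house price dataset. I first encoded some non-numeric features, and then, when I tried to combine the other features into the final feature variable, I got the following error: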
final_features = pd.concat([transactions_encoded[['District_0', 'District_1', 'District_2', 'SquareMeter']],
---> 38 transactions_encoded[additional_features]], axis=1)
KeyError: "['Building Age', 'Floor', 'Number of Floors'] not in index"
Strangely, these features do exist in my dataset, and I don't know why I am getting this error.
1 Answer
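It looks like the column names 'Building Age', 'Floor', and 'Number of Floors' in your data contain extra whitespace. Because of those extra spaces, the names you pass in do not exactly match the actual column labels, so the lookup fails and pandas raises the KeyError.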
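If stray whitespace is indeed the cause, a minimal sketch of how to confirm and fix it might look like the following. The small DataFrame is a made-up stand-in for `my_data.csv`, and `strip_column_names` is a hypothetical helper name, not something from the original code. Printing each name with `repr` makes leading or trailing spaces visible.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two headers carry stray spaces
df = pd.DataFrame({
    ' Building Age': [5, 12],
    'Floor ': [2, 7],
    'Number of Floors': [10, 10],
})

def strip_column_names(frame):
    """Return a copy with leading/trailing whitespace removed from column names."""
    cleaned = frame.copy()
    cleaned.columns = cleaned.columns.str.strip()
    return cleaned

# repr() shows the quotes, so hidden spaces become visible
print([repr(c) for c in df.columns])

clean_df = strip_column_names(df)

wanted = ['Building Age', 'Floor', 'Number of Floors']
# Columns that are still missing after stripping (should be empty here)
missing = [c for c in wanted if c not in clean_df.columns]
print("Missing after stripping:", missing)

# This selection would raise KeyError on df, but works on clean_df
subset = clean_df[wanted]
```

In the original script, the same idea would be to strip the column names right after `pd.read_csv`, before any grouping or encoding. If `missing` is still non-empty after stripping, the mismatch is probably something else (different capitalization, a non-breaking space, or a column that the encoder renamed), which the `repr` output should help reveal.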