From a project, I have a dictionary of dictionaries, one per person. A single entry looks like this:
METTS MARK = {'salary': 365788, 'to_messages': 807, 'deferral_payments': 'NaN', 'total_payments': 1061827, 'exercised_stock_options': 'NaN', 'bonus': 600000, 'restricted_stock': 585062, 'shared_receipt_with_poi': 702, 'restricted_stock_deferred': 'NaN', 'total_stock_value': 585062, 'expenses': 94299, 'loan_advances': 'NaN', 'from_messages': 29, 'other': 1740, 'from_this_person_to_poi': 1, 'poi': False, 'director_fees': 'NaN', 'deferred_income': 'NaN', 'long_term_incentive': 'NaN', 'email_address': 'mark.metts@enron.com', 'from_poi_to_this_person': 38}
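What I want to do is take each value, apply feature scaling to it, replace the "NaN" values with 0, and then put it back in the right place in the dictionary.
The code I tried looks like this: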
# Load the dictionary containing the dataset
import pickle

with open("final_project_dataset.pkl", "rb") as data_file:
    data_dict = pickle.load(data_file)

# A key named TOTAL in the dataset is a clear outlier, so I remove it
del data_dict["TOTAL"]
# Select my features by intuition
my_features = [
    'poi',
    'salary',#
    'bonus',#
    'exercised_stock_options',#
    'total_stock_value',#
    'total_payments',
    'expenses',
    'loan_advances',#
    'deferral_payments',
    'deferred_income',
    'restricted_stock',#
    'restricted_stock_deferred',
    'long_term_incentive',#
    'shared_receipt_with_poi',#
    #'from_this_person_to_poi',
    #'director_fees',
    #'from_messages',
    #'to_messages',
    #'from_poi_to_this_person'
]
keys = data_dict.keys()
values = data_dict.values()
# Replace NaN values with 0
list_of_values = []
for key in keys:
    tmp_list = []
    for feature in my_features:
        try:
            value = data_dict[key][feature]
        except KeyError:
            # Treat a missing feature like a "NaN" entry instead of crashing
            print("error: key", feature, "not present")
            value = "NaN"
        if value == "NaN":
            value = 0
        tmp_list.append(float(value))
    list_of_values.append(tmp_list)
# Scale the features with a min/max scaler
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data_array = np.array(list_of_values)
scaler = MinMaxScaler()
rescaled_data = scaler.fit_transform(data_array)
So now I have an array of rows, each of which looks like this:
[0. 0.32916568 0.075 0. 0.01279963 0.01025327 0.41221264 0. 0.01569801 1. 0.18366453 0.10365427 0. 0.12715088]
I want to put these rescaled values back into a dictionary together with their corresponding features. This is the code I wrote:
my_data_dict = []
for key in keys:
    key = {}
    for x in range( len(rescaled_data) ):
        for count in range( len(my_features) ):
            key[ my_features[count] ] = rescaled_data[x][count]
    my_data_dict.append(key)
But I get a long list of dictionaries that all have the same values, for example:
{'salary': 0.24744478779905296, 'deferral_payments': 0.01569801010492397, 'total_payments': 0.01228550157492107, 'loan_advances': 0.0, 'bonus': 0.075, 'restricted_stock_deferred': 0.1036542684938879, 'total_stock_value': 0.016735894091266437, 'expenses': 0.550692201098954, 'exercised_stock_options': 0.011200759837784508, 'poi': 1.0, 'deferred_income': 1.0, 'shared_receipt_with_poi': 0.1583046549538127, 'restricted_stock': 0.17265209213492153, 'long_term_incentive': 0.01380311165200059}
{'salary': 0.24744478779905296, 'deferral_payments': 0.01569801010492397, 'total_payments': 0.01228550157492107, 'loan_advances': 0.0, 'bonus': 0.075, 'restricted_stock_deferred': 0.1036542684938879, 'total_stock_value': 0.016735894091266437, 'expenses': 0.550692201098954, 'exercised_stock_options': 0.011200759837784508, 'poi': 1.0, 'deferred_income': 1.0, 'shared_receipt_with_poi': 0.1583046549538127, 'restricted_stock': 0.17265209213492153, 'long_term_incentive': 0.01380311165200059}
How can I extract the keys from the data, rescale the data, and put it into a new dictionary?
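The identical dictionaries appear to come from the loop structure: each new dictionary is filled by walking over every row of rescaled_data, so every one of them ends up holding the last row's values. A minimal sketch of one possible fix pairs each name with its own row. The names and numbers below are made-up stand-ins for the question's variables, and the sketch assumes keys is in the same order that was used to build list_of_values:

```python
# Hypothetical stand-ins for the question's variables (values are made up).
keys = ["PERSON A", "PERSON B", "PERSON C"]
my_features = ["poi", "salary", "bonus"]
rescaled_data = [
    [0.0, 0.0, 0.5],
    [1.0, 0.5, 1.0],
    [0.0, 1.0, 0.0],
]

# Pair each person with their own row, then each feature with its own column.
my_data_dict = {
    name: dict(zip(my_features, row))
    for name, row in zip(keys, rescaled_data)
}

print(my_data_dict["PERSON B"])
```

This builds a dictionary keyed by name rather than a list, which keeps the link between each person and their scaled features.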
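As Joe Patten said, pandas makes this much easier. You can convert your dictionary into a DataFrame, do your processing, and then convert it back into a dictionary if you want: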
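A rough sketch of what that conversion might look like, not necessarily the code this answer had in mind. The sample data is invented, and the scaling is done with plain pandas arithmetic, (x - min) / (max - min) per column, rather than with MinMaxScaler:

```python
import pandas as pd

# Hypothetical three-person sample in the same shape as data_dict (values made up).
data_dict = {
    "PERSON A": {"salary": 100, "bonus": "NaN", "poi": False},
    "PERSON B": {"salary": 300, "bonus": 50, "poi": True},
    "PERSON C": {"salary": 200, "bonus": 100, "poi": False},
}
my_features = ["poi", "salary", "bonus"]

# One row per person, one column per feature.
df = pd.DataFrame.from_dict(data_dict, orient="index")

# Keep the chosen features and parse them as numbers; the "NaN" strings
# become real NaN values, which are then filled with 0.
df = df[my_features].apply(pd.to_numeric, errors="coerce").fillna(0).astype(float)

# Min-max scale each column: (x - min) / (max - min).
col_range = (df.max() - df.min()).replace(0, 1)  # avoid dividing by zero for constant columns
scaled = (df - df.min()) / col_range

print(scaled)
```

Because the person names stay in the index and the feature names stay in the columns, there is no need to track positions by hand.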
When you are done, convert it back into a dictionary:
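One way to do the conversion back, as a small sketch with an invented, already-scaled DataFrame (the variable names are assumptions, not taken from the answer):

```python
import pandas as pd

# Hypothetical already-scaled DataFrame: one row per person, one column per feature.
scaled = pd.DataFrame(
    {"poi": [0.0, 1.0], "salary": [0.0, 1.0], "bonus": [0.5, 0.0]},
    index=["PERSON A", "PERSON B"],
)

# orient="index" gives {row label: {column: value}}, the same shape as data_dict.
my_data_dict = scaled.to_dict(orient="index")

print(my_data_dict["PERSON B"])
```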