How to iterate over a dictionary and a list to create a new dictionary

Posted 2024-04-18 13:30:38


From a project, I got a list of dictionaries, each of which looks like this:

METTS MARK = {'salary': 365788, 'to_messages': 807, 'deferral_payments': 'NaN', 'total_payments': 1061827, 'exercised_stock_options': 'NaN', 'bonus': 600000, 'restricted_stock': 585062, 'shared_receipt_with_poi': 702, 'restricted_stock_deferred': 'NaN', 'total_stock_value': 585062, 'expenses': 94299, 'loan_advances': 'NaN', 'from_messages': 29, 'other': 1740, 'from_this_person_to_poi': 1, 'poi': False, 'director_fees': 'NaN', 'deferred_income': 'NaN', 'long_term_incentive': 'NaN', 'email_address': 'mark.metts@enron.com', 'from_poi_to_this_person': 38}

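What I want to do is take each value, apply feature scaling to it, replace the 'NaN' values with 0, and then put it back in the right place in the dictionary.

The code I tried looks like this:

Load the dictionary containing the dataset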

import pickle

with open("final_project_dataset.pkl", "rb") as data_file:  # pickle data is binary, so open with "rb"
    data_dict = pickle.load(data_file)

A key named TOTAL in the dataset creates an obvious outlier, so I delete it

del data_dict["TOTAL"]

Select my features by intuition

my_features = [
    'poi',
    'salary',#
    'bonus',#
    'exercised_stock_options',#
    'total_stock_value',#
    'total_payments',
    'expenses',
    'loan_advances',#
    'deferral_payments',
    'deferred_income',
    'restricted_stock',#
    'restricted_stock_deferred',
    'long_term_incentive',#
    'shared_receipt_with_poi',#
    #'from_this_person_to_poi',
    #'director_fees',
    #'from_messages',
    #'to_messages',
    #'from_poi_to_this_person'
]


keys = data_dict.keys()
values = data_dict.values()

Replace the NaN values with 0

list_of_values = []
for key in keys:
    tmp_list = []
    for feature in my_features:
        try:
            value = data_dict[key][feature]
        except KeyError:
            # Report the missing feature and treat it like "NaN" so columns stay aligned
            print("error: key", feature, "not present")
            value = "NaN"
        if value == "NaN":
            value = 0
        tmp_list.append(float(value))
    list_of_values.append(tmp_list)

Scale the features with a min/max scaler

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data_array = np.array(list_of_values)
scaler = MinMaxScaler()
rescaled_data = scaler.fit_transform(data_array)

So now I have a list that looks like this:

[0. 0.32916568 0.075 0. 0.01279963 0.01025327 0.41221264 0. 0.01569801 1. 0.18366453 0.10365427 0. 0.12715088]

I want to put these rescaled values into a dictionary together with their corresponding features. This is the code I wrote:

my_data_dict = []
for key in keys:
    key = {}
    for x in range( len(rescaled_data) ):
        for count in range( len(my_features) ):
            key[ my_features[count] ] = rescaled_data[x][count]        
    my_data_dict.append(key)

But I get a long list of dictionaries that all have the same values, for example:

{'salary': 0.24744478779905296, 'deferral_payments': 0.01569801010492397, 'total_payments': 0.01228550157492107, 'loan_advances': 0.0, 'bonus': 0.075, 'restricted_stock_deferred': 0.1036542684938879, 'total_stock_value': 0.016735894091266437, 'expenses': 0.550692201098954, 'exercised_stock_options': 0.011200759837784508, 'poi': 1.0, 'deferred_income': 1.0, 'shared_receipt_with_poi': 0.1583046549538127, 'restricted_stock': 0.17265209213492153, 'long_term_incentive': 0.01380311165200059}

{'salary': 0.24744478779905296, 'deferral_payments': 0.01569801010492397, 'total_payments': 0.01228550157492107, 'loan_advances': 0.0, 'bonus': 0.075, 'restricted_stock_deferred': 0.1036542684938879, 'total_stock_value': 0.016735894091266437, 'expenses': 0.550692201098954, 'exercised_stock_options': 0.011200759837784508, 'poi': 1.0, 'deferred_income': 1.0, 'shared_receipt_with_poi': 0.1583046549538127, 'restricted_stock': 0.17265209213492153, 'long_term_incentive': 0.01380311165200059}

How do I take the keys from the data, rescale the data, and put it into a new dictionary?


1 answer
User
#1 · Posted 2024-04-18 13:30:38

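As Joe Patten said, pandas makes this easier: you can convert your dictionary to a DataFrame, do your processing, and then convert it back to a dictionary if you want: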

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

ser = pd.Series(METTS_MARK) #I am using your METTS_MARK

ser.replace('NaN',0,inplace=True)
ser.drop(index="email_address",inplace=True) #to make everything numerical so we can scale, you can add it back later

df = pd.DataFrame(ser)

scaler = MinMaxScaler()
df[0] = scaler.fit_transform(df)

When you are done:

newDict = df[0].to_dict()
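
If you would rather stay with plain lists, the loop in the question can also be repaired directly. It returns identical dictionaries because the inner `for x in range(len(rescaled_data))` loop writes every row into the same dictionary, so each one ends up holding only the last row. Pairing each name with its own row using `zip` avoids that. The following is a minimal sketch, not a drop-in replacement: the toy `data_dict` and its numbers are made up, and `min_max_columns` is a hypothetical hand-rolled stand-in for `MinMaxScaler` so the example runs without scikit-learn.

```python
# Toy stand-in for data_dict; the names and numbers are invented for illustration.
data_dict = {
    "PERSON A": {"salary": 100, "bonus": "NaN", "poi": False},
    "PERSON B": {"salary": 300, "bonus": 50, "poi": True},
    "PERSON C": {"salary": "NaN", "bonus": 200, "poi": False},
}
my_features = ["poi", "salary", "bonus"]

# Fix the row order once so names and rows stay aligned.
names = list(data_dict)

# One row of floats per person, with "NaN" replaced by 0.
rows = [
    [0.0 if data_dict[name][f] == "NaN" else float(data_dict[name][f])
     for f in my_features]
    for name in names
]


def min_max_columns(rows):
    """Scale each column to [0, 1] (hypothetical stand-in for MinMaxScaler)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # A constant column has span 0; use 1.0 instead to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


rescaled = min_max_columns(rows)

# Pair each name with its own rescaled row, and each feature with its value.
my_data_dict = {
    name: dict(zip(my_features, row))
    for name, row in zip(names, rescaled)
}
```

With the real data, the same final comprehension should work on the output of `scaler.fit_transform`, as long as `names` is the same sequence of keys that was used to build `list_of_values`.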
