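I have a pandas DataFrame with multiple columns, such as the following (simplified for readability). Each row consists of an id (uuid), an index, and one or more features: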
uuid    index  Atrium  Ventricle
di-abc  0      20.73   26.21
di-abc  1      18.92   25.14
di-efg  7      19.02   0.30
di-efg  9      1.23    0.51
di-efg  6      21.24   26.02
di-hjk  3      22.10   25.16
di-hjk  6      19.16   25.57
I want a result like this:
outliers = {
    'Atrium' : [
        {'uuid' : 'di-efg', 'index' : 9, 'value' : 1.23},
    ],
    'Ventricle' : [
        {'uuid' : 'di-efg', 'index' : 7, 'value' : 0.30},
        {'uuid' : 'di-efg', 'index' : 9, 'value' : 0.51},
    ]
}
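Notes (bonus points for addressing these):
I am struggling to do either step without a double for loop. Is there an efficient way to compute the outliers in this DataFrame?
Here is a working approach that captures what I am trying to achieve: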
# initialize variables:
outliers = {}
features = ['Atrium', 'Ventricle']
# iterate over each feature:
for feature in features:
    # start with an empty list of outliers for this feature:
    outliers[feature] = []
    # keep rows more than one standard deviation above the mean
    # (adjust this threshold if needed):
    outlier_df = df[df[feature] > (df[feature].mean() + df[feature].std())]
    # keep only the columns needed for the output:
    outlier_df = outlier_df[['uuid', 'index', feature]]
    # iterate through the data frame and collect the uuid, index, and value:
    for _, row in outlier_df.iterrows():
        # append each outlier to the outlier dictionary:
        outliers[feature].append({
            'uuid' : row['uuid'],
            'index' : row['index'],
            'value' : row[feature],
        })
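Here is one way to solve the problem: define a function that takes a column name as its input argument and returns all outliers in that column in the desired format: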
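The answer's own code did not survive on this page, so what follows is only a minimal sketch of such a function. The name `column_outliers` and the parameter `k` are hypothetical, and the two-sided rule (more than `k` standard deviations from the column mean, in either direction) is an assumption chosen because the desired output in the question lists unusually low values; the question's own loop uses a one-sided upper threshold instead.

```python
import pandas as pd

def column_outliers(df, feature, k=1.0):
    """Hypothetical helper: outliers of one column as a list of dicts."""
    col = df[feature]
    # Assumption: a value is an outlier when it lies more than k sample
    # standard deviations away from the column mean, on either side.
    mask = (col - col.mean()).abs() > k * col.std()
    return (
        df.loc[mask, ["uuid", "index", feature]]
        .rename(columns={feature: "value"})
        .to_dict("records")
    )

# Sample data copied from the question.
df = pd.DataFrame({
    "uuid": ["di-abc", "di-abc", "di-efg", "di-efg", "di-efg", "di-hjk", "di-hjk"],
    "index": [0, 1, 7, 9, 6, 3, 6],
    "Atrium": [20.73, 18.92, 19.02, 1.23, 21.24, 22.10, 19.16],
    "Ventricle": [26.21, 25.14, 0.30, 0.51, 26.02, 25.16, 25.57],
})

# One call per feature column replaces the inner row loop.
outliers = {feature: column_outliers(df, feature) for feature in ["Atrium", "Ventricle"]}
```

The boolean mask replaces the row-by-row `iterrows()` loop, and `to_dict("records")` builds the list of dictionaries in one call; only the loop over feature columns remains.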
An alternative approach relies more on pandas operations such as stacking, grouping, and aggregation:
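The original code for this variant is also missing from the page. The sketch below shows one way the stack / group / aggregate idea could look, not the answerer's exact solution. It assumes the same two-sided rule (more than one sample standard deviation from the per-feature mean); the names `long`, `grouped`, and `mask` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "uuid": ["di-abc", "di-abc", "di-efg", "di-efg", "di-efg", "di-hjk", "di-hjk"],
    "index": [0, 1, 7, 9, 6, 3, 6],
    "Atrium": [20.73, 18.92, 19.02, 1.23, 21.24, 22.10, 19.16],
    "Ventricle": [26.21, 25.14, 0.30, 0.51, 26.02, 25.16, 25.57],
})
features = ["Atrium", "Ventricle"]

# Stacking: reshape to long format, one row per (uuid, index, feature).
long = (
    df.set_index(["uuid", "index"])[features]
    .rename_axis(columns="feature")
    .stack()
    .rename("value")
    .reset_index()
)

# Grouping and aggregation: per-feature mean and std, broadcast back to rows.
grouped = long.groupby("feature")["value"]
# Assumption: outlier = more than one standard deviation from the feature mean.
mask = (long["value"] - grouped.transform("mean")).abs() > grouped.transform("std")

# Collect the flagged rows per feature in the requested dictionary format.
outliers = {
    feature: group[["uuid", "index", "value"]].to_dict("records")
    for feature, group in long[mask].groupby("feature")
}
```

A feature with no flagged rows will simply be absent from `outliers` in this version; if an empty list is needed for every feature, the dictionary can be pre-filled with `{f: [] for f in features}` and then updated.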