Count, for each row, how many columns fall within the top X%

   ID v1 v2 v3
1:  a  1  2  0
2:  b  2  3  0
3:  c  1  6  1
4:  d  3  1  2
5:  e  4  0  3
6:  f  5  2  5

# set up a reproducible example
library(data.table)
df = data.table(ID = c('a', 'b', 'c', 'd', 'e', 'f'),
                v1 = c(1, 2, 1, 3, 4, 5),
                v2 = c(2, 3, 6, 1, 0, 2),
                v3 = c(0, 0, 1, 2, 3, 5))

# function to find the outliers
outlier_detector = function(x, type = 'positive', tail = 0.05) {
  if (type == 'positive') {
    x >= quantile(x, 1 - tail)
  } else if (type == 'negative') {
    x <= quantile(x, tail)
  }
}

# add two columns to the original dataset
# sum_out_positive - for each row, the number of columns whose value is within the top 5%
# sum_out_negative - for each row, the number of columns whose value is within the bottom 5%
df[, `:=`(
  sum_out_positive = df[, 2:4][, lapply(.SD, outlier_detector)][, rowSums(.SD, na.rm = TRUE), .SDcols = paste0('v', 1:3)],
  sum_out_negative = df[, 2:4][, lapply(.SD, outlier_detector, 'negative')][, rowSums(.SD, na.rm = TRUE), .SDcols = paste0('v', 1:3)]
)]

   ID v1 v2 v3 sum_out_positive sum_out_negative
1:  a  1  2  0                0                2
2:  b  2  3  0                0                1
3:  c  1  6  1                1                1
4:  d  3  1  2                0                0
5:  e  4  0  3                0                1
6:  f  5  2  5                2                0
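The counts in this table follow from one 5% and one 95% threshold per column. R's default `quantile` (type 7), like pandas' default `interpolation='linear'`, sorts the column and interpolates linearly at the 0-based position (n - 1)·p. The sketch below re-implements that rule in plain Python so the thresholds can be checked by hand; `quantile_type7` is a hypothetical helper name, not part of any library.

```python
import math


def quantile_type7(values, p):
    """Hypothetical helper: linear-interpolation quantile (R type 7 / pandas 'linear')."""
    xs = sorted(values)
    h = (len(xs) - 1) * p  # fractional 0-based position in the sorted data
    lo = math.floor(h)
    if lo + 1 >= len(xs):
        return float(xs[-1])
    return xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])


data = {
    "v1": [1, 2, 1, 3, 4, 5],
    "v2": [2, 3, 6, 1, 0, 2],
    "v3": [0, 0, 1, 2, 3, 5],
}
tail = 0.05

# One threshold per column for each tail
upper = {c: quantile_type7(v, 1 - tail) for c, v in data.items()}
lower = {c: quantile_type7(v, tail) for c, v in data.items()}

# Per row, count the columns at or beyond each threshold
n_rows = len(data["v1"])
sum_out_positive = [sum(data[c][i] >= upper[c] for c in data) for i in range(n_rows)]
sum_out_negative = [sum(data[c][i] <= lower[c] for c in data) for i in range(n_rows)]
print(upper)
print(sum_out_positive)  # [0, 0, 1, 0, 0, 2]
print(sum_out_negative)  # [2, 1, 1, 0, 1, 0]
```

With six rows the 95% position is 4.75, giving upper thresholds of 4.75 (v1), 5.25 (v2) and 4.5 (v3); only `c` (in v2) and `f` (in v1 and v3) reach them.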

1 answer

User

#1 · Posted 2024-06-16 13:05:11

Here is a solution. It may not be the most elegant or the ideal approach, but it works. Hope it helps:

import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'v1': [1, 2, 1, 3, 4, 5],
                   'v2': [2, 3, 6, 1, 0, 2],
                   'v3': [0, 0, 1, 2, 3, 5]})

# For each value column (every column after ID), flag values in the top 5% and bottom 5%
for col in df.columns[1:]:
    df[f'{col}_outliers_pos'] = np.where(df[col] >= df[col].quantile(0.95), 1, 0)
    df[f'{col}_outliers_neg'] = np.where(df[col] <= df[col].quantile(0.05), 1, 0)

# Collect the positive and negative flag columns
pos_cols = [col for col in df.columns if col.endswith('_outliers_pos')]
neg_cols = [col for col in df.columns if col.endswith('_outliers_neg')]

# Per row, count how many columns were flagged on each side
df['sum_out_positive'] = df[pos_cols].sum(axis=1)
df['sum_out_negative'] = df[neg_cols].sum(axis=1)

# Drop the helper flag columns so only the requested output remains
df.drop(columns=pos_cols + neg_cols, inplace=True)

print(df)
  ID  v1  v2  v3  sum_out_positive  sum_out_negative
0  a   1   2   0                 0                 2
1  b   2   3   0                 0                 1
2  c   1   6   1                 1                 1
3  d   3   1   2                 0                 0
4  e   4   0   3                 0                 1
5  f   5   2   5                 2                 0
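A more compact variant skips the helper columns entirely. `DataFrame.quantile` returns one threshold per column, and `DataFrame.ge` / `DataFrame.le` compare every column against its own threshold (the threshold Series is aligned on column names), so summing the boolean result across each row gives the counts directly. The sketch below wraps this in a hypothetical `count_tail_columns` function with a `tail` parameter, so the X% from the title can be changed; like the answer above, it relies on pandas' default linear interpolation.

```python
import pandas as pd


def count_tail_columns(df: pd.DataFrame, cols: list[str], tail: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: per row, count how many of `cols` lie in the top/bottom `tail` share."""
    vals = df[cols]
    out = df.copy()
    # quantile() gives one threshold per column; ge/le align that Series on column names
    out["sum_out_positive"] = vals.ge(vals.quantile(1 - tail)).sum(axis=1)
    out["sum_out_negative"] = vals.le(vals.quantile(tail)).sum(axis=1)
    return out


df = pd.DataFrame({
    "ID": ["a", "b", "c", "d", "e", "f"],
    "v1": [1, 2, 1, 3, 4, 5],
    "v2": [2, 3, 6, 1, 0, 2],
    "v3": [0, 0, 1, 2, 3, 5],
})

result = count_tail_columns(df, ["v1", "v2", "v3"], tail=0.05)
print(result)
```

Because the function works on a copy, the input frame is left unchanged; assign the result back to `df` if in-place behaviour is wanted.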
