Python or Pandas data summary (convert a table into a dict of row name: [[column name, value], ...])

Posted 2024-06-16 14:08:32


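Please see the data below, in which each index name is repeated 3 to 5 times because the monthly reports for the 3 months were not merged properly before being submitted to me.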

Name     Score  Rating
Peralta      0      40
Peralta     20       0
Peralta      0       0
Amy          0      40
Amy         20      40
Amy        -20      40
Terry        0       0
Terry      -20      40
Terry        0     -40
Gina        20       0
Gina         0       0
Gina       -20      40

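So far I have only managed to extract the column names, as shown in the code at the end. I cannot get the "column name" and the "cell value" to appear together.

I need the data summarized in the following format (the result can be tab-separated or comma-separated, whichever you prefer):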

### Final Result
# IndexName [col_name, cell_value]  [sum of positive numbers, result] [sum of negative numbers, result]


Peralta [Rating, 40]    [Score, 20]     [Sum_Total_of_positive_numbers, 60]
Amy     [Rating, 40]    [Score, 20]     [Rating, 40]    [Score, -20]    [Rating, 40]    [Sum_Total_of_positive_numbers, 140]    [Sum_Total_of_negative_numbers, -20]
Terry   [Score, -20]    [Rating, 40]    [Rating, -40]   [Sum_Total_of_positive_numbers, 40] [Sum_Total_of_negative_numbers, -60]
Gina    [Score, 20]     [Score, -20]    [Rating, 40]    [Sum_Total_of_positive_numbers, 60] [Sum_Total_of_negative_numbers, -20]

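None of the standard functions for summarizing data has helped, and I could not find another library that produces a summary in the format above.

The code below does pull out the column names, but the cell value for that column and row does not appear: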

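# dff_dict maps each index name to a DataFrame (its construction is not shown)
with open("output.txt", "a") as out:            # open the file once, close it on exit
    for k, v in dff_dict.items():               # k: name of index, v: a DataFrame
        check = v.columns[(v == 20).any()]      # columns that contain the value 20
        if len(check) > 0:
            print((k, check.to_list()), file=out)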

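Is there any way to reach the "Final Result" shown in the second table? (The final result does not have to be in any particular format, such as a DataFrame or a table.) Thanks.

Notes:

The original data may contain more than 20 columns and hundreds of rows. I have simplified the data here, so please do not limit the solution to only a few rows or columns.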

Also, the last 2 bracketed values show the sum totals of all the positive and all the negative numbers for that name, after the values have been brought into place. Positive and negative numbers are summed separately.

There is a helpful article here (the closest possible answer), but I need [column_name, value] pairs rather than just [row: value].


1 answer
User
#1 · Posted 2024-06-16 14:08:32

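I think this becomes easier if you split the whole dataset into two parts, given the requirement to sum positive and negative numbers separately.

Starting from your sample data: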

import pandas as pd
import numpy as np
data = [
{"Name": "Peralta", "Score": 0, "Rating": 40},
{"Name": "Peralta", "Score": 20, "Rating": 0},
{"Name": "Peralta", "Score": 0, "Rating": 0},
{"Name": "Amy", "Score": 0, "Rating": 40},
{"Name": "Amy", "Score": 20, "Rating": 40},
{"Name": "Amy", "Score": -20, "Rating": 40},
{"Name": "Terry", "Score": 0, "Rating": 0},
{"Name": "Terry", "Score": -20, "Rating": 40},
{"Name": "Terry", "Score": 0, "Rating": -40},
{"Name": "Gina", "Score": 20, "Rating": 0},
{"Name": "Gina", "Score": 0, "Rating": 0},
{"Name": "Gina", "Score": -20, "Rating": 40},
]
df = pd.DataFrame(data).set_index("Name")

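We can get projections of the positive and the negative values (every cell that does not match the condition is replaced with 0):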

df_pos = df.where(df>=0, other=0)
df_neg = df.where(df<0, other=0)
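
To make the effect of the two `where` calls concrete, here is a minimal sketch on a tiny frame. The frame `demo` and its values are hypothetical and used only for illustration; they are not part of the original data.

```python
import pandas as pd

# Tiny hypothetical frame, used only to show what DataFrame.where keeps
demo = pd.DataFrame(
    {"Score": [20, -20], "Rating": [40, -40]},
    index=pd.Index(["Amy", "Terry"], name="Name"),
)

# where() keeps cells for which the condition is True and
# replaces every other cell with `other`
demo_pos = demo.where(demo >= 0, other=0)  # negative cells become 0
demo_neg = demo.where(demo < 0, other=0)   # non-negative cells become 0

print(demo_pos)
print(demo_neg)
```

Because each cell survives in exactly one of the two frames, adding them back together reproduces the original values.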

Then group by name and sum to get the totals you are after:

# Per-name column sums of the positive values, plus a row-wise grand total
df_pos = df_pos.groupby(by="Name").sum()
df_pos["total_positive"] = df_pos.sum(axis=1)

# Same for the negative values
df_neg = df_neg.groupby(by="Name").sum()
df_neg["total_negative"] = df_neg.sum(axis=1)

Note: at this stage the data is still in two DataFrames and has not been flattened into the [field, value] format you showed.
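
One possible way to reach the flattened [field, value] layout is a plain loop over the original rows. The following is a minimal sketch and not part of the answer above: `summarize` is a hypothetical helper name, zero cells are assumed to be the ones to skip (as the desired output suggests), and the values are assumed to be integers.

```python
import pandas as pd

# Sample data from the question, indexed by Name
df = pd.DataFrame(
    {
        "Name": ["Peralta"] * 3 + ["Amy"] * 3 + ["Terry"] * 3 + ["Gina"] * 3,
        "Score": [0, 20, 0, 0, 20, -20, 0, -20, 0, 20, 0, -20],
        "Rating": [40, 0, 0, 40, 40, 40, 0, 40, -40, 0, 0, 40],
    }
).set_index("Name")


def summarize(frame):
    """Return {index name: [[column, value], ..., [total label, total]]}."""
    pairs = {}   # name -> [[column, value], ...] for the non-zero cells
    totals = {}  # name -> [sum of positives, sum of negatives]
    for name, row in frame.iterrows():        # rows in their original order
        cells = pairs.setdefault(name, [])
        sums = totals.setdefault(name, [0, 0])
        for col, val in row.items():          # columns left to right
            val = int(val)                    # assumption: integer data
            if val == 0:
                continue                      # assumption: zero cells are skipped
            cells.append([col, val])
            sums[0 if val > 0 else 1] += val
    # Append the totals last, leaving out a total that is zero
    for name, (pos, neg) in totals.items():
        if pos:
            pairs[name].append(["Sum_Total_of_positive_numbers", pos])
        if neg:
            pairs[name].append(["Sum_Total_of_negative_numbers", neg])
    return pairs


summary = summarize(df)
for name, cells in summary.items():
    print(name, *cells, sep="\t")  # one tab-separated line per name
```

The loop works for any number of columns because it never refers to a column by name. For very large frames, a vectorized approach may be faster, but for a few hundred rows the row-by-row version is likely sufficient and keeps the cell order easy to reason about.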
