Python: building a summary dataframe from a dataframe

Posted 2024-04-25 21:18:14

I have a dataframe like this:

Employee ID  A_ Status  C_Code  TestCol  Result_A  Result_B
20000        Yes        USA     asdasdq  True      False
20001        No         BRA     asdasdw  True      True
200002                  USA     asdasda  True      True
200003       asda       MEX     asdasar  False     False

In this dataframe, Result_A and Result_B are boolean columns.
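Because the columns to summarize are exactly the boolean ones, they can be selected by dtype instead of by position or name. A minimal sketch that rebuilds the example dataframe; it assumes the Result columns really have dtype `bool` (a True/False column read from a file with missing values may come back as `object` and would then be skipped):

```python
import numpy as np
import pandas as pd

# The example dataframe from the question.
df = pd.DataFrame({
    "Employee ID": [20000, 20001, 200002, 200003],
    "A_ Status": ["Yes", "No", np.nan, "asda"],
    "C_Code": ["USA", "BRA", "USA", "MEX"],
    "TestCol": ["asdasdq", "asdasdw", "asdasda", "asdasar"],
    "Result_A": [True, True, True, False],
    "Result_B": [False, True, True, False],
})

# Pick the boolean columns by dtype rather than by position or name.
bool_cols = list(df.select_dtypes(include="bool").columns)
print(bool_cols)
```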

I want to build the summary dataframe with a function so that I can reuse it.

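I need the following columns in the summary dataframe. The output for Result_A should look like this; Result_B, the other boolean column, would be the next row of the summary dataframe.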

Name of the Column  No. of Records  No. of Employees  True_Records  False_Records  A_Status_Yes  A_Status_No  Mex_True  Mex_False  USA_True  USA_False
Result_A            4               4                 3             1              1             1            0         1          2         2

Also note that the employee ID column is sometimes named `Employee ID` and sometimes `Employee_ID`.
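One way to cope with the varying name is to normalize column names before matching them. A sketch, assuming the variants differ only in spaces, underscores and letter case; `find_id_column` is a hypothetical helper name:

```python
import pandas as pd


def find_id_column(df: pd.DataFrame) -> str:
    """Return the employee-ID column name, whatever its spacing or case.

    Hypothetical helper: assumes variants differ only in spaces,
    underscores and letter case (e.g. "Employee ID", "Employee_ID").
    """
    for col in df.columns:
        # Normalize: lower-case and drop spaces and underscores.
        key = str(col).lower().replace(" ", "").replace("_", "")
        if key == "employeeid":
            return col
    raise KeyError("no employee ID column found")


df1 = pd.DataFrame({"Employee ID": [1, 2]})
df2 = pd.DataFrame({"Employee_ID": [3, 4]})
print(find_id_column(df1), find_id_column(df2))
```

`df.rename(columns={find_id_column(df): "Employee ID"})` then gives every dataframe the same ID column name, so the rest of the code only needs one spelling.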

In the real data I have 25 dataframes, so I'm looking for a function I can reuse and whose output I can append.
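For 25 dataframes, one option is to summarize each one and stack the pieces with a single `pd.concat` call, adding a column that records which dataframe each row came from. A sketch under assumptions: `bool_counts` is a hypothetical, cut-down stand-in for a full summary function (True/False counts only), and `frames` holds made-up example data in place of the real dataframes:

```python
import pandas as pd


def bool_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in summary: True/False counts per boolean column."""
    rows = []
    for col in df.select_dtypes(include="bool").columns:
        rows.append({
            "Name of the Column": col,
            "True_Records": int(df[col].sum()),
            "False_Records": int((~df[col]).sum()),
        })
    return pd.DataFrame(rows)


# Made-up stand-ins for the real dataframes, keyed by a label.
frames = {
    "df_01": pd.DataFrame({"Result_A": [True, False], "Result_B": [True, True]}),
    "df_02": pd.DataFrame({"Result_C": [False, False, True]}),
}

# Summarize each dataframe, tag the rows with their source, concat once.
all_summaries = pd.concat(
    [bool_counts(frame).assign(Source=name) for name, frame in frames.items()],
    ignore_index=True,
)
print(all_summaries)
```

Collecting the pieces in a list and concatenating once avoids growing a dataframe row by row; `DataFrame.append` was removed in pandas 2.0, and `pd.concat` is the usual replacement.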

Please help.

1 answer
User
#1 · Posted 2024-04-25 21:18:14

I think I got what you want:

1 - Recreate the df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Employee ID": [20000, 20001, 200002, 200003],
                   "A_ Status": ["Yes", "No", np.nan, "asda"],
                   "C_Code": ["USA", "BRA", "USA", "MEX"],
                   "TestCol": ["asdasdq", "asdasdw", "asdasda", "asdasar"],
                   "Result_A": [True, True, True, False],
                   "Result_B": [False, True, True, False]},
                  columns=["Employee ID", "A_ Status", "C_Code", "TestCol", "Result_A", "Result_B"])

2 - Create a second, empty dataframe df2 with the summary columns:

df2 = pd.DataFrame(columns=["Name of the Column","No. of Records","No. of Employees","True_Records","False_Records","A_Status_Yes","A_Status_No","Mex_True","Mex_False","USA_True","USA_False"])

3 - Compute the results:

for column in df.columns[4:]:  # Fifth column onwards, i.e. the `Result_xx` columns
    print(column)
    a = [column,
         len(df["Employee ID"]),  # No. of Records (not sure about this one)
         len(df["Employee ID"]),  # No. of Employees
         len(df[df[column] == True]),
         len(df[df[column] == False]),
         len(df[df["A_ Status"] == "Yes"]),
         len(df[df["A_ Status"] == "No"]),
         len(df[(df["C_Code"] == "MEX") & (df[column] == True)]),
         len(df[(df["C_Code"] == "MEX") & (df[column] == False)]),
         len(df[(df["C_Code"] == "USA") & (df[column] == True)]),
         len(df[(df["C_Code"] == "USA") & (df[column] == False)])
        ]  # Build the summary row as a list

    df2.loc[len(df2), :] = a  # Append the row at the next integer index
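Since the question asks for something reusable across many dataframes, the same loop can be wrapped in a function. A sketch, assuming the `A_ Status` and `C_Code` column names from the example; `summarize_bool_columns` and its `id_col` parameter are hypothetical names. Two deliberate changes from the loop above: boolean columns are chosen by dtype instead of `df.columns[4:]`, and No. of Employees uses `nunique()` (distinct IDs) instead of `len()`, which only makes a difference when an ID appears on more than one row:

```python
import numpy as np
import pandas as pd


def summarize_bool_columns(df: pd.DataFrame, id_col: str = "Employee ID") -> pd.DataFrame:
    """Hypothetical helper: one summary row per boolean column of df."""
    rows = []
    # Only columns with dtype bool are summarized (Result_A, Result_B, ...).
    for col in df.select_dtypes(include="bool").columns:
        flag = df[col]
        usa = df["C_Code"] == "USA"
        mex = df["C_Code"] == "MEX"
        rows.append({
            "Name of the Column": col,
            "No. of Records": len(df),                  # every row
            "No. of Employees": df[id_col].nunique(),   # distinct IDs
            "True_Records": int(flag.sum()),            # summing bools counts True
            "False_Records": int((~flag).sum()),
            # Status counts do not depend on the boolean column.
            "A_Status_Yes": int((df["A_ Status"] == "Yes").sum()),
            "A_Status_No": int((df["A_ Status"] == "No").sum()),
            # Country x boolean combinations.
            "Mex_True": int((mex & flag).sum()),
            "Mex_False": int((mex & ~flag).sum()),
            "USA_True": int((usa & flag).sum()),
            "USA_False": int((usa & ~flag).sum()),
        })
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "Employee ID": [20000, 20001, 200002, 200003],
    "A_ Status": ["Yes", "No", np.nan, "asda"],
    "C_Code": ["USA", "BRA", "USA", "MEX"],
    "TestCol": ["asdasdq", "asdasdw", "asdasda", "asdasar"],
    "Result_A": [True, True, True, False],
    "Result_B": [False, True, True, False],
})
summary = summarize_bool_columns(df)
print(summary)
```

For the example dataframe the values match the result in step 4.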

4 - Result:

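+----+--------------------+----------------+------------------+--------------+---------------+--------------+-------------+----------+-----------+----------+-----------+
|    | Name of the Column | No. of Records | No. of Employees | True_Records | False_Records | A_Status_Yes | A_Status_No | Mex_True | Mex_False | USA_True | USA_False |
+----+--------------------+----------------+------------------+--------------+---------------+--------------+-------------+----------+-----------+----------+-----------+
|  0 | Result_A           |              4 |                4 |            3 |             1 |            1 |           1 |        0 |         1 |        2 |         0 |
|  1 | Result_B           |              4 |                4 |            2 |             2 |            1 |           1 |        0 |         1 |        1 |         1 |
+----+--------------------+----------------+------------------+--------------+---------------+--------------+-------------+----------+-----------+----------+-----------+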
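A quick way to cross-check the per-country columns (Mex_True, USA_False, ...) is `pd.crosstab`, which counts every country/value combination at once. A sketch using Result_A from the example data; relabelling True/False as strings is only there to make the result easy to index:

```python
import pandas as pd

df = pd.DataFrame({
    "C_Code": ["USA", "BRA", "USA", "MEX"],
    "Result_A": [True, True, True, False],
    "Result_B": [False, True, True, False],
})

# Rows: country code; columns: "False"/"True" values of Result_A.
labels = df["Result_A"].map({True: "True", False: "False"})
tab_a = pd.crosstab(df["C_Code"], labels)
print(tab_a)
```

The USA row gives 2 True and 0 False for Result_A, and the MEX row gives 0 True and 1 False, the same counts as the Result_A row of the table.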
