How to combine the results of groupby.sum()

Posted 2024-06-16 12:11:14


I collected some firewall logs and analyzed them.

I want to combine the results of two groupby.sum() calls.

Here is my code:

    import pandas as pd

    def analysis(data_location, col_name):
        # Read the whitespace-separated log, one record per line
        with open(data_location, "r") as DATA_OPEN:
            DATA = DATA_OPEN.readlines()

        df = []
        for data in DATA:
            data = data.split()
            df.append({"Firewall": data[0], "Gatway": data[1], "DATE": data[2],
                       "Rule_name": data[3], col_name: data[4], "Count": int(data[5])})

        df = pd.DataFrame(df)
        df = df[["Firewall", "Gatway", "DATE", "Rule_name", col_name, "Count"]]
        # Sum Count for every (Firewall, Gatway, DATE, Rule_name, col_name) group
        result = df.groupby(["Firewall", "Gatway", "DATE", "Rule_name", col_name]).sum()
        print(result)
        # Return the grouped result so the caller can keep working with it
        return result

And these are the results:

    DST = analysis("united_temp_fw_dst_log.txt", "dst")

    """the result
                                                      Count
    Firewall   Gatway DATE    Rule_name  dst                   
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.255         34
                                         10.255.63.18        16
                                         103.226.213.30       4
                                         129.146.178.96     282
                                         183.177.72.201       4
                                         183.177.72.202       4
                                         220.133.209.243      4
                                         8.8.8.8            597"""


    SRC = analysis("united_temp_fw_src_log.txt", "src")
    """the result
                                                          Count
    Firewall   Gatway DATE    Rule_name  src               
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8
                                         10.1.81.11      12
                                         10.1.81.115     11
                                         10.1.81.118      3
                                         10.1.81.245    911"""

I want to use ["Firewall", "Gatway", "DATE", "Rule_name"] as the index, with the columns laid out like this:

    Firewall   Gatway DATE    Rule_name  src          count     dst             count
    10_1_81_34 vsys1  2019104 allow_Drop 10.1.81.10       8    10.1.81.255         34
                                         10.1.81.11      12    10.255.63.18        16
                                         10.1.81.115     11    103.226.213.30       4
                                         10.1.81.118      3    129.146.178.96     282
                                         10.1.81.245    911    183.177.72.201       4
                                                               183.177.72.202       4
                                                               220.133.209.243      4 
                                                               8.8.8.8            597

What should I do? I tried reset_index() and groupby(), but neither gave me the result I want.


2 Answers

Can you rename the columns so that you do not end up with duplicate column names (count, in your case)? If so, I would use the concat function:

import pandas as pd

# Generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18', '103.226.213.30'],
                   'count_dst': [34, 16, 4]})
df.set_index(['Firewall', 'Gatway'], inplace=True)
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})
df2.set_index(['Firewall', 'Gatway'], inplace=True)

# Concatenate the dataframes along columns (both indexes are identical here)
df3 = pd.concat([df, df2], axis=1)

With pd.concat I get the following output:

                              dst  count_dst          src  count_src
Firewall   Gatway                                                   
10_1_81_34 vsys1      10.1.81.255         34   10.1.81.10          8
           vsys1     10.255.63.18         16   10.1.81.11         12
           vsys1   103.226.213.30          4  10.1.81.115         11

Edit: using dataframes of different lengths:

import pandas as pd

# Generate a simpler version of your dataframes
df = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34'],
                   'Gatway': ['vsys1', 'vsys1'],
                   'dst': ['10.1.81.255', '10.255.63.18'],
                   'count_dst': [34, 16]})
df2 = pd.DataFrame({'Firewall': ['10_1_81_34', '10_1_81_34', '10_1_81_34'],
                    'Gatway': ['vsys1', 'vsys1', 'vsys1'],
                    'src': ['10.1.81.10', '10.1.81.11', '10.1.81.115'],
                    'count_src': [8, 12, 11]})

# Concatenate the dataframes along columns (rows are aligned by position)
df3 = pd.concat([df, df2], axis=1)
# Firewall and Gatway now appear twice; keep the last copy of each,
# which comes from the longer frame (df2) and therefore has no NaN
df3 = df3.loc[:, ~df3.columns.duplicated(keep='last')]

# Set the index
df3.set_index(['Firewall', 'Gatway'], inplace=True)

This is the output:

                            dst  count_dst          src  count_src
Firewall   Gatway                                                 
10_1_81_34 vsys1    10.1.81.255       34.0   10.1.81.10          8
           vsys1   10.255.63.18       16.0   10.1.81.11         12
           vsys1            NaN        NaN  10.1.81.115         11
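
The positional concat above relies on both frames covering a single Firewall/Gatway group. When the logs contain several groups, one possible variation is to number the rows inside each group and merge on that number. The following is only a sketch under assumptions: the helper name `side_by_side` and the `row` column are hypothetical, and the two small frames stand in for the `reset_index()` output of the two `groupby(...).sum()` results.

```python
import pandas as pd

keys = ["Firewall", "Gatway", "DATE", "Rule_name"]

# Made-up stand-ins for the two grouped results after reset_index()
src = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 2,
    "Gatway": ["vsys1"] * 2,
    "DATE": ["2019104"] * 2,
    "Rule_name": ["allow_Drop"] * 2,
    "src": ["10.1.81.10", "10.1.81.11"],
    "Count": [8, 12],
})
dst = pd.DataFrame({
    "Firewall": ["10_1_81_34"] * 3,
    "Gatway": ["vsys1"] * 3,
    "DATE": ["2019104"] * 3,
    "Rule_name": ["allow_Drop"] * 3,
    "dst": ["10.1.81.255", "10.255.63.18", "8.8.8.8"],
    "Count": [34, 16, 597],
})


def side_by_side(left, right, keys, suffixes):
    """Place two frames next to each other, aligned row by row within each key group."""
    # Number the rows inside each key group: 0, 1, 2, ...
    left = left.assign(row=left.groupby(keys).cumcount())
    right = right.assign(row=right.groupby(keys).cumcount())
    # An outer merge keeps the extra rows of the longer side and fills the rest with NaN
    merged = left.merge(right, on=keys + ["row"], how="outer", suffixes=suffixes)
    return merged.sort_values(keys + ["row"]).set_index(keys + ["row"])


combined = side_by_side(src, dst, keys, ("_src", "_dst"))
print(combined)
```

The suffixes turn the two `Count` columns into `Count_src` and `Count_dst`, which sidesteps the duplicate column name mentioned at the start of this answer.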

A simple join will do it:

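# Both frames have a Count column, so suffixes are needed to avoid a name clash
DST.join(SRC, lsuffix="_dst", rsuffix="_src")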
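
As a hedged illustration of how `DataFrame.join` behaves here: it aligns rows on the index, and it raises a `ValueError` when both frames share a column name such as `Count` unless suffixes are supplied. The frames below are made up (the second firewall names are hypothetical) and use a single-level index to keep the sketch small.

```python
import pandas as pd

# Made-up per-firewall totals; "10_1_81_35" and "10_1_81_36" are hypothetical
src_totals = pd.DataFrame(
    {"Count": [8, 12]},
    index=pd.Index(["10_1_81_34", "10_1_81_35"], name="Firewall"),
)
dst_totals = pd.DataFrame(
    {"Count": [34, 16]},
    index=pd.Index(["10_1_81_34", "10_1_81_36"], name="Firewall"),
)

# Without suffixes the shared column name makes join fail
try:
    src_totals.join(dst_totals)
    failed = False
except ValueError as exc:
    failed = True
    print("join failed:", exc)

# With suffixes, rows are matched on the index; how="outer" keeps unmatched rows
joined = src_totals.join(dst_totals, lsuffix="_src", rsuffix="_dst", how="outer")
print(joined)
```

Because the match is made on index values, rows line up only where both frames carry the same index entry; the others are filled with NaN.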
