Aggregating similar fields in JSON objects with Pandas

Posted 2024-04-27 23:09:03


I have JSON data (see the contents in the code below):

import numpy as np
import pandas as pd

x = [{
            "t_registration_number": "31807380529",
            "t_customer_inn": "2221123815",
            "t_customer_kpp": "222101001",
            "t_customer_ogrn": "1072221001709",
            "t_customer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "t_placer_inn": "2221123815",
            "t_placer_kpp": "222101001",
            "t_placer_ogrn": "1072221001709",
            "t_placer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "r_rus_name": "Алтайский край",
            "t_publication_date": "2018-12-28",
            "s_inn": "7706196090",
            "s_kpp": "",
            "s_name": "ООО Страховая компания СОГЛАСИЕ",
            "s_lotguid": "f12b1c2a-fb9d-4bff-9430-7fc29ec9dc88",
            "n_prc_diff_nmc": "1",
            "nlots_nmck": 500000
    },
    {
            "t_registration_number": "31805988205",
            "t_customer_inn": "2801010011",
            "t_customer_kpp": "280101001",
            "t_customer_ogrn": "1022800512624",
            "t_customer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
            "t_placer_inn": "2801010011",
            "t_placer_kpp": "280101001",
            "t_placer_ogrn": "1022800512624",
            "t_placer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
            "r_rus_name": "Амурская область",
            "t_publication_date": "2018-01-09",
            "s_inn": "2723071046",
            "s_kpp": "272301001",
            "s_name": "ООО МЕДИАС",
            "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
            "n_prc_diff_nmc": "2",
            "nlots_nmck": 34384
    },
    {
            "t_registration_number": "31805988205",
            "t_customer_inn": "2801010011",
            "t_customer_kpp": "280101001",
            "t_customer_ogrn": "1022800512624",
            "t_customer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
            "t_placer_inn": "2801010011",
            "t_placer_kpp": "280101001",
            "t_placer_ogrn": "1022800512624",
            "t_placer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
            "r_rus_name": "Амурская область",
            "t_publication_date": "2018-01-09",
            "s_inn": "7018040688",
            "s_kpp": "701701001",
            "s_name": "ООО СКАН - М",
            "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
            "n_prc_diff_nmc": "2",
            "nlots_nmck": 34384
    }
]


df = pd.DataFrame(x)

# Columns that hold the same values in the near-duplicate records.
group_cols = [
    's_lotguid', 't_registration_number',
    't_customer_inn', 't_customer_kpp', 't_customer_ogrn', 't_customer_short_name',
    't_placer_inn', 't_placer_kpp', 't_placer_ogrn', 't_placer_short_name',
    'r_rus_name', 't_publication_date',
]

result = df.groupby(group_cols).agg(','.join)  # .to_dict()

The last 2 JSON objects are identical except for 3 fields: "s_inn", "s_kpp", and "s_name".

I need to aggregate the data so that I end up with:

{
            "t_registration_number": "31807380529",
            "t_customer_inn": "2221123815",
            "t_customer_kpp": "222101001",
            "t_customer_ogrn": "1072221001709",
            "t_customer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "t_placer_inn": "2221123815",
            "t_placer_kpp": "222101001",
            "t_placer_ogrn": "1072221001709",
            "t_placer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "r_rus_name": "Алтайский край",
            "t_publication_date": "2018-12-28",
            "s_inn": "7706196090",
            "s_kpp": "",
            "s_name": "ООО Страховая компания СОГЛАСИЕ",
            "s_lotguid": "f12b1c2a-fb9d-4bff-9430-7fc29ec9dc88",
            "n_prc_diff_nmc": "1",
            "nlots_nmck": 500000
    },
{
        "t_registration_number": "31805988205",
        "t_customer_inn": "2801010011",
        "t_customer_kpp": "280101001",
        "t_customer_ogrn": "1022800512624",
        "t_customer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
        "t_placer_inn": "2801010011",
        "t_placer_kpp": "280101001",
        "t_placer_ogrn": "1022800512624",
        "t_placer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
        "r_rus_name": "Амурская область",
        "t_publication_date": "2018-01-09",
        "s_inn": "2723071046, 7018040688",
        "s_kpp": "272301001, 701701001",
        "s_name": "ООО МЕДИАС, ООО СКАН - М",
        "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
        "n_prc_diff_nmc": "2",
        "nlots_nmck": 34384
}
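For the comma-joined form above, one possible approach is to group on every column except the three supplier fields and join only those three. The following is a minimal sketch, not a definitive answer: `records` is a trimmed stand-in for `x` (only a few of the constant columns are kept for brevity), and the names `varying`, `keys`, `flat`, and `flat_records` are made up for this illustration.

```python
import pandas as pd

# Trimmed stand-in for the question's list `x` (assumption: fewer columns,
# same values for the columns that are kept).
records = [
    {"t_registration_number": "31807380529",
     "s_inn": "7706196090", "s_kpp": "",
     "s_name": "ООО Страховая компания СОГЛАСИЕ",
     "s_lotguid": "f12b1c2a-fb9d-4bff-9430-7fc29ec9dc88",
     "n_prc_diff_nmc": "1", "nlots_nmck": 500000},
    {"t_registration_number": "31805988205",
     "s_inn": "2723071046", "s_kpp": "272301001",
     "s_name": "ООО МЕДИАС",
     "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
     "n_prc_diff_nmc": "2", "nlots_nmck": 34384},
    {"t_registration_number": "31805988205",
     "s_inn": "7018040688", "s_kpp": "701701001",
     "s_name": "ООО СКАН - М",
     "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
     "n_prc_diff_nmc": "2", "nlots_nmck": 34384},
]
df = pd.DataFrame(records)

# Fields that differ between otherwise identical records.
varying = ["s_inn", "s_kpp", "s_name"]
# Everything else is used as the grouping key, including the integer
# column, so it is never passed to str.join.
keys = [c for c in df.columns if c not in varying]

flat = (
    df.groupby(keys, sort=False)[varying]  # sort=False keeps first-seen order
      .agg(", ".join)                      # join the string values per group
      .reset_index()                       # turn the keys back into columns
)

flat_records = flat.to_dict(orient="records")
```

Selecting `[varying]` before `.agg` matters here: `", ".join` only works on strings, so a numeric column such as `nlots_nmck` has to be part of the key rather than one of the aggregated columns.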

Or, ideally:

{
            "t_registration_number": "31807380529",
            "t_customer_inn": "2221123815",
            "t_customer_kpp": "222101001",
            "t_customer_ogrn": "1072221001709",
            "t_customer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "t_placer_inn": "2221123815",
            "t_placer_kpp": "222101001",
            "t_placer_ogrn": "1072221001709",
            "t_placer_short_name": "КАУ ГОСУДАРСТВЕННАЯ ЭКСПЕРТИЗА АЛТАЙСКОГО КРАЯ",
            "r_rus_name": "Алтайский край",
            "t_publication_date": "2018-12-28",
            "s_inn": "7706196090",
            "s_kpp": "",
            "s_name": "ООО Страховая компания СОГЛАСИЕ",
            "s_lotguid": "f12b1c2a-fb9d-4bff-9430-7fc29ec9dc88",
            "n_prc_diff_nmc": "1",
            "nlots_nmck": 500000
    },

{
        "t_registration_number": "31805988205",
        "t_customer_inn": "2801010011",
        "t_customer_kpp": "280101001",
        "t_customer_ogrn": "1022800512624",
        "t_customer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
        "t_placer_inn": "2801010011",
        "t_placer_kpp": "280101001",
        "t_placer_ogrn": "1022800512624",
        "t_placer_short_name": "ГАУЗ АО ДЕТСКАЯ ГКБ",
        "r_rus_name": "Амурская область",
        "t_publication_date": "2018-01-09",
        "aggregated": [
            {
                "s_inn": "2723071046",
                "s_kpp": "272301001",
                "s_name": "ООО МЕДИАС"
            },
            {
                "s_inn": "7018040688",
                "s_kpp": "701701001",
                "s_name": "ООО СКАН - М"
            }
        ],
        "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
        "n_prc_diff_nmc": "2",
        "nlots_nmck": 34384
} 
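The nested form above could be built by iterating over the groups and collecting the supplier fields of each group into a list. This is only a sketch under assumptions: `records` is a trimmed stand-in for `x`, and `varying`, `keys`, `nested_records`, `to_native`, and `json_text` are hypothetical names introduced for the example. Groups with a single supplier keep the flat fields, as in the desired output.

```python
import json
import pandas as pd

# Trimmed stand-in for the question's list `x` (assumption: fewer columns).
records = [
    {"t_registration_number": "31807380529",
     "s_inn": "7706196090", "s_kpp": "",
     "s_name": "ООО Страховая компания СОГЛАСИЕ",
     "s_lotguid": "f12b1c2a-fb9d-4bff-9430-7fc29ec9dc88",
     "n_prc_diff_nmc": "1", "nlots_nmck": 500000},
    {"t_registration_number": "31805988205",
     "s_inn": "2723071046", "s_kpp": "272301001",
     "s_name": "ООО МЕДИАС",
     "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
     "n_prc_diff_nmc": "2", "nlots_nmck": 34384},
    {"t_registration_number": "31805988205",
     "s_inn": "7018040688", "s_kpp": "701701001",
     "s_name": "ООО СКАН - М",
     "s_lotguid": "2fd7440e-e0a1-4fa0-ae7b-b901b1e378d5",
     "n_prc_diff_nmc": "2", "nlots_nmck": 34384},
]
df = pd.DataFrame(records)

varying = ["s_inn", "s_kpp", "s_name"]              # fields that differ
keys = [c for c in df.columns if c not in varying]  # fields that repeat

nested_records = []
for key_values, group in df.groupby(keys, sort=False):
    # With several grouping columns, key_values is a tuple in `keys` order.
    record = dict(zip(keys, key_values))
    suppliers = group[varying].to_dict(orient="records")
    if len(suppliers) == 1:
        record.update(suppliers[0])       # single supplier: keep flat fields
    else:
        record["aggregated"] = suppliers  # several suppliers: nest them
    nested_records.append(record)


def to_native(value):
    # json.dumps calls this only for objects it cannot serialize itself,
    # e.g. NumPy integer scalars, which expose .item().
    return value.item() if hasattr(value, "item") else str(value)


json_text = json.dumps(nested_records, ensure_ascii=False, indent=4,
                       default=to_native)
```

If the consumer of the JSON prefers a uniform schema, the `if` branch can be dropped so that every record carries an `aggregated` list, even when it holds a single entry.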

Can anyone help me? I really don't know how to do this.

