使用Python将JSON文件转换为自定义表

2024-04-29 08:39:43 发布

您现在位置:Python中文网/ 问答频道 /正文

大家好,我的社区

我对整个JSON文件格式有点陌生,也是Python的初学者。我有一个自定义JSON文件,其数据是嵌套和子嵌套的。我正在尝试使用Python将其转换为表格格式

我甚至不知道如何继续,我在这里提到了一些问题,他们将文件展平,然后用Python编写代码使其成形。我试过了,结果不太顺利

我将添加JSON文件和CSV格式的预期输出。请看一看它,让我知道任何想法,我可以使用它的工作

在下面的JSON文件中,对于每个公司,我们都有特定的产品ID,其中包含数据,有些ID为空。包含数据的ProductID在预测中有一个赋值,其余的将为空

我还想得到一些建议,如果我们可以通过添加更多标识符来修改JSON数据,或者删除任何标识符来帮助我们实现最终输出?(例如,为我们仅有价值的所有子类别添加产品价值和产品说明)

[
    {
        "predictions": {
            "CC__108": 0.948093,
            "CC__111": 0.897565
        },
        "CompanyID": "CI10001",
        "SHAP_ACT_CC__1": null,
        "SHAP_ACT_CC__2": null,
        "SHAP_ACT_CC__108": {
            "Product Category": "Toys",
            "Board Games": {
                "Monopoly": 35.99,
                "The Game of Life": 39.99,
                "The Clue": 20.45
            },
            "Soft Toys": {
                "Bear": {
                    "Product Value": 5.78,
                    "Product Description": "A soft bear toy"
                }
            },
            "Electronic Toys": {
                "Digital pet": 59.99,
                "Entertainment robot": 100.99
            },
            "Puzzle": {
                "Baloon Puzzle": 10.99
            },
            "Rubik's Cube": {
                "3x3 cube": {
                    "Product Value": 5.99,
                    "Product Description": "3x3 rubik's cube"
                }
            }
        },
        "SHAP_ACT_CC__109": null,
        "SHAP_ACT_CC__110": null,
        "SHAP_ACT_CC__111": {
            "Product Category": "Books",
            "Action and Adventure": {
                "Life of Pi": 14.99,
                "The Call of the Wild": 9.99
            },
            "Classics": {
                "Little Women": {
                    "Product Value": 10.99,
                    "Product Description": "Illustrated Edition"
                },
                "Beloved": {
                    "Product Value": 12.99,
                    "Product Description": "Winner of the Nobel Prize"
                }
            },
            "Comics": {
                "Watchmen": 14.99,
                "Avengers": 15.99
            },
            "Fantasy": {
                "Ninth House": 18.99
            },
            "Historical": {}
        },
        "SHAP_ACT_CC__115": null,
        "SHAP_ACT_CC__116": null
    },
    {
        "predictions": {
            "CC__124": 0.81234,
            "CC__85": 0.78943
        },
        "CompanyID": "CI10002",
        "SHAP_ACT_CC__18": null,
        "SHAP_ACT_CC__24": null,
        "SHAP_ACT_CC__124": {
            "Product Category": "Vehicles",
            "Military Aircraft": {
                "Attack Airplanes": 10000.99,
                "Bomber Airplanes": 15000.99
            },
            "Airplanes": {
                "Cargo Airplanes": {
                    "Product Value": 20000.99,
                    "Product Description": "Cargo Transport"
                }
            },
            "ATV": {},
            "Automobiles": {},
            "Bicycles": {}
        },
        "SHAP_ACT_CC__134": null,
        "SHAP_ACT_CC__135": null,
        "SHAP_ACT_CC__85": {
            "Product Category": "Boats",
            "Fishing Boats": {
                "Smudger": 25000.99,
                "Campion": 30000.99
            },
            "Dinghy Boats": {
                "Lowe": {
                    "Product Value": 10000.99,
                    "Product Description": "lowes dinghy boat"
                },
                "Pond King": {
                    "Product Value": 8000.99,
                    "Product Description": "king of the pond"
                }
            },
            "Deck Boats": {
                "Sea Ark": 45000.99
            },
            "Bowrider Boats": {},
            "House Boats": {
                "World Cat": {
                    "Product Value": 15000.99,
                    "Product Description": "3 bedroom house boat"
                }
            }
        },
        "SHAP_ACT_CC__149": null,
        "SHAP_ACT_CC__150": null
    }
]

输出数据(CSV)

CompanyID,Product ID,Product Category,Product Type,Product Name,Product Value,Product Description
CI10001,SHAP_ACT_CC__108,Toys,Board Games,Monopoly,35.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Game of Life,39.99,
CI10001,SHAP_ACT_CC__108,Toys,Board Games,The Clue,20.45,
CI10001,SHAP_ACT_CC__108,Toys,Soft Toys,Bear,5.78,
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Digital pet,59.99,
CI10001,SHAP_ACT_CC__108,Toys,Electronic Toys,Entertainment robot,100.99,
CI10001,SHAP_ACT_CC__108,Toys,Puzzle,Baloon Puzzle,10.99,
CI10001,SHAP_ACT_CC__108,Toys,Rubik's Cube,3x3 cube,5.99,3x3 rubik's cube
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,Life of Pi,14.99,
CI10001,SHAP_ACT_CC__111,Books,Action and Adventure,The Call of the Wild,9.99,
CI10001,SHAP_ACT_CC__111,Books,Classics,Little Women,10.99,Illustrated Edition
CI10001,SHAP_ACT_CC__111,Books,Classics,Beloved,12.99,Winner of the Nobel Prize
CI10001,SHAP_ACT_CC__111,Books,Comics,Watchmen,14.99,
CI10001,SHAP_ACT_CC__111,Books,Comics,Avengers,15.99,
CI10001,SHAP_ACT_CC__111,Books,Fantasy,Ninth House,18.99,
CI10001,SHAP_ACT_CC__111,Books,Historical,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Attack Airplanes,10000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Military Aircraft,Bomber Airplanes,15000.99,
CI10002,SHAP_ACT_CC__124,Vehicles,Airplanes,Cargo Airplanes,20000.99,Cargo Transport
CI10002,SHAP_ACT_CC__124,Vehicles,ATV,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Automobiles,,,
CI10002,SHAP_ACT_CC__124,Vehicles,Bicycles,,,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Smudger,25000.99,
CI10002,SHAP_ACT_CC__85,Boats,Fishing Boats,Campion,30000.99,
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Lowe,10000.99,lowes dinghy boat
CI10002,SHAP_ACT_CC__85,Boats,Dinghy Boats,Pond King,8000.99,king of the pond
CI10002,SHAP_ACT_CC__85,Boats,Deck Boats,Sea Ark,45000.99,
CI10002,SHAP_ACT_CC__85,Boats,Bowrider Boats,,,
CI10002,SHAP_ACT_CC__85,Boats,House Boats,World Cat,15000.99,3 bedroom house boat

Output CSV File

谢谢你的帮助。任何想法或附加链接都会很有帮助,如果我需要添加任何附加信息或更改帖子格式,请告诉我


Tags: ofvaluedescriptionproductbooksactnullcc
1条回答
网友
1楼 · 发布于 2024-04-29 08:39:43

没有直接的方法来处理这种特殊的JSON模式。您需要从最里面的项目一直使用模式,处理每种类型的对象

产品信息

最里面的字段包含产品详细信息,可以是表示值的单个数字

14.99

或具有值和描述键的对象

{
  "Product Value": 10.99,
  "Product Description": "Illustrated Edition"
}

您可以这样处理它:

def process_product(ptype):
    if isinstance(ptype, dict):
        value = ptype['Product Value']
        description = ptype['Product Description']
    else:
        value = ptype
        description = None

    return value, description

产品类别

然后是产品类别,每个类别都有一个特殊的Product Category键和多个产品类型键

{
  "Product Category": "Books",
  "Action and Adventure": {
    "Life of Pi": 14.99,
    "The Call of the Wild": 9.99
  },
  "Classics": {...}
}

您可以使用以下功能对其进行处理:

def process_category(category):
    for key in category:
        if key == 'Product Category':
            continue
        else:
            for product_type, product_data in category[key].items():
                yield (
                    category['Product Category'],
                    key,
                    product_type,
                    *process_product(product_data),
                )

公司

层次结构中的最后一个对象是公司,每个对象都有一个CompanyID键和几个产品类别

{
  "CompanyID": "CI10001",
  "SHAP_ACT_CC__1": null,
  "SHAP_ACT_CC__108": {
    "Product Category": "Toys",
    "Board Games": {...}
  }
}

该函数的工作原理与前面的函数类似:

def process_company(company):
    for key, data in company.items():
        if key.startswith('SHAP_ACT') and data is not None:
            for category_data in process_category(data):
                yield company['CompanyID'], key, *category_data

多公司

现在编写一个函数来处理数组中的所有记录:

def process_data(companies):
    for company in companies:
        for company_data in process_company(company):
            yield {
                'Company ID': company_data[0],
                'Product ID': company_data[1],
                'Product Category': company_data[2],
                'Product Type': company_data[3],
                'Product Name': company_data[4],
                'Product Value': company_data[5],
                'Product Description': company_data[6],
            }

数据帧

pd.DataFrame(list(process_data(data)))

输出

   Company ID        Product ID Product Category          Product Type          Product Name  Product Value        Product Description
0     CI10001  SHAP_ACT_CC__108             Toys           Board Games              Monopoly          35.99                       None
1     CI10001  SHAP_ACT_CC__108             Toys           Board Games      The Game of Life          39.99                       None
2     CI10001  SHAP_ACT_CC__108             Toys           Board Games              The Clue          20.45                       None
3     CI10001  SHAP_ACT_CC__108             Toys             Soft Toys                  Bear           5.78            A soft bear toy
4     CI10001  SHAP_ACT_CC__108             Toys       Electronic Toys           Digital pet          59.99                       None
5     CI10001  SHAP_ACT_CC__108             Toys       Electronic Toys   Entertainment robot         100.99                       None
6     CI10001  SHAP_ACT_CC__108             Toys                Puzzle         Baloon Puzzle          10.99                       None
7     CI10001  SHAP_ACT_CC__108             Toys          Rubik's Cube              3x3 cube           5.99           3x3 rubik's cube
8     CI10001  SHAP_ACT_CC__111            Books  Action and Adventure            Life of Pi          14.99                       None
9     CI10001  SHAP_ACT_CC__111            Books  Action and Adventure  The Call of the Wild           9.99                       None
10    CI10001  SHAP_ACT_CC__111            Books              Classics          Little Women          10.99        Illustrated Edition
11    CI10001  SHAP_ACT_CC__111            Books              Classics               Beloved          12.99  Winner of the Nobel Prize
12    CI10001  SHAP_ACT_CC__111            Books                Comics              Watchmen          14.99                       None
13    CI10001  SHAP_ACT_CC__111            Books                Comics              Avengers          15.99                       None
14    CI10001  SHAP_ACT_CC__111            Books               Fantasy           Ninth House          18.99                       None
15    CI10002  SHAP_ACT_CC__124         Vehicles     Military Aircraft      Attack Airplanes       10000.99                       None
16    CI10002  SHAP_ACT_CC__124         Vehicles     Military Aircraft      Bomber Airplanes       15000.99                       None
17    CI10002  SHAP_ACT_CC__124         Vehicles             Airplanes       Cargo Airplanes       20000.99            Cargo Transport
18    CI10002   SHAP_ACT_CC__85            Boats         Fishing Boats               Smudger       25000.99                       None
19    CI10002   SHAP_ACT_CC__85            Boats         Fishing Boats               Campion       30000.99                       None
20    CI10002   SHAP_ACT_CC__85            Boats          Dinghy Boats                  Lowe       10000.99          lowes dinghy boat
21    CI10002   SHAP_ACT_CC__85            Boats          Dinghy Boats             Pond King        8000.99           king of the pond
22    CI10002   SHAP_ACT_CC__85            Boats            Deck Boats               Sea Ark       45000.99                       None
23    CI10002   SHAP_ACT_CC__85            Boats           House Boats             World Cat       15000.99       3 bedroom house boat

相关问题 更多 >