How to convert a nested Avro schema into flat Avro schemas using Python/PySpark

Posted 2024-04-19 07:12:03


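I have a nested Avro schema for Kafka messages. I'm trying to convert the messages into relational DataFrames with PySpark, so I want to flatten the schema in Python and get several flat DataFrames.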

Here is an example of the nested Avro schema:

{
  "name": "user",
  "type": "record",
  "fields": [
    {"name": "first_name", "type": "string" },
    {"name": "last_name", "type": "string" },
    {"name": "present_address", "type": {
        "name": "addressField",
        "type": "record",
        "fields": [
            {"name": "street_name", "type": "string"},
            {"name": "city", "type": "string"}
        ]
    }},
    {"name": "permanent_address",
     "type": {"type": "array", "items": "addressField"}}
  ]
}

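I want to split it into several DataFrames, each with its own flat Avro schema, carrying first_name and last_name into each child schema, as follows: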

{
    "name": "user",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" }
    ]
}

{
    "name": "present_address",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" },
        {"name": "street_name", "type": "string"},
        {"name": "city", "type": "string"}
    ]
}

{
    "name": "permanent_address",
    "type": "record",
    "fields": [
        {"name": "first_name", "type": "string" },
        {"name": "last_name", "type": "string" },
        {"name": "street_name", "type": "string"},
        {"name": "city", "type": "string"}
    ]
}
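A minimal pure-Python sketch of the schema-side flattening, assuming only one level of nesting and only primitive fields, records, and arrays of records (unions, maps and enums are not handled). The helper name `flatten_avro_schema` and the rule that the root's primitive fields are copied into every child table are assumptions read off the example above, not an existing library API. It also resolves the named-type reference "addressField", which the array uses by name after the record is first defined inline.

```python
from __future__ import annotations

import json
from typing import Any

# Avro primitive type names (from the Avro specification).
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def flatten_avro_schema(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Split a nested Avro record schema into flat record schemas (hypothetical helper)."""
    named: dict[str, dict[str, Any]] = {}  # named record types seen so far

    def resolve(t: Any) -> Any:
        # Register inline record definitions; replace name references with the definition.
        if isinstance(t, dict) and t.get("type") == "record":
            named[t["name"]] = t
        elif isinstance(t, str) and t in named:
            return named[t]
        return t

    key_fields: list[dict[str, Any]] = []  # root primitives, copied into each child
    children: list[tuple[str, dict[str, Any]]] = []  # (table name, record schema)
    for field in schema["fields"]:
        ftype = resolve(field["type"])
        if isinstance(ftype, str) and ftype in PRIMITIVES:
            key_fields.append({"name": field["name"], "type": ftype})
        elif isinstance(ftype, dict) and ftype.get("type") == "record":
            children.append((field["name"], ftype))
        elif isinstance(ftype, dict) and ftype.get("type") == "array":
            items = resolve(ftype["items"])
            if not (isinstance(items, dict) and items.get("type") == "record"):
                raise NotImplementedError(f"arrays of {items!r} are not handled")
            children.append((field["name"], items))
        else:
            raise NotImplementedError(f"field type {ftype!r} is not handled")

    tables = [{"name": schema["name"], "type": "record",
               "fields": [dict(f) for f in key_fields]}]
    for name, record in children:
        # One level of nesting only: the child's own fields are copied as-is.
        tables.append({"name": name, "type": "record",
                       "fields": [dict(f) for f in key_fields]
                       + [dict(f) for f in record["fields"]]})
    return tables


NESTED_SCHEMA = """
{
  "name": "user",
  "type": "record",
  "fields": [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "present_address", "type": {
        "name": "addressField", "type": "record",
        "fields": [
            {"name": "street_name", "type": "string"},
            {"name": "city", "type": "string"}
        ]}},
    {"name": "permanent_address",
     "type": {"type": "array", "items": "addressField"}}
  ]
}
"""

tables = flatten_avro_schema(json.loads(NESTED_SCHEMA))
for table in tables:
    print(json.dumps(table))
```

Unsupported shapes raise NotImplementedError rather than being silently dropped, so the error-prone cases at least surface explicitly.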

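I tried iterating over the original Avro schema to flatten it, but there are many cases to handle, which makes it error-prone. Is there a built-in Python/PySpark module that can do this conversion?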
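On the data side, the same split can be sketched without any special module. This hypothetical `flatten_message` helper assumes each Kafka message has already been decoded into a Python dict; it turns a nested record into one row and each array element into one row, copying the root's scalar fields in as keys. The sample message is made up for illustration.

```python
from __future__ import annotations

from typing import Any


def flatten_message(root_name: str, msg: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Split one decoded message into rows per flat table (hypothetical helper).

    Scalar fields become keys; a nested record yields one row and an
    array of records yields one row per element. One level of nesting only.
    """
    key = {k: v for k, v in msg.items() if not isinstance(v, (dict, list))}
    rows: dict[str, list[dict[str, Any]]] = {root_name: [dict(key)]}
    for field, value in msg.items():
        if isinstance(value, dict):
            rows[field] = [{**key, **value}]
        elif isinstance(value, list):
            # An empty array yields no rows, like Spark's explode().
            rows[field] = [{**key, **item} for item in value]
    return rows


# Hypothetical sample message matching the nested schema in the question.
message = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "present_address": {"street_name": "Main St", "city": "London"},
    "permanent_address": [
        {"street_name": "Elm St", "city": "Oxford"},
        {"street_name": "Oak Rd", "city": "Bath"},
    ],
}

rows = flatten_message("user", message)
for table, table_rows in rows.items():
    print(table, table_rows)
```

In PySpark the same shape is usually produced on the decoded DataFrame with built-in functions rather than a schema converter: `df.select("first_name", "last_name", "present_address.*")` for the nested record, and `df.select("first_name", "last_name", explode("permanent_address").alias("addr")).select("first_name", "last_name", "addr.*")` for the array (`explode` drops rows whose array is empty or null; `explode_outer` keeps them). Decoding the Kafka bytes into that DataFrame can be done with `from_avro` from `pyspark.sql.avro.functions`, which requires the external spark-avro package.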

