将json中的键和值提取到新的datafram中

2024-04-27 00:51:02 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个dataframe,它的JSON值在列中。它们被缩进了多个层次。我想将结束键和值提取到一个新的数据帧中。我将在下面给出示例列值

{'shipping_assignments': [{'shipping': {'address': {'address_type': 'shipping', 'city': 'Calder', 'country_id': 'US', 'customer_address_id': 1, 'email': 'roni_cost@example.com', 'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello', 'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan', 'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff Parkway'], 'telephone': '(555) 229-3326'}, 'method': 'flatrate_flatrate', 'total': {'base_shipping_amount': 5, 'base_shipping_discount_amount': 0, 'base_shipping_discount_tax_compensation_amnt': 0, 'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5, 'base_shipping_tax_amount': 0, 'shipping_amount': 5, 'shipping_discount_amount': 0, 'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax': 5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items': [{'amount_refunded': 0, 'applied_rule_ids': '1', 'base_amount_refunded': 0, 'base_discount_amount': 0, 'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount': 0, 'base_discount_tax_compensation_invoiced': 0, 'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax': 31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39, 'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45', 'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0, 'free_shipping': 0, 'discount_tax_compensation_amount': 0, 'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0, 'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0, 'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax': 31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1, 'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total': 29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku': 'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced': 2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 'weight': 1, 'product_option': {'extension_attributes': {'configurable_item_options': [{'option_id': '141', 'option_value': 167}, {'option_id': '93', 'option_value': 58}]}}}]}], 'payment_additional_info': [{'key': 'method_title', 'value': 'Check / Money order'}], 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI--Rate 1', 'title': 'US-MI--Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}

上面是数据帧列df['x']的单行值

我的代码在下面转换

sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)

它提供了新的带有列的数据帧

Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes', 'payment_additional_info', 'shipping_assignments'], dtype='object')

当我尝试执行上述操作来转换具有行值的列时

m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)

它不起作用

检查此链接以获取beautified_json


Tags: iddfbasediscountamountpricerowinvoiced
2条回答

看来你的错误出在tolist()。请尝试以下操作:

import pandas as pd
import json
import re

data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"roni_cost@example.com","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],"payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}

df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)

打印以下内容:

                                                   0
0  {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...

我在python中遇到了一个漂亮的ETL包,名为petl。使用名为fromdicts(json\u string)的函数将json列表转换为dict格式

order_table = fromdicts(data_list)

如果在任何列中发现任何嵌套dict,请使用unpackdict(order\u table,'nested\u col') 它将解包嵌套的dict。 在我的情况下,我需要打开申请的税列。下面的代码将解包并将键和值作为列和行追加到同一个表中。你知道吗

order_table  = unpackdict(order_table, 'applied_taxes')

如果你们想知道更多关于petl

相关问题 更多 >