从JSON嵌套数组中获取数据并将其放入excel

3条回答

网友

1楼 · 编辑于 2024-06-16 09:34:32

您可能需要json来读取数据，csv或其他模块来写入数据，但其余的则不需要任何额外的模块，只需要for循环和dict/list函数

最后你有完整的工作示例

我使用json将JSON字符串转换为Python数据

import json

data = json.loads(text)

然后我可以使用for-loop来处理数据，并将其转换为行列表，每一行都将是字典：

{
   'Id': ..., 
   'Date': ..., 
   'Budget': ..., 
   'Category': ..., 
   'Country': ..., 
   'Posted On': ..., 
   'Hourly Range': ...
}

我使用for循环分别处理每个messages，并为列表中可能不存在的元素创建具有默认值的行text

all_rows = []

for msg in data['messages']:
    row = {
        'Id': msg['id'],
        'Date': msg['date'],
        'Budget': None,       # default value if it not in msg
        'Category': None,     # default value if it not in msg
        'Country': None,      # default value if it not in msg
        'Posted On': None,    # default value if it not in msg
        'Hourly Range': None, # default value if it not in msg
    }

我使用iter()将列表text转换为迭代器，然后我可以将它与zip(it, it)一起使用来创建像第一个元素{"type": "bold", "text": "Budget"}和第二个元素": $500\\n"这样的对。然后我可以创建pair"Budget"和$500以及kee inrow

text = msg['text']

it = iter(text) # to create pairs with `zip` 
next(it)        # skip first element: "Random job description.\\n\\n"

for x, y in zip(it, it):  # work with pairs
    key = x['text']                      # ie. "Budget"
    value = y.strip().replace(': ', '')  # ie. "$500"
    row[key] = value

all_rows.append(row)  # keep this row on list

之后，我有一个包含所有行的列表：

[
 {'Id': 20482, 'Date': '2020-12-04T16:34:40', 'Budget': '$500', 'Category': 'UX/UI Design', 'Country': None, 'Posted On': 'December 04, 2020 13:28 UTC', 'Hourly Range': None}, 
 {'Id': 21144, 'Date': '2020-12-06T01:04:50', 'Budget': None, 'Category': None, 'Country': 'Serbia', 'Posted On': 'December 05, 2020 21:31 UTC', 'Hourly Range': '$13.00-$35.00'}
]

现在我可以使用csv将其写入文件CSV，我可以在Excel、LibreOffice或任何其他类似程序（或Python模块pandas）中读取该文件

import csv

headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']

with open('output.csv', 'w') as fh:
    csv_writer = csv.DictWriter(fh, headers)

    csv_writer.writeheader()
    csv_writer.writerows(all_rows)

最小工作示例

text = '''
{
 "name": "Messenger_group",
 "id": 85648902334,
 "messages": [
  {
   "id": 20482,
   "type": "message",
   "date": "2020-12-04T16:34:40",
   "from": "IFTTT",
   "from_id": 4535011322,
   "text": [
    "Random job description.\\n\\n",
    {
     "type": "bold",
     "text": "Budget"
    },
    ": $500\\n",
    {
     "type": "bold",
     "text": "Posted On"
    },
    ": December 04, 2020 13:28 UTC\\n",
    {
     "type": "bold",
     "text": "Category"
    },
    ": UX/UI Design\\n",
    {
     "type": "link",
     "text": "https://url.com/"
    }
   ]
  },
  {
   "id": 21144,
   "type": "message",
   "date": "2020-12-06T01:04:50",
   "from": "IFTTT",
   "from_id": 4535011322,
   "text": [
    "Random job description.\\n\\n",
    {
     "type": "bold",
     "text": "Hourly Range"
    },
    ": $13.00-$35.00\\n",
    {
     "type": "bold",
     "text": "Posted On"
    },
    ": December 05, 2020 21:31 UTC\\n",
    {
     "type": "bold",
     "text": "Country"
    },
    ": Serbia\\n",
    {
     "type": "link",
     "text": "https://url.com"
    }
   ]
  }
 ]
}
'''

#  - read  -

import json

data = json.loads(text)

#  - process  -

all_rows = []

headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']

for msg in data['messages']:
    row = {
        'Id': msg['id'],
        'Date': msg['date'],
        'Budget': None,       # default value if it not in msg
        'Category': None,     # default value if it not in msg
        'Country': None,      # default value if it not in msg
        'Posted On': None,    # default value if it not in msg
        'Hourly Range': None, # default value if it not in msg
    }

    text = msg['text']

    it = iter(text)  # to create pairs with `zip` 
    next(it)         # skip first element

    for x, y in zip(it, it):
        key = x['text']
        value = y.strip().replace(': ', '')
        row[key] = value

    all_rows.append(row)

    for key,value in row.items():
        print(key, ':', value)
    print(' -')

#  - write  -

import csv

with open('output.csv', 'w') as fh:
    csv_writer = csv.DictWriter(fh, headers)

    csv_writer.writeheader()
    csv_writer.writerows(all_rows)

网友

2楼 · 编辑于 2024-06-16 09:34:32

下面是我在更新了一条非常有用的@furas评论后得出的结论

text = '''
{
 "name": "Messenger_group",
 "id": 85648902334,
 "messages": [
  {
   "id": 20482,
   "type": "message",
   "date": "2020-12-04T16:34:40",
   "from": "IFTTT",
   "from_id": 4535011322,
   "text": [
    "Random job description.\\n\\n",
    {
     "type": "bold",
     "text": "Budget"
    },
    ": $500\\n",
    {
     "type": "bold",
     "text": "Posted On"
    },
    ": December 04, 2020 13:28 UTC\\n",
    {
     "type": "bold",
     "text": "Category"
    },
    ": UX/UI Design\\n",
    {
     "type": "link",
     "text": "https://url.com/"
    }
   ]
  },
  {
   "id": 21144,
   "type": "message",
   "date": "2020-12-06T01:04:50",
   "from": "IFTTT",
   "from_id": 4535011322,
   "text": [
    "Random job description.\\n\\n",
    {
     "type": "bold",
     "text": "Hourly Range"
    },
    ": $13.00-$35.00\\n",
    {
     "type": "bold",
     "text": "Posted On"
    },
    ": December 05, 2020 21:31 UTC\\n",
    {
     "type": "bold",
     "text": "Country"
    },
    ": Serbia\\n",
    {
     "type": "link",
     "text": "https://url.com"
    }
   ]
  }
 ]
}
'''

import json
data = json.loads(text)

all_rows = []

include_params = [
    'Id',
    'Date',
    'Budget',
    'Category',
    'Country',
    'Posted On',
    'Hourly Range',
    'link',
]

for msg in data['messages']:
    row = {}

    if 'Id' in include_params:
        row['Id'] = msg['id']
    if 'Date' in include_params:
        row['Date'] = msg['date']

    text = msg['text']
    it = iter(text)
    for i in it:
        if type(i) == dict:
            if i['text'] in include_params:
                key = i['text']
                value = next(it).strip().replace(': ', '')
                row[key] = value
            if 'https://url' in i['text'] and 'link' in include_params:
                key = i['type']
                value = i['text']
                row[key] = value

    all_rows.append(row)

    for key,value in row.items():
        print(key, ':', value)
    print(' -')


import csv

headers = include_params

with open('output_test.csv', 'w') as fh:
    csv_writer = csv.DictWriter(fh, headers)

    csv_writer.writeheader()
    csv_writer.writerows(all_rows)

网友

3楼 · 编辑于 2024-06-16 09:34:32

我在一次实习期间也遇到了类似的问题。JSON是嵌套的，我想把它转换成一个表，但它不是直接的。而且我的JSON文件很大

我使用了ijsonlibrary。它允许您以迭代方式解析json文件，并返回前缀、事件、值元组

例如，这是我解析JSON时的输出：

理解json结构后，您可以将前缀和事件值关联起来，并设计一种算法将某些特定值写入CSV行

不确定这是否是最理想的方法，但它确实对我有效

相关问题更多 >

编程相关推荐

热门问题

热门文章