将JSON对象转换为数据框
我对如何把JSON对象转换成数据框(DataFrame)不太熟悉。希望能找到一些更有效的方法来进行这种转换。
当我遍历我的input_df
的两行数据时,打印suggestion_string
会得到两个JSON对象,具体如下:
我期望的输出结果是一个数据框,内容如下:
建议1 | 原因1 | 建议2 | 原因2 | 建议3 | 原因3 |
---|---|---|---|---|---|
自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 |
自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 |
我尝试过以下代码,确实有效;但我希望能找到一些更高效的转换方法。
suggestion_list = []
for i in range(len(input_df)):
description = input_df.loc[i, 'description']
query = fr"""
sample text ... {description}?
"""
suggestion_string = return_json_object(query)
string = suggestion_string.replace("`",'')
string = string.replace('json', '')
str_list = string.split('\n')
dict_str = ''.join(str_list)
output_dict = json.loads(dict_str)
suggestion_list.append(output_dict)
lists_of_dicts = []
for dict in suggestion_list:
list_of_dicts.append(dict['suggestions'])
flat_data = []
for sublist in list_of_dicts:
row_data = {}
for i, item in enumerate(sublist):
row_data[f'suggestion{i+1}'] = item['suggestion']
row_data[f'reason{i+1}'] = item['reason']
flat_data.append(row_data)
suggestion_df = pd.DataFrame(flat_data)
2 个回答
0
这里有一个函数和一个示例的json文件,可以用来测试这个函数。这个函数适用于任何类型的json,无论里面有没有嵌套的字段。
data = [
{
"id": 1,
"name": "John Doe",
"email": "john@example.com",
"location": {
"city": "New York",
"country": "USA"
},
"skills": ["Python", "Data Analysis"]
},
{
"id": 2,
"name": "Jane Doe",
"email": "jane@example.com",
"location": {
"city": "San Francisco",
"country": "USA"
},
"skills": ["JavaScript", "React"]
}
]
df = pd.DataFrame(data)
def flatten_nested_json_df(df):
df = df.reset_index()
s = (df.applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df.applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
while len(list_columns) > 0 or len(dict_columns) > 0:
new_columns = []
for col in dict_columns:
horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
horiz_exploded.index = df.index
df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
new_columns.extend(horiz_exploded.columns) # inplace
for col in list_columns:
df = df.drop(columns=[col]).join(df[col].explode().to_frame())
new_columns.append(col)
s = (df[new_columns].applymap(type) == list).all()
list_columns = s[s].index.tolist()
s = (df[new_columns].applymap(type) == dict).all()
dict_columns = s[s].index.tolist()
return df
如何使用它
flatten_nested_json_df(df)
返回结果
index id name email location.city location.country \
0 0 1 John Doe john@example.com New York USA
0 0 1 John Doe john@example.com New York USA
1 1 2 Jane Doe jane@example.com San Francisco USA
1 1 2 Jane Doe jane@example.com San Francisco USA
skills
0 Python
0 Data Analysis
1 JavaScript
1 React
1
我创建了一些示例建议字符串——不过你可以通过修改下面的代码,结合你的 input_df
逻辑,继续你的循环。我为了简单起见,直接写死了这些建议字符串。
代码:
import pandas as pd
import json
suggestion_string_1 = '''
```
json
{
"suggestions": [
{"suggestion": "free text", "reason": "free text"},
{"suggestion": "free text", "reason": "free text"},
{"suggestion": "free text", "reason": "free text"}
]
}
```
'''
suggestion_string_2 = '''
```
json
{
"suggestions": [
{"suggestion": "free text", "reason": "free text"},
{"suggestion": "free text", "reason": "free text"},
{"suggestion": "free text", "reason": "free text"}
]
}
```
'''
suggestion_list = []
input_df = [suggestion_string_1, suggestion_string_2]
for i in range(len(input_df)):
# description = input_df.loc[i, 'description']
#
# query = fr"""
# sample text ... {description}?
# """
#
# suggestion_string = return_json_object(query)
# I have hard-coded the suggestion_string for brevity -
# Assuming you are getting the same strings that I have
# hard-coded as per the screenshot shared by you.
suggestion_string = input_df[i] # So, you can just replace this line with your logic for getting the suggestion string.
suggestions = json.loads(
suggestion_string[suggestion_string.find('{'):suggestion_string.rfind('}') + 1]
)['suggestions']
suggestion_list.append(suggestions)
suggestion_df = pd.DataFrame()
for idx, suggestions in enumerate(suggestion_list):
for i, suggestion in enumerate(suggestions):
suggestion_df.loc[idx, f'suggestion{i+1}'] = suggestion['suggestion']
suggestion_df.loc[idx, f'reason{i+1}'] = suggestion['reason']
print(suggestion_df)
输出:
建议1 | 原因1 | 建议2 | 原因2 | 建议3 | 原因3 |
---|---|---|---|---|---|
自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 |
自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 | 自由文本 |