How do I extract data from a list of dicts into a DataFrame?

[{"_": "Message", "id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same", "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "from_scheduled": false, "legacy": false, "edit_hide": false, "from_id": 356886523, "fwd_from": null, "via_bot_id": null, "reply_to_msg_id": null, "media": null, "reply_markup": null, "entities": [], "views": null, "edit_date": null, "post_author": null, "grouped_id": null, "restriction_reason": []}, {"_": "MessageService", "id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T11:48:18+00:00", "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}, "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "legacy": false, "from_id": 1264437394, "reply_to_msg_id": null}]

2 Answers

User

Answer #1 · edited 2024-05-14 18:39:17

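This assumes the object returned from the API is not a string (e.g. '[{...}, {...}]').
- If it is a string, first convert it with data = json.loads(data) (this requires import json).
A list comprehension can be used to extract the desired keys and their corresponding values from the list of dicts.
Iterate over each dict in the list and call dict.get for each key. If the key does not exist, dict.get returns None.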

import pandas as pd

# where data is the list of dicts, unpack the desired keys and load into pandas
df = pd.DataFrame([{'date': i.get('date'), 'message': i.get('message')} for i in data])

# display(df)
                        date                                                                                                                                                            message
0  2020-09-03T14:51:03+00:00  Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same
1  2020-09-03T11:48:18+00:00                                                                                                                                                               None
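
For the case where the API hands back a JSON string rather than a list, the two steps can be combined roughly as follows. This is only a sketch: the variable raw and its shortened sample payload are made up for illustration and are not the full data from the question.

```python
import json

import pandas as pd

# Hypothetical stand-in for an API response that arrives as a JSON string.
raw = (
    '[{"date": "2020-09-03T14:51:03+00:00", "message": "hello"}, '
    '{"date": "2020-09-03T11:48:18+00:00", '
    '"action": {"_": "MessageActionChatJoinedByLink"}}]'
)

# Decode only if the payload is still a string; a list of dicts passes through unchanged.
data = json.loads(raw) if isinstance(raw, str) else raw

# dict.get returns None for records that have no 'message' key.
df = pd.DataFrame([{'date': i.get('date'), 'message': i.get('message')} for i in data])
print(df)
```

The isinstance check means the same snippet can be reused whether or not the response has already been decoded.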

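Alternatively

If you want to skip records where 'message' is None (the filter below also skips records whose 'message' is an empty string):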

df = pd.DataFrame([{'date': i['date'], 'message': i['message']} for i in data if i.get('message')])

                      date                                                                                                                                                            message
 2020-09-03T14:51:03+00:00  Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same
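
Another option, not part of the answer above, is to build the full frame first and then drop the rows with a missing message using DataFrame.dropna. A minimal sketch, assuming a shortened, made-up sample in place of the real data:

```python
import pandas as pd

# Shortened, hypothetical sample: the second record has no 'message' key.
data = [
    {"date": "2020-09-03T14:51:03+00:00", "message": "hello"},
    {"date": "2020-09-03T11:48:18+00:00"},
]

# Build the frame with dict.get, so missing messages become None.
df = pd.DataFrame([{'date': i.get('date'), 'message': i.get('message')} for i in data])

# Drop rows whose 'message' is missing, then renumber the index from 0.
filtered = df.dropna(subset=['message']).reset_index(drop=True)
print(filtered)
```

Note the difference from the comprehension filter: dropna only removes missing values, so a record with an empty-string message would be kept.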

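User

Answer #2 · edited 2024-05-14 18:39:17

I think you should parse the string with json.loads and then use json_normalize to convert the JSON into a DataFrame, with max_level controlling how many levels of nested dictionaries are flattened.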

from pandas import json_normalize
import json
d = '[{"_": "Message", "id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T14:51:03+00:00", "message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same", "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "from_scheduled": false, "legacy": false, "edit_hide": false, "from_id": 356886523, "fwd_from": null, "via_bot_id": null, "reply_to_msg_id": null, "media": null, "reply_markup": null, "entities": [], "views": null, "edit_date": null, "post_author": null, "grouped_id": null, "restriction_reason": []}, {"_": "MessageService", "id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, "date": "2020-09-03T11:48:18+00:00", "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}, "out": false, "mentioned": false, "media_unread": false, "silent": false, "post": false, "legacy": false, "from_id": 1264437394, "reply_to_msg_id": null}]'
f = json.loads(d)
print(json_normalize(f, max_level=2))
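
To get from the flattened frame to just the columns of interest, something like the sketch below may help. It assumes a shortened, made-up version of the payload. json_normalize names nested keys by joining them with a dot, so the channel id ends up in a column called to_id.channel_id.

```python
import json

import pandas as pd

# Shortened, hypothetical version of the JSON string from the question.
d = (
    '[{"id": 4589, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, '
    '"date": "2020-09-03T14:51:03+00:00", "message": "hello"}, '
    '{"id": 4588, "to_id": {"_": "PeerChannel", "channel_id": 1399858792}, '
    '"date": "2020-09-03T11:48:18+00:00", '
    '"action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}}]'
)

# Flatten nested dicts; nested keys become dotted column names.
flat = pd.json_normalize(json.loads(d), max_level=2)

# Keep only the columns of interest; records without a 'message' key get NaN.
subset = flat[['date', 'message', 'to_id.channel_id']]
print(subset)
```

Here pd.json_normalize is the same function as the json_normalize imported from pandas above.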
