Function to create one JSON from 2 JSON APIs and apply it along each axis

Posted 2024-04-27 03:44:46


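The first column of df is uid, the second column is url_json1, the third column is url_json2, and the expected output goes in the fourth column. The keys inside url_json1 and url_json2 are similar. So what function can we apply to the DataFrame so that it works across all the columns?

I have 3 rows of data, with uid running from 1001 to 1003: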

uid
1001
1002
1003

Second column: url_json1

Data in the first row of url_json1:

{
    "quiz": {
        "sport": {
            "result": {
                "question": "Which one is correct team name in NBA?",
                "answer": "Huston Rocket"
            }
        }
    }
}

Third column, first row of url_json2:

{
    "quiz": {
        "sport": {
            "result": {
                "question": "Which one is correct team name in football?",
                "answer": "Chelsea"
            },
            "type": "document"
        }
    }
}

Expected output JSON:

{ "1001": { "url_1": { "question": "Which one is correct team name in NBA?", "answer": "Huston Rocket" }, "url_2": { "question": "Which one is correct team name in football?", "answer": "Chelsea" } } }


1 answer
User
#1 · posted 2024-04-27 03:44:46

Use:

from collections import defaultdict
import json

import pandas as pd

a = {
    "quiz": {
        "sport": {
            "result": {
                "question": "Which one is correct team name in NBA?",
                "answer": "Huston Rocket"
            }
        }
    }
}
b = {
    "quiz": {
        "sport": {
            "result": {
                "question": "Which one is correct team name in football?",
                "answer": "Chelsea"
            },
            "type": "document"
        }
    }
}

df = pd.DataFrame({'uid':['1001','1002'],
                   'url_json1':[json.dumps(a), json.dumps(a)],
                   'url_json2':[json.dumps(b), json.dumps(b)]})
#print (df)

Solution for N JSON columns (a DataFrame with a uid column in which all other columns are filled with JSON):

def func(x):
    #read uid without mutating the row that apply passes in
    uid = x['uid']
    #storage for nested dicts
    d = defaultdict(dict)
    #loop over all json columns (everything except uid) and update defaultdict
    for i, y in enumerate(x.drop('uid'), 1):
        #convert json to dict
        y = json.loads(y)
        d[uid].update({f'url_{i}': y['quiz']['sport']['result']})
    #convert back to json
    return json.dumps(dict(d))

df['new'] = df.apply(func, axis=1)
print (df)
    uid                                          url_json1  \
0  1001  {"quiz": {"sport": {"result": {"question": "Wh...   
1  1002  {"quiz": {"sport": {"result": {"question": "Wh...   

                                           url_json2  \
0  {"quiz": {"sport": {"result": {"question": "Wh...   
1  {"quiz": {"sport": {"result": {"question": "Wh...   

                                                 new  
0  {"1001": {"url_1": {"question": "Which one is ...  
1  {"1002": {"url_1": {"question": "Which one is ...  
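Because the printed frame truncates the new column, it can help to parse one cell back and compare it with the expected output from the question. The following is only a minimal sketch under the same sample data; `combine` is a hypothetical name for a helper that follows the same logic as `func`, and it is not part of the original answer.

```python
import json

import pandas as pd

# Sample payloads copied from the question.
a = {"quiz": {"sport": {"result": {
    "question": "Which one is correct team name in NBA?",
    "answer": "Huston Rocket"}}}}
b = {"quiz": {"sport": {"result": {
    "question": "Which one is correct team name in football?",
    "answer": "Chelsea"}, "type": "document"}}}

df = pd.DataFrame({'uid': ['1001', '1002'],
                   'url_json1': [json.dumps(a), json.dumps(a)],
                   'url_json2': [json.dumps(b), json.dumps(b)]})


def combine(row):
    # Hypothetical helper: same idea as func, one url_<i> key per JSON column.
    uid = row['uid']
    inner = {}
    for i, raw in enumerate(row.drop('uid'), 1):
        inner[f'url_{i}'] = json.loads(raw)['quiz']['sport']['result']
    return json.dumps({uid: inner})


df['new'] = df.apply(combine, axis=1)

# Parse the first cell back into a dict and pretty-print it in full.
first = json.loads(df.loc[0, 'new'])
print(json.dumps(first, indent=4))
```

The parsed dict for uid 1001 should have the same shape as the expected output JSON shown in the question, with only the `result` objects kept and the `type` key dropped.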

If there are only 2 JSON columns, the solution can be simplified:

def func(x):
    d1 = json.loads(x['url_json1'])['quiz']['sport']['result']
    d2 = json.loads(x['url_json2'])['quiz']['sport']['result']
    uid = x['uid']

    d = {uid:{'url_1':d1, 'url_2':d2}}
    return json.dumps(d)

df['new'] = df.apply(func, axis=1)
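The two-column version assumes every cell holds valid JSON with a quiz -> sport -> result path. If real API responses can be missing or malformed (an assumption, since the question does not say), a tolerant variant might look like the sketch below. `extract_result` and `func_safe` are hypothetical names, and returning None for bad cells is just one possible choice.

```python
import json

import pandas as pd


def extract_result(raw):
    """Return the quiz -> sport -> result dict, or None if it cannot be read."""
    try:
        return json.loads(raw)['quiz']['sport']['result']
    except (TypeError, ValueError, KeyError):
        # TypeError: cell is None/NaN or an intermediate value is not a dict
        # ValueError: invalid JSON text (json.JSONDecodeError is a subclass)
        # KeyError: one of the expected keys is missing
        return None


def func_safe(x):
    # Hypothetical tolerant counterpart of the two-column func.
    d = {x['uid']: {'url_1': extract_result(x['url_json1']),
                    'url_2': extract_result(x['url_json2'])}}
    return json.dumps(d)


# Small made-up frame with one good cell and several bad ones.
df = pd.DataFrame({
    'uid': ['1001', '1002'],
    'url_json1': ['{"quiz": {"sport": {"result": {"answer": "A"}}}}', 'not json'],
    'url_json2': ['{"quiz": {"sport": {}}}', None],
})

df['new'] = df.apply(func_safe, axis=1)
print(df['new'].tolist())
```

Cells that cannot be parsed end up as null in the combined JSON, so a single bad response does not stop the whole apply call.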
