Converting a dictionary with lists of tuples to a DataFrame

Posted 2024-06-16 10:52:54


I have a dictionary like this:

pred_dict = {('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809), ('african zebra', 0.49264234), ('arabian horse', 0.5186422), ('bobcat', 0.5096679)],
             ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793), ('african zebra', 0.48751196), ('arabian horse', 0.49272105), ('bobcat', 0.5228181)]}

I want to convert it to a DataFrame like this:

Text                               | Blue Whale | Ferrari   | african zebra | arabian horse | bobcat
('african zebra', 'arabian horse') | 0.49859235 | 0.5013809 | 0.49264234    | 0.5186422     | 0.5096679
('cheetah', 'mountain lion')       | 0.48881102 | 0.502793  | 0.48751196    | 0.49272105    | 0.5228181

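Every value in the dictionary has exactly the same number of tuples, and the first element of each tuple (the label) is the same across all the lists. The goal is to put the dict keys in a "Text" column and use the first element of each tuple as the names of the other columns. The cell values are the scores (floats).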

Any suggestions would be helpful. Here is some of what I have tried:

In [12]: text = list(pred_dict.keys())

In [13]: values = list(pred_dict.values())

In [14]: pred_df = pd.DataFrame({'text': text, 'label_scores': values})

In [15]: pred_df
Out[15]:
                             text                                       label_scores
0  (african zebra, arabian horse)  [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1        (cheetah, mountain lion)  [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...

In [19]: df_scores = pred_df['label_scores']
In [21]: df_scores
Out[21]:
0    [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1    [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...
Name: label_scores, dtype: object

In [22]: labels = [t[1] for t in df_scores[0]]

In [23]: labels
Out[23]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]

In [24]: labels = [t[0] for t in df_scores[0]]

In [25]: labels
Out[25]: ['Blue Whale', 'Ferrari', 'african zebra', 'arabian horse', 'bobcat']

In [26]: scores = [t[1] for t in df_scores[0]]

In [27]: scores
Out[27]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]

In [28]: scores = [t[1] for t in df_scores[1]]

In [29]: scores
Out[29]: [0.48881102, 0.502793, 0.48751196, 0.49272105, 0.5228181]

3 answers

Not pretty, but it works:

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)]
}

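import pandas as pd

df = pd.DataFrame(pred_dict).T
# Take the column names from the labels in the first row.
df.columns = [pair[0] for pair in df.iloc[0]]
# Keep only the score from each (label, score) pair; the row index is preserved.
df = df.apply(lambda col: col.map(lambda pair: pair[1]))
df.reset_index(inplace=True)
df.insert(0, "Text", list(zip(df.level_0, df.level_1)))
df.drop(["level_0", "level_1"], axis=1, inplace=True)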

Its output is:

                             Text  Blue Whale  ...  arabian horse    bobcat
0  (african zebra, arabian horse)    0.498592  ...       0.518642  0.509668
1        (cheetah, mountain lion)    0.488811  ...       0.492721  0.522818
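The same reshaping can also be sketched more directly. Assuming every value is a list of (label, score) pairs, as in the question, each list can be turned into a dict and merged with its key to form one row. The names `rows` and `df` are only illustrative.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# One dict per row: the key goes under "Text", and dict(pairs) maps
# each label to its score.
rows = [{"Text": key, **dict(pairs)} for key, pairs in pred_dict.items()]
df = pd.DataFrame(rows)
print(df)
```

Because the columns are matched by label name rather than by position, this sketch should not depend on the pairs appearing in the same order in every list.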

OK. After some experimenting, I finally got it working. Here is what I did:

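import pandas as pd

# mlb_classes is not defined in this post; it is assumed to be the list of
# label names, e.g. ['Blue Whale', 'Ferrari', ...].
text = list(pred_dict.keys())
values = list(pred_dict.values())
df_1 = pd.DataFrame({'text': text})

score_dict = {}
for label in mlb_classes:
    # Collect this label's score from every row, in key order.
    score_list = []
    for t_list in values:
        for t in t_list:
            if t[0] == label:
                score_list.append(t[1])
    score_dict[label] = score_list

df_2 = pd.DataFrame(score_dict)

score_df = pd.concat([df_1, df_2], axis=1)
print(score_df)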

Output:

     text  Blue Whale   Ferrari  african zebra  arabian horse    bobcat    
0  (african zebra, arabian horse)    0.519343  0.511951       0.512639       0.527919  0.491461  0.516240  
1        (cheetah, mountain lion)    0.495197  0.527627       0.497516       0.512571  0.488823  0.510277  
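The snippet above depends on `mlb_classes`, which the post never defines. If it is simply the list of label names, one possible stand-in (an assumption, not necessarily what the author used) is to read the labels off the first list of pairs, since the question states that all the lists share the same labels:

```python
pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# Hypothetical stand-in for mlb_classes: the labels of the first list of pairs.
first_pairs = next(iter(pred_dict.values()))
mlb_classes = [label for label, _score in first_pairs]
print(mlb_classes)
```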
pred_dict = {('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809), ('african zebra', 0.49264234), ('arabian horse', 0.5186422), ('bobcat', 0.5096679)],
             ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793), ('african zebra', 0.48751196), ('arabian horse', 0.49272105), ('bobcat', 0.5228181)]}

This should do it:

pd.concat(
    [pd.DataFrame(r, columns=['Text', 'value'], index=[t] * len(r))
     for (t, r) in pred_dict.items()]
).set_index('Text', append=True).unstack('Text')['value']

This produces the desired result (the screenshot from the original answer is not available here).
