Create a dictionary from each "chunk" of a DataFrame

Posted 2024-06-16 13:47:29


Suppose I have a DataFrame like this:

         REFERENCE_CODE                                        TRANSLATION
0            ladder_now                                                NaN
1                     0                                              xyzwu
2                     1                                              yxzuv
3                     2                                            asdfasd
4                     3                                             sdfsdh
5                     4                                             hghffg
6                     5                                            agfdhsj
7                     6                                            dfgasgf
8                     7                                             jfhkgj
9                     8                                           djfgjfhk
10                    9                                            dsfasys
11                   10                                            kghkfdy
12                   98                                          dsfhsuert
13                   99                                           wsdfadjs
14        country_satis  Sa pangkagab’san, aoogma po ba kamo o dai naoo...
15                    1                                            Naoogma
16                    2                                        Dai naoogma
17                    8                           Dai aram (HUWAG BASAHIN)
18                    9                           Huminabo (HUWAG BASAHIN)
19                                                                     NaN
20             econ_sit  Ngonyan naman po ay manongod sa sitwasyon kan ...
21                    1                                             Marhay
22                    2                                       Medyo marhay
23                    3                                       Medyo maraot
24                    4                                   Talagang maraot 
25                    8                         Hindi alam (HUWAG BASAHIN)
26                    9                           Tumanggi (HUWAG BASAHIN)
27                                                                     NaN
28  children_betteroff2  Sa pagdakula po kan mga aki ngonyan sa Pilipin...
29                    1                                         Mas marhay
30                    2                                         Mas maraot
31                    3                        Pareho lang (HUWAG BASAHIN)
32                    8                         Hindi alam (HUWAG BASAHIN)
33                    9                           Tumanggi (HUWAG BASAHIN)
34                                                                     NaN
35             fav_batt  Pakisabi po sakuya kon kamo ay may talagang ma...
36               fav_US                                  An Estados Unidos
37            fav_China                                              Tsina
38           fav_Russia                                             Russia
39               fav_eu                                 Ang European Union
40               fav_un                                ang United Nations 
41          fav_Germany                                       GEEEEERhmany
42             fav_NATO                                            NAAAATO
43                                                                     NaN
44                    1                                    Talagang marhay
45                    2                                       Medyo marhay
46                    3                                Medyo bakong marhay
47                    4                         Talagang\n bakong marhay\n
48                    8                         Hindi alam (HUWAG BASAHIN)
49                    9                           Tumanggi (HUWAG BASAHIN)

My goal is to create one dictionary per "batch". That is, I want to take each small series of rows and turn it into a dict that looks like:

{'ladder_now': nan, '0': 'xyzwu', '1': 'yxzuv', '2': 'asdfasd', '3': 'sdfsdh', '4': 'hghffg', '5': 'agfdhsj', '6': 'dfgasgf', '7': 'jfhkgj', '8': 'djfgjfhk', '9': 'dsfasys', '10': 'kghkfdy', '98': 'dsfhsuert', '99': 'wsdfadjs'}

{'country_satis': 'Sa pangkagab’san, aoogma po ba kamo o dai naoogma sa mga bagay na nangyayari sa nasyon o bansa ta sa sangonyan?', '1': 'Naoogma', '2': 'Dai naoogma', '8': 'Dai aram (HUWAG BASAHIN)', '9': 'Huminabo (HUWAG BASAHIN)', '': nan}

and so on.

Currently I create the dict by running:

ref_dict = dict(zip(df['REFERENCE_CODE'], df['TRANSLATION']))

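My problem is that the codes (0, 1, 2, 3, …), which become the dict keys, are not unique, so running this over the whole DataFrame overwrites earlier entries with later ones. Is there a way to create a separate dict for each batch dynamically?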
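A minimal sketch of the overwriting problem, using a made-up two-block list rather than the real data (the value "question text" is a placeholder): when a key repeats, `dict(zip(...))` keeps the key's original position but only the last value assigned to it.

```python
# Hypothetical sample: two blocks that both use the answer code "1"
codes = ["ladder_now", "0", "1", "country_satis", "1", "2"]
texts = ["NaN", "xyzwu", "yxzuv", "question text", "Naoogma", "Dai naoogma"]

ref_dict = dict(zip(codes, texts))
print(ref_dict)
# "1" now maps to "Naoogma"; the first block's "yxzuv" has been overwritten
```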

Thanks in advance!


1 answer
User
Answer #1 · Posted 2024-06-16 13:47:29

You can use a generator to walk through the rows. As soon as a key repeats, the current dict is yielded and a new one is started:

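import pandas as pd

helper = {'REFERENCE_CODE': ['ladder_now', 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 98, 99, 'country_satis', 1, 2, 8, 9, '', 'econ_sit', 1], 
          'TRANSLATION': ['NaN', 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz', 'NaN', 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz', 'NaN', 'NaN']}

df = pd.DataFrame(helper)

def dict_generator(df):
    """Yield one dict per batch, starting a new dict at the first repeated key."""
    seen = {}
    # itertuples(index=False) yields (REFERENCE_CODE, TRANSLATION) pairs;
    # this avoids positional rows[0] lookups, which newer pandas deprecates
    for code, translation in df.itertuples(index=False):
        if code in seen:
            yield seen
            # rebind instead of seen.clear(), so a dict the caller kept
            # (e.g. via list(dict_generator(df))) is not emptied afterwards
            seen = {}
        seen[code] = translation
    yield seen

for adict in dict_generator(df):
    print(adict)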

This produces the following output:

{0: 'abc', 1: 'def', 2: 'ghi', 3: 'jkl', 4: 'mno', 5: 'pqr', 6: 'stu', 7: 'vwx', 8: 'yz', 9: 'NaN', 10: 'abc', 'country_satis': 'jkl', 98: 'def', 99: 'ghi', 'ladder_now': 'NaN'}
{'': 'yz', 1: 'mno', 2: 'pqr', 'econ_sit': 'NaN', 8: 'stu', 9: 'vwx'}
{1: 'NaN'}

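The key order shown above comes from an older Python version in which dicts were unordered. Since Python 3.7, dicts preserve insertion order, so the keys are printed in row order.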
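Splitting at the first repeated key puts each block's header row into the previous dict (in the output above, `'country_satis': 'jkl'` lands in the first dict). A different approach, sketched here under an assumption drawn only from the sample in the question, is to treat a row as the start of a block when its code is non-empty and non-numeric and the row before it is numeric or blank. That keeps `fav_batt`, `fav_US`, … together in one block. The block ids come from `cumsum` over those start flags, and `groupby` does the splitting. The helper name `split_blocks` and the `sample` data are hypothetical.

```python
import numpy as np
import pandas as pd


def split_blocks(df: pd.DataFrame) -> list[dict]:
    """Return one {code: translation} dict per block (hypothetical helper).

    Assumption: a block starts at a non-empty, non-numeric code whose
    previous row is numeric or blank.
    """
    # Normalise codes to strings: 0 -> "0", missing/blank -> ""
    codes = df["REFERENCE_CODE"].fillna("").astype(str).str.strip()
    is_header = (codes != "") & ~codes.str.isdigit()
    # A header opens a new block only if the row before it was not a header
    starts = is_header & ~is_header.shift(fill_value=False)
    block_id = starts.cumsum().rename("block")
    keyed = df.assign(KEY=codes)
    return [
        dict(zip(group["KEY"], group["TRANSLATION"]))
        for _, group in keyed.groupby(block_id, sort=False)
    ]


# Hypothetical sample mirroring the structure of the question's DataFrame
sample = pd.DataFrame({
    "REFERENCE_CODE": ["ladder_now", 0, 1, 98, "country_satis", 1, 2, "",
                       "fav_batt", "fav_US", "", 1, 2],
    "TRANSLATION": [np.nan, "xyzwu", "yxzuv", "dsfhsuert", "(question text)",
                    "Naoogma", "Dai naoogma", np.nan,
                    "(question text)", "An Estados Unidos", np.nan,
                    "Talagang marhay", "Medyo marhay"],
})

for block in split_blocks(sample):
    print(block)
```

With this rule, `country_satis` heads its own dict, which ends with the `''` separator row as in the desired output. If the real sheet contains headers that do not follow this pattern, the `is_header`/`starts` test is the part to adjust.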
