基于python字典给出的结构转换数据帧

2024-06-17 09:03:46 发布

您现在位置:Python中文网/ 问答频道 /正文

我在一个数据框中有下表:

<table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>AquaticSettingName</th> <th>Definition</th> <th>d_level_id</th> <th>Concat_Code</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Marine</td> <td>The Marine System is defined by salinity, which is typically about 35, although salinity can mea...</td> <td>1</td> <td>M</td> </tr> <tr> <th>1</th> <td>Estuarine</td> <td>The Estuarine System is defined by salinity and geomorphology. This System includes tidally infl...</td> <td>1</td> <td>E</td> </tr> <tr> <th>2</th> <td>Lacustrine</td> <td>The CMECS Lacustrine System includes (a) all deepwater areas of the Great Lakes and (b) shorelin...</td> <td>1</td> <td>L</td> </tr> <tr> <th>3</th> <td>Marine Nearshore</td> <td>The Marine Nearshore Subsystem extends from the landward limit of the Marine System to the 30 me...</td> <td>2</td> <td>M.1</td> </tr> <tr> <th>4</th> <td>Marine Offshore</td> <td>The Marine Offshore Subsystem extends from the 30 meter depth contour to the continental shelf b...</td> <td>2</td> <td>M.2</td> </tr> <tr> <th>5</th> <td>Marine Oceanic</td> <td>The Marine Oceanic Subsystem represents the open ocean, extending from the continental shelf bre...</td> <td>2</td> <td>M.3</td> </tr> <tr> <th>6</th> <td>Estuarine Coastal</td> <td>The Estuarine Coastal Subsystem extends from the supratidal zone at the land margin up to the 4 ...</td> <td>2</td> <td>E.1</td> </tr> <tr> <th>7</th> <td>Estuarine Open Water</td> <td>The Estuarine Open Water Subsystem includes all waters of the Estuarine System with a total dept...</td> <td>2</td> <td>E.2</td> </tr> <tr> <th>8</th> <td>Estuarine Tidal Riverine Coastal</td> <td>The Estuarine Tidal Riverine Coastal Subsystem includes the most upstream region of the estuary,...</td> <td>2</td> <td>E.3</td> </tr> <tr> <th>9</th> <td>Estuarine Tidal Riverine Open Water</td> <td>The Estuarine Tidal Riverine Open Water Subsystem includes tidal freshwater areas with a salinit...</td> <td>2</td> <td>E.4</td> </tr> <tr> <th>10</th> <td>Lacustrine Limnetic</td> <td>The Limnetic Subsystem includes all deepwater habitats within the Lacustrine System. "Deepwater ...</td> <td>2</td> <td>L.1</td> </tr> <tr> <th>11</th> <td>Lacustrine Littoral</td> <td>The Littoral Subsystem includes shallow habitats in the Lacustrine System. The shoreward boundar...</td> <td>2</td> <td>L.2</td> </tr> <tr> <th>12</th> <td>Marine Nearshore Subtidal</td> <td>The substrate is generally continuously submerged in this zone and includes those areas below MLLW.</td> <td>3</td> <td>M.1.a</td> </tr> <tr> <th>13</th> <td>Marine Nearshore Intertidal</td> <td>The substrate is regularly and periodically exposed and flooded by tidal action. This zone exten...</td> <td>3</td> <td>M.1.b</td> </tr> <tr> <th>14</th> <td>Marine Nearshore Supratidal</td> <td>This zone includes areas above MHHW that are affected by wave splash and overwash but does not i...</td> <td>3</td> <td>M.1.c</td> </tr> <tr> <th>15</th> <td>Marine Offshore Subtidal</td> <td>The substrate is subtidal and continuously submerged in this zone and includes those areas below...</td> <td>3</td> <td>M.2.a</td> </tr> <tr> <th>16</th> <td>Marine Oceanic Subtidal</td> <td>The substrate is subtidal and continuously submerged in this zone.</td> <td>3</td> <td>M.3.a</td> </tr> <tr> <th>17</th> <td>Estuarine Coastal Subtidal</td> <td>The substrate is generally continuously submerged in this zone and includes those areas below MLLW.</td> <td>3</td> <td>E.1.a</td> </tr> <tr> <th>18</th> <td>Estuarine Coastal Intertidal</td> <td>The substrate in this zone is regularly and periodically exposed and flooded by tides. This zone...</td> <td>3</td> <td>E.1.b</td> </tr> <tr> <th>19</th> <td>Estuarine Coastal Supratidal</td> <td>This zone includes areas above MHHW; areas in this zone are affected by wave splash and overwash...</td> <td>3</td> <td>E.1.c</td> </tr> <tr> <th>20</th> <td>Estuarine Open Water Subtidal</td> <td>The substrate is generally continuously submerged in this zone and includes those areas below MLLW.</td> <td>3</td> <td>E.2.a</td> </tr> <tr> <th>21</th> <td>Estuarine Tidal Riverine Coastal Subtidal</td> <td>The substrate is generally continuously submerged in this zone and includes those areas below MLLW.</td> <td>3</td> <td>E.3.a</td> </tr> <tr> <th>22</th> <td>Estuarine Tidal Riverine Coastal Intertidal</td> <td>The substrate in this zone is regularly and periodically exposed and flooded by tides. This zone...</td> <td>3</td> <td>E.3.b</td> </tr> <tr> <th>23</th> <td>Estuarine Tidal Riverine Open Water Subtidal</td> <td>The substrate is generally continuously submerged in this zone and includes those areas below MLLW.</td> <td>3</td> <td>E.4.a</td> </tr> </tbody> </table>

我需要使用最后一列将此表重新格式化为:

<table> <tr> <th>System</th> <th>SubSystem</th> <th>Tidal Zone</th> </tr> <tr> <td>Marine</td> <td>Marine Nearshore</td> <td>Marine Nearshore Subtidal</td> </tr> <tr> <td> </td> <td> </td> <td>Marine Nearshore Intertidal</td> </tr> <tr> <td> </td> <td> </td> <td>Marine Nearshore Supratidal</td> </tr> <tr> <td>Estuarine</td> <td>Estuarine Coastal</td> <td>Estuarine Coastal Subtidal</td> </tr> <tr> <td> </td> <td> </td> <td>Estuarine Coastal Intertidal</td> </tr> <tr> <td> </td> <td> </td> <td>Estuarine Coastal Supratidal</td> </tr> <tr> <td> </td> <td>Estuarine Open Water</td> <td>Estuarine Open Water Subtidal</td> </tr> <tr> <td> </td> <td>Estuarine Tidal Riverine Coastal</td> <td>Estuarine Tidal Riverine Coastal Subtidal</td> </tr> <tr> <td> </td> <td> </td> <td>Estuarine Tidal Riverine Coastal Intertidal</td> </tr> <tr> <td> </td> <td>Estuarine Tidal Riverine Open Water</td> <td>Estuarine Tidal Riverine Open Water Subtidal</td> </tr> <tr> <td>Lacustrine</td> <td>Lacustrine Littoral</td> <td></td> </tr> <tr> <td> </td> <td>Lacustrine Limnetic</td> <td> </td> </tr> </table>

所需的输出可以使用Concat\u Code列中包含的信息生成

我的方法是使用SplitGroupby,如this notebook中所述(注意:在那个笔记本中,我使用了一个黑客的解决方案来解决我在groupby中遇到的问题,我在一个单独的问题here中发布了它)

通过这种方法,我得到了一个新的字典,它看起来像:

{'E': {'1': ['a', 'b', 'c'], 
       '2': ['a'], 
       '3': ['a', 'b'], 
       '4': ['a']},
 'L': {'1': ['x'], 
       '2': ['x']},
 'M': {'1': ['a', 'b', 'c'], 
       '2': ['a'], 
       '3': ['a']}}

我使用上面的字典来构造我的数据帧,其中第一个键指向第一个组,第二个键指向第二个组,依此类推。。。在嵌套for循环Irecreate中,使用Concat_Column从第一个表中提取记录。row counter每个第一级组获得一个增量:

df1 = pd.DataFrame(columns=['System', 'Sub System', 'Tidal Zone'])

row=0

for i in k.keys():
    df1.set_value(row, 'System', df[df['Concat_Code']==i].AquaticSettingName.values[0])
    for j in k[i]:
        df1.set_value(row, 'Sub System', df[df['Concat_Code']==i+'.'+str(j)].AquaticSettingName.values[0])
        for v in k[i][j]:
            try:
                df1.set_value(row, 'Tidal Zone', df[df['Concat_Code']==i+'.'+str(j)+'.'+str(v)].AquaticSettingName.values[0])
            except:
                df1.set_value(row, 'Tidal Zone', '')
            row=row+1

import numpy as np
df1 = df1.replace(np.nan,' ', regex=True)

返回指定的输出

有没有更好、最简单、更可靠的方法来实现同样的结果


Tags: andtheinzoneissystemtrtd