How do I un-melt a pandas DataFrame in Python?

3 votes

1 answer

2295 views

Asked 2025-04-17 16:54

I reshaped a pandas DataFrame into a form suitable for plotting with ggplot (ggplot generally expects long-format data), like this:

test = pandas.melt(iris, id_vars=["Name"], value_vars=["SepalLength", "SepalWidth"])

This keeps the Name column of the iris dataset as an identifier column and converts the SepalLength and SepalWidth columns to long format:

test.loc[0:10]
Out:
           Name     variable  value
0   Iris-setosa  SepalLength    5.1
1   Iris-setosa  SepalLength    4.9
2   Iris-setosa  SepalLength    4.7
3   Iris-setosa  SepalLength    4.6
4   Iris-setosa  SepalLength    5.0
5   Iris-setosa  SepalLength    5.4
6   Iris-setosa  SepalLength    4.6
7   Iris-setosa  SepalLength    5.0
8   Iris-setosa  SepalLength    4.4
9   Iris-setosa  SepalLength    4.9
10  Iris-setosa  SepalLength    5.4

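So how do I "un-melt" this DataFrame? I want to keep the Name column but turn the values of the variable field into separate columns. Because the Name field is not unique, I don't think it can serve as the index. I thought pivot would do this, but it doesn't seem to work: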

test.pivot(columns="variable", values="value")
KeyError: u'no item named '

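How should I do this? Also, if several columns are in long format, for example if test had several columns like the variable column above, can I still un-melt it? That would seem to mean columns has to accept a list of columns rather than a single value. Thanks.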

Tags: pandas, ggplot, data processing, indexing, data transformation, column operations, long format, un-melt

1 Answer

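I think this case is somewhat ambiguous, because the test DataFrame has no unique index that identifies each original row. If melt simply stacks the SepalLength rows and then the SepalWidth rows from value_vars, you can build an index by hand and pivot on it; the result appears to match the original data: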

In [15]: test['index'] = list(range(len(test) // 2)) * 2  # integer division and an explicit list, so this also runs on Python 3
In [16]: test[:10]
Out[16]: 
          Name     variable  value  index
0  Iris-setosa  SepalLength    5.1      0
1  Iris-setosa  SepalLength    4.9      1
2  Iris-setosa  SepalLength    4.7      2
3  Iris-setosa  SepalLength    4.6      3
4  Iris-setosa  SepalLength    5.0      4
5  Iris-setosa  SepalLength    5.4      5
6  Iris-setosa  SepalLength    4.6      6
7  Iris-setosa  SepalLength    5.0      7
8  Iris-setosa  SepalLength    4.4      8
9  Iris-setosa  SepalLength    4.9      9

In [17]: test[-10:]
Out[17]: 
               Name    variable  value  index
290  Iris-virginica  SepalWidth    3.1    140
291  Iris-virginica  SepalWidth    3.1    141
292  Iris-virginica  SepalWidth    2.7    142
293  Iris-virginica  SepalWidth    3.2    143
294  Iris-virginica  SepalWidth    3.3    144
295  Iris-virginica  SepalWidth    3.0    145
296  Iris-virginica  SepalWidth    2.5    146
297  Iris-virginica  SepalWidth    3.0    147
298  Iris-virginica  SepalWidth    3.4    148
299  Iris-virginica  SepalWidth    3.0    149

In [18]: df = test.pivot(index='index', columns='variable', values='value')
In [19]: df['Name'] = test['Name']
In [20]: df[:10]
Out[20]: 
variable  SepalLength  SepalWidth         Name
index                                         
0                 5.1         3.5  Iris-setosa
1                 4.9         3.0  Iris-setosa
2                 4.7         3.2  Iris-setosa
3                 4.6         3.1  Iris-setosa
4                 5.0         3.6  Iris-setosa
5                 5.4         3.9  Iris-setosa
6                 4.6         3.4  Iris-setosa
7                 5.0         3.4  Iris-setosa
8                 4.4         2.9  Iris-setosa
9                 4.9         3.1  Iris-setosa

In [21]: (iris[["SepalLength", "SepalWidth", "Name"]] == df[["SepalLength", "SepalWidth", "Name"]]).all()
Out[21]: 
SepalLength    True
SepalWidth     True
Name           True
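
As a variation on the approach above, here is a minimal sketch that builds the helper index with groupby(...).cumcount() rather than by repeating a range, and restores Name through set_index/unstack instead of a separate assignment. It makes the same assumption as before: melt stacks each value_vars block in the original row order. The small iris frame and the "row" column name are made up for illustration; they are not the real dataset.

```python
import pandas as pd

# Tiny stand-in for the iris data (illustrative values, not the real dataset).
iris = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.3, 5.8],
    "SepalWidth": [3.5, 3.0, 3.3, 2.7],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

test = pd.melt(iris, id_vars=["Name"], value_vars=["SepalLength", "SepalWidth"])

# Number the rows within each variable. Assumption: melt keeps the original
# row order inside each block, so this counter is the original row position.
test["row"] = test.groupby("variable").cumcount()

# Move the keys into the index, then spread "variable" back out into columns.
# Keeping Name in the index carries it through the reshape.
restored = (
    test.set_index(["row", "Name", "variable"])["value"]
    .unstack("variable")
    .reset_index("Name")
)
restored.columns.name = None  # drop the leftover "variable" label on the columns
restored = restored[["SepalLength", "SepalWidth", "Name"]]

print(restored)
```

On the second part of the question: unstack also accepts a list of index level names, so if test had several variable-like columns they could all be placed in the index and unstacked together, which would produce MultiIndex columns.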

Answered 2025-04-17 by Python大师
