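How to broadcast a scalar to a filtered column in a pandas DataFrame
I want to broadcast the result of an expression into a DataFrame, not to a whole column but only to a filtered subset of it. Here is a simplified example: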
In [6]: df1 = DataFrame({"A":[1, 2, 3, 4], "B":["w", "x", "y", "z"], "C":(numpy.zeros((4), dtype='S1'))})
In [7]: df1
Out[7]:
A B C
0 1 w
1 2 x
2 3 y
3 4 z
Here A and B are the data I already have, and column C is where I want to put the result. I can broadcast to the whole column like this:
In [9]: df1['C'] = 'H'
In [10]: df1
Out[10]:
A B C
0 1 w H
1 2 x H
2 3 y H
3 4 z H
But if I try to broadcast a value (the letter "R" in this example) to a filtered subset:
In [14]: (df1[df1['A'] > 2])['C']
Out[14]:
2 H
3 H
Name: C
(just to show that the filter works)
Now I try to assign "R" to that subset...
In [12]: (df1[df1['A'] > 2])['C'] = "R"
In [13]: df1
Out[13]:
A B C
0 1 w H
1 2 x H
2 3 y H
3 4 z H
But my values are unchanged :( (Interestingly, I don't get an error either!?) Can anyone suggest how to do this?
Many thanks,
2 Answers
1
As an aside: pandas has improved nicely here and now emits a warning in this situation:
In [8]: (df1[df1['A'] > 2])['C'] = "R"
/Users/tismer/anaconda/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#!/bin/bash /Users/tismer/anaconda/bin/python.app
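To illustrate what the warning is about, here is a minimal sketch (not taken from the original session): boolean-mask indexing such as df1[df1['A'] > 2] produces a new DataFrame, so an assignment to it never reaches df1. The name subset is introduced only for this example, and the explicit .copy() is added so the intent is clear and no warning is raised.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4],
                    "B": ["w", "x", "y", "z"],
                    "C": ["H"] * 4})

# Filtering with a boolean mask yields a separate object, not a view of df1.
# The explicit .copy() makes that independence unambiguous.
subset = df1[df1["A"] > 2].copy()

# This writes to the temporary subset only.
subset["C"] = "R"

print(subset["C"].tolist())  # values in the filtered copy
print(df1["C"].tolist())     # the original column is untouched
```

This is the same thing that happens in the chained form (df1[df1['A'] > 2])['C'] = "R", except that there the temporary object is discarded immediately, which is why the assignment appears to do nothing.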
6
Select the rows with the boolean filter and the column you want in a single .loc call, then assign:
df1.loc[df1['A'] > 2, 'C'] = "R"
A B C
0 1 w H
1 2 x H
2 3 y R
3 4 z R
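For completeness, here is a self-contained sketch of the same idea. The variable names mask and alt are assumptions made for this example. It shows the .loc assignment with a named mask, plus a possible alternative using Series.mask, which builds a new column instead of modifying selected cells in place.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4],
                    "B": ["w", "x", "y", "z"],
                    "C": ["H"] * 4})

# Rows and column are selected in one indexing operation on df1 itself,
# so the scalar "R" is broadcast to the matching rows of the original.
mask = df1["A"] > 2
df1.loc[mask, "C"] = "R"

# Alternative sketch: Series.mask replaces values where the condition
# is True and returns a new Series, which is then assigned back.
alt = pd.DataFrame({"A": [1, 2, 3, 4],
                    "B": ["w", "x", "y", "z"],
                    "C": ["H"] * 4})
alt["C"] = alt["C"].mask(alt["A"] > 2, "R")

print(df1)
print(alt)
```

Both approaches index df1 (or alt) directly rather than a filtered temporary, which is why the change sticks.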