How to eliminate missing d

Posted 2024-04-24 12:01:32


I want to remove the missing values in the columns ret and dlret of a dataset named crsp_data. Here is my code:

crsp_data_ret=crsp_data['ret'].dropna()
crsp_data_dlret=crsp_data['dlret'].dropna()
crsp_data['retadj']=(1+crsp_data['ret'])*(1+crsp_data['dlret'])-1

But it gives me the following error:

KeyError                                  Traceback (most recent call last)
/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'dlret'

Can anyone point out what I am doing wrong? Thanks for your help!

There are NaNs in ret:


crsp_data['retadj']=(1+crsp_data['ret'])*(1+crsp_data['dlret_x'])-1
crsp_data_retadj=crsp_data.dropna(subset=['retadj'])
crsp_data['retadj'].head(50)

0           NaN
1     -0.248538
2      0.428202
3     -0.086215
4     -0.125488
5      0.030425
6     -0.203367
7     -0.611781
8     -0.051796
9     -0.328013
10     0.065550
11    -0.413984
12    -0.343434
13     0.052632
14    -0.420102
15    -0.089628
16    -0.036559
17          NaN
18          NaN
19     0.039082
20     0.480844
21     0.025029
22     0.056209
23    -0.013069
24    -0.060239
25          NaN
26     0.033846
27          NaN
28     0.121294
29     0.185520
30    -0.035714
31          NaN


1 answer
User
#1 · Posted 2024-04-24 12:01:32

The problem is that:

crsp_data_ret=crsp_data['ret']
crsp_data_dlret=crsp_data['dlret']

each return a Series, so it is not possible to index the results by column name afterwards:

crsp_data_ret=crsp_data['ret'].dropna()
crsp_data_dlret=crsp_data['dlret'].dropna()
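To make the point above concrete, here is a minimal sketch. The frame `toy` and its values are made up for illustration and only stand in for crsp_data. It shows that selecting one column and calling dropna() yields a Series, and that indexing that Series with the column label raises a KeyError.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for crsp_data (values are invented).
toy = pd.DataFrame({"ret": [0.1, np.nan, 0.3],
                    "dlret": [np.nan, 0.2, 0.05]})

# Selecting a single column gives a Series; dropna() keeps it a Series.
ret_only = toy["ret"].dropna()
print(type(ret_only).__name__)

# The Series is indexed by row labels, so a column label is not a valid key.
raised = False
try:
    ret_only["ret"]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The same kind of KeyError also appears when a column label is simply absent from the DataFrame, so checking `toy.columns` is a cheap first step when debugging.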

The solution is to remove ['ret'] and ['dlret'] and use the two Series directly:

crsp_data['retadj']=(1+crsp_data_ret)*(1+crsp_data_dlret)-1

Another solution is to use dropna with the subset parameter, so that a DataFrame is returned:

crsp_data_ret=crsp_data.dropna(subset=['ret'])
crsp_data_dlret=crsp_data.dropna(subset=['dlret'])


crsp_data['retadj']=(1+crsp_data_ret['ret'])*(1+crsp_data_dlret['dlret'])-1
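As a variation on the subset approach, the sketch below drops rows that are missing either column in a single call and then computes the adjusted return on the cleaned frame. The frame `toy` and its numbers are assumptions made up for the example, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for crsp_data (values are invented).
toy = pd.DataFrame({"ret": [0.10, np.nan, 0.30, -0.05],
                    "dlret": [np.nan, 0.20, 0.05, 0.0]})

# Keep only rows where both columns are present; the result is a DataFrame.
# .copy() avoids assigning into a view of the original frame.
clean = toy.dropna(subset=["ret", "dlret"]).copy()

# Both operands now share the same index and contain no NaN.
clean["retadj"] = (1 + clean["ret"]) * (1 + clean["dlret"]) - 1
print(clean)
```

Working on one cleaned frame keeps the two columns aligned row by row, whereas two separately filtered objects are realigned on the index when multiplied and can reintroduce NaN.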

Edit:

If you need to ignore NaN, one possible solution is to use add with the parameter fill_value=0, which gives:

import numpy as np
import pandas as pd

crsp_data = pd.DataFrame({'ret':[1,2,'C', 5, np.nan],
                   'dlret':[10, np.nan, 7, 1, np.nan]})

crsp_data['ret'] = pd.to_numeric(crsp_data['ret'], errors='coerce')

crsp_data['retadj1']=(1+crsp_data['ret'])*(1+crsp_data['dlret'])-1

crsp_data['retadj2']= crsp_data['ret'].add(1, fill_value=0).mul(crsp_data['dlret'].add(1, fill_value=0)).sub(1)
print (crsp_data)

   ret  dlret  retadj1  retadj2
0  1.0   10.0     21.0     21.0
1  2.0    NaN      NaN      2.0
2  NaN    7.0      NaN      7.0
3  5.0    1.0     11.0     11.0
4  NaN    NaN      NaN      0.0

Details:

print (crsp_data['ret'].add(1, fill_value=0))
0    2.0
1    3.0
2    1.0
3    6.0
4    1.0
Name: ret, dtype: float64

print (crsp_data['dlret'].add(1, fill_value=0).sub(1))
0    10.0
1     0.0
2     7.0
3     1.0
4     0.0
Name: dlret, dtype: float64
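The fill_value=0 behaviour shown above can also be written with fillna(0). The sketch below rebuilds the same sample values (with the non-numeric entry already coerced to NaN) and compares the two formulations. Treating a missing return as 0 is an assumption about the data and may not suit every use case.

```python
import numpy as np
import pandas as pd

# Same sample values as above, after 'C' has been coerced to NaN.
df = pd.DataFrame({"ret": [1, 2, np.nan, 5, np.nan],
                   "dlret": [10, np.nan, 7, 1, np.nan]})

# Formulation from the answer: NaN is treated as 0 before adding 1.
via_add = (df["ret"].add(1, fill_value=0)
           .mul(df["dlret"].add(1, fill_value=0))
           .sub(1))

# Assumed equivalent formulation: fill NaN with 0 first, then use operators.
via_fillna = (1 + df["ret"].fillna(0)) * (1 + df["dlret"].fillna(0)) - 1

print(np.allclose(via_add, via_fillna))
```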
