How to drop rows of a Pandas DataFrame whose value in a certain column is NaN

>>> df
                 STK_ID  EPS  cash
STK_ID RPT_Date
601166 20111231  601166  NaN   NaN
600036 20111231  600036  NaN    12
600016 20111231  600016  4.3   NaN
601009 20111231  601009  NaN   NaN
601939 20111231  601939  2.5   NaN
000001 20111231  000001  NaN   NaN

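3 Answers

User

#1 · Edited 2024-04-27 15:40:46

I know this has already been answered, but for the sake of a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), and in case anyone else happens upon this: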

import pandas as pd
df = df[pd.notnull(df['EPS'])]
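
To try this on something close to the question's data, here is a minimal sketch. The frame is built by hand from the values shown in the question; the flat integer index (instead of the original STK_ID/RPT_Date index) is an assumption made for brevity.

```python
import numpy as np
import pandas as pd

# Approximation of the question's frame; the flat index is an assumption.
df = pd.DataFrame({
    "STK_ID": ["601166", "600036", "600016", "601009", "601939", "000001"],
    "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
    "cash": [np.nan, 12, np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask: True where EPS holds a value, False where it is NaN.
mask = pd.notnull(df["EPS"])

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Only the rows for 600016 and 601939 should remain, since those are the only ones with an EPS value. NaN in the cash column does not affect the result.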

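User

#2 · Edited 2024-04-27 15:40:46

This question is already resolved, but...

...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially better performance than doing it by hand, these functions also come with a variety of options that may be useful.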

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()     #drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN
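
Applied to the question's column, the same call would take subset=['EPS']. A minimal sketch, using a small hand-built frame as a stand-in for the question's data (the flat index is an assumption):

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: values copied from the question, flat index assumed.
df = pd.DataFrame({
    "STK_ID": ["601166", "600036", "600016", "601939"],
    "EPS": [np.nan, np.nan, 4.3, 2.5],
    "cash": [np.nan, 12.0, np.nan, np.nan],
})

# Only a NaN in EPS triggers a drop; NaN values in cash are ignored.
result = df.dropna(subset=["EPS"])
print(result)
```

Note that dropna returns a new frame by default, so assign the result back (df = df.dropna(subset=['EPS'])) if the original should be replaced.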

There are other options as well (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.

Pretty handy!
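
As a rough illustration of the column-wise variant mentioned above, here is a short sketch. The frame and its column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" is entirely NaN, column "a" has one NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# axis=1 switches dropna from dropping rows to dropping columns.
no_nan_cols = df.dropna(axis=1)                 # keep columns with no NaN at all
not_empty_cols = df.dropna(axis=1, how="all")   # drop only columns that are all NaN

print(list(no_nan_cols.columns))
print(list(not_empty_cols.columns))
```

The how and thresh options work the same way along either axis, so the row-wise examples carry over directly.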

User

#3 · Edited 2024-04-27 15:40:46

Don't drop. Just take the rows where EPS is finite:

import numpy as np

df = df[np.isfinite(df['EPS'])]
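
One difference worth knowing: np.isfinite is False for infinities as well as for NaN, so this filter can remove more rows than pd.notnull would. It also expects a numeric column. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up EPS values containing one NaN and one infinity.
df = pd.DataFrame({"EPS": [4.3, np.nan, np.inf, 2.5]})

finite_rows = df[np.isfinite(df["EPS"])]    # drops both NaN and inf
notnull_rows = df[pd.notnull(df["EPS"])]    # drops NaN only, keeps inf

print(len(finite_rows), len(notnull_rows))
```

If infinities should survive the filter, pd.notnull or dropna(subset=...) is the closer match to what the question asks.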
