Python: delete rows where most columns are NaN

dfPA:

Index  H1                              H2     H3
1      *highlighted cols are PII
2      sam                             red    5
3      pam                             blue   3
4      rod                             green  11
5      * this is the end of the data

#get count of cols in df
input:  cntcols = dfPA.shape[1]
output: 3

#get count of cols with nan in df
input:  a = dfPA.shape[1] - dfPA.count(axis=1)
output: 0 2 1 3 2 3 4 3 5 2
(where a is a series)

#convert a from series to df
dfa = a.to_frame()

#delete rows where no. of nan's are greater than 'n'
n = 1
for r, row in dfa.iterrows():
    if (cntcols - dfa.iloc[r][0]) > n:
        i = row.name
        dfPA = dfPA.drop(index=i)
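The loop above can be replaced by a boolean mask that counts the NaN cells in each row. This is a minimal sketch of an alternative technique (isna().sum with a mask), not the asker's code; the dfPA values are a hypothetical reconstruction of the table above, with H2 and H3 assumed to be NaN on the two annotation rows.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of dfPA: H2/H3 are assumed NaN on the
# annotation rows (index 1 and 5).
dfPA = pd.DataFrame(
    {
        "H1": ["*highlighted cols are PII", "sam", "pam", "rod",
               "* this is the end of the data"],
        "H2": [np.nan, "red", "blue", "green", np.nan],
        "H3": [np.nan, 5, 3, 11, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

n = 1  # maximum number of NaN cells a row may have and still be kept

# Count NaN cells per row, then keep only the rows at or below the limit.
nan_per_row = dfPA.isna().sum(axis=1)
result = dfPA[nan_per_row <= n]
print(result)
```

Because the mask is built in one vectorized step, there is no need to iterate with iterrows or to mix label-based and position-based indexing.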

1 answer

User

Answer #1 · posted 2024-06-09 06:18:57

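You should use the pandas.DataFrame.dropna method. Its thresh parameter sets the minimum number of non-NaN values a row/column must contain in order to be kept; rows/columns with fewer non-NaN values are dropped.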

Consider the following DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[1,np.nan,1,np.nan], [1,1,1,1], [1,np.nan,1,1], [np.nan,1,1,1]], columns=list('ABCD'))

     A    B  C    D
0  1.0  NaN  1  NaN
1  1.0  1.0  1  1.0
2  1.0  NaN  1  1.0
3  NaN  1.0  1  1.0

Columns containing any NaN can be dropped with:

>>> df.dropna(axis=1)

   C
0  1
1  1
2  1
3  1

The thresh parameter defines the minimum number of non-NaN values a column needs in order to be kept:

>>> df.dropna(thresh=3, axis=1)

     A  C    D
0  1.0  1  NaN
1  1.0  1  1.0
2  1.0  1  1.0
3  NaN  1  1.0
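To see why B is the only column removed, count the non-NaN values per column with notna().sum(). This minimal sketch, assuming the same df as above, checks that a hand-built column mask gives the same result as dropna(thresh=3, axis=1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 1, np.nan], [1, 1, 1, 1],
                   [1, np.nan, 1, 1], [np.nan, 1, 1, 1]],
                  columns=list("ABCD"))

# Non-NaN values per column: the quantity thresh is compared against for axis=1.
non_nan = df.notna().sum()
print(non_nan)

# Keep columns with at least 3 non-NaN values by hand; should match dropna.
manual = df.loc[:, non_nan >= 3]
print(manual.equals(df.dropna(thresh=3, axis=1)))
```

Column B has only 2 non-NaN values, so it is the only column below the threshold of 3.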

If you would rather reason in terms of the number of NaNs:

# example for a minimum of 2 NaN to drop the column
# (each column has len(df.index) cells, i.e. one per row)
>>> df.dropna(thresh=len(df.index)-(2-1), axis=1)
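The conversion from "at least m NaNs" to thresh depends on the axis: a row has df.shape[1] cells, while a column has df.shape[0] cells. The helper below, drop_by_nan_count, is a hypothetical name used only for illustration; it is a minimal sketch built on dropna and assumes min_nan is at least 1.

```python
import numpy as np
import pandas as pd


def drop_by_nan_count(df: pd.DataFrame, min_nan: int, axis: int = 0) -> pd.DataFrame:
    """Hypothetical helper: drop rows (axis=0) or columns (axis=1) that
    contain at least min_nan NaN values. Assumes min_nan >= 1."""
    # thresh counts non-NaN values, so convert using the number of cells:
    # each row has df.shape[1] cells, each column has df.shape[0] cells.
    cells = df.shape[1] if axis == 0 else df.shape[0]
    return df.dropna(thresh=cells - (min_nan - 1), axis=axis)


df = pd.DataFrame([[1, np.nan, 1, np.nan], [1, 1, 1, 1],
                   [1, np.nan, 1, 1], [np.nan, 1, 1, 1]],
                  columns=list("ABCD"))

# A 3-row slice makes the row count (3) differ from the column count (4).
sub = df.iloc[:3]
print(drop_by_nan_count(sub, 2, axis=1))  # columns with >= 2 NaN removed
print(drop_by_nan_count(df, 2))           # rows with >= 2 NaN removed
```

On the square 4x4 df, len(df.columns) and len(df.index) happen to be equal, so either gives the same thresh. On the 3-row slice, using the column count would set thresh to 3 and also drop column D, which has only one NaN.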

To filter rows instead of columns, omit the axis parameter or pass axis=0:

>>> df.dropna(thresh=3)
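Applied to the question, "drop rows with more than n NaNs" means keeping rows that have at least cntcols - n non-NaN values. A minimal sketch, again on a hypothetical reconstruction of dfPA (NaN assumed in H2/H3 on rows 1 and 5):

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of dfPA from the question's table.
dfPA = pd.DataFrame(
    {
        "H1": ["*highlighted cols are PII", "sam", "pam", "rod",
               "* this is the end of the data"],
        "H2": [np.nan, "red", "blue", "green", np.nan],
        "H3": [np.nan, 5, 3, 11, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

n = 1                    # maximum NaNs allowed per row, as in the question
cntcols = dfPA.shape[1]  # number of cells in each row

# Keep rows with at least cntcols - n non-NaN values, i.e. at most n NaNs.
kept = dfPA.dropna(thresh=cntcols - n)
print(kept)
```

This replaces the to_frame/iterrows/drop loop with a single call.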
