Data cleaning: removing 0 values from a dataset with headers and an index

2 answers

User

Answer 1 · edited 2024-05-14 12:49:12

You can keep working in pandas for this:

First, we count the number of columns and the number of zeros in each row:

n_columns = len(df.columns)  # or df.shape[1]
zeroes = (df == "0").sum(axis=1)  # matches the string "0"; compare with 0 if the columns are numeric

Then we keep only the rows in which fewer than 20% of the values are zero.

proportion_zeroes = zeroes / n_columns
max_20 = proportion_zeroes < 0.20
df[max_20]  # This will contain only rows that have less than 20 % zeroes

As a one-liner:

df[((df == "0").sum(axis=1) / len(df.columns)) < 0.2]
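For anyone who wants to try the steps above on real values, here is a minimal sketch. The sample data, the row labels, and the helper name `drop_zero_heavy_rows` are made up for illustration, and it assumes the zeros are stored as the string "0" as in the snippets above. It uses `df.eq(zero).mean(axis=1)`, which gives the fraction of zeros per row directly because the mean of a boolean row is the share of True values.

```python
import pandas as pd


def drop_zero_heavy_rows(df, zero="0", threshold=0.2):
    """Keep rows whose fraction of cells equal to `zero` is below `threshold`."""
    # Boolean mask of zero cells; the row-wise mean is the fraction of zeros
    zero_fraction = df.eq(zero).mean(axis=1)
    return df[zero_fraction < threshold]


# Hypothetical sample: six columns of string values
sample = pd.DataFrame(
    {
        "c1": ["0", "1", "1"],
        "c2": ["0", "2", "0"],
        "c3": ["3", "3", "3"],
        "c4": ["4", "4", "4"],
        "c5": ["5", "5", "5"],
        "c6": ["6", "6", "6"],
    },
    index=["r1", "r2", "r3"],
)

# r1 has 2/6 zeros and is dropped; r2 (0/6) and r3 (1/6) are kept
filtered = drop_zero_heavy_rows(sample)
print(filtered)
```

If the columns are numeric, pass `zero=0` instead.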

User

Answer 2 · edited 2024-05-14 12:49:12

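It would be great if you could post what the DataFrame looks like in pandas rather than a picture of the Excel file. In the meantime, here is a dummy df: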

import pandas as pd

df = pd.DataFrame({'index1': ['a', 'b', 'c'], 'index2': ['b', 'g', 'f'], 'index3': ['w', 'q', 'z'],
                   'Col1': [0, 1, 0], 'Col2': [1, 1, 0], 'Col3': [1, 1, 1], 'Col4': [2, 2, 0]})

Step 1: you can specify the index with the .set_index() method, as shown below:

df.set_index(['index1','index2','index3'],inplace=True)
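Setting the index first matters for the percentage, because df.shape[1] counts every column that is not part of the index. A small sketch using the same dummy data (the variable names `n_before`, `indexed`, and `n_after` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'index1': ['a', 'b', 'c'], 'index2': ['b', 'g', 'f'], 'index3': ['w', 'q', 'z'],
                   'Col1': [0, 1, 0], 'Col2': [1, 1, 0], 'Col3': [1, 1, 1], 'Col4': [2, 2, 0]})

# Before set_index, the three label columns are counted as data columns
n_before = df.shape[1]

# Non-inplace variant: returns a new DataFrame with a three-level MultiIndex
indexed = df.set_index(['index1', 'index2', 'index3'])
n_after = indexed.shape[1]

print(n_before, n_after)  # 7 4
```

With the label columns left in place, the denominator would be 7 instead of 4 and every row's zero fraction would come out too low.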

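When filtering the DataFrame, you can use the value returned by df_bool.sum(axis=1), where df_bool is the boolean mask df == 0, instead of filtering by hand, as shown below: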

df.loc[(df==0).sum(axis=1) / (df.shape[1])>0.6]
index1  index2  index3  Col1    Col2    Col3    Col4
c       f       z       0       0       1       0

You can use the same expression to drop those rows. Assuming a 20% threshold, you would use:

df = df.loc[(df==0).sum(axis=1) / (df.shape[1])<0.2]
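Putting the pieces together on the dummy df, here is a runnable sketch of the 20% filter. It swaps `.sum(axis=1) / df.shape[1]` for the equivalent `.mean(axis=1)`; the names `zero_fraction` and `kept` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'index1': ['a', 'b', 'c'], 'index2': ['b', 'g', 'f'], 'index3': ['w', 'q', 'z'],
                   'Col1': [0, 1, 0], 'Col2': [1, 1, 0], 'Col3': [1, 1, 1], 'Col4': [2, 2, 0]})
df = df.set_index(['index1', 'index2', 'index3'])

# Fraction of zeros per row: 0.25 for a, 0.0 for b, 0.75 for c
zero_fraction = (df == 0).mean(axis=1)

# Keep rows with strictly less than 20% zeros; only row b survives
kept = df.loc[zero_fraction < 0.2]
print(kept)
```

Note that row a is dropped too: one zero out of four columns is 25%, which is above the threshold.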

As for the header issue, it is hard to answer without seeing what the file or the DataFrame looks like.
