I have the following DataFrame:
df =
VD_1  VD_2  VD_3  VD_4  VD_5  TYPE  VAL
NaN   XX    VV    DD    NaN   ABC   5
NaN   XX    MM    VV    NaN   ABC   6
XX    MM    NaN   NaN   NaN   ABC   6
TT    XX    MM    NaN   NaN   ABC   5
I want to keep only the rows whose first non-NaN value equals XX and that have at least two subsequent values that are neither NaN nor XX.
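The problem is that return x returns None, None, None... It only works when I use return row, but then the result does not have the same number of columns as df. The code also fails to exclude the columns TYPE and VAL from the analysis.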
def customFilter(x):
    row = x.dropna()
    if (row[0] == 'XX') & (('XX' not in row[1:]) & (len(row[1:]) >= 2)):
        return row
    return np.nan

df = df.apply(customFilter, axis=1).dropna(how='all', axis=0)
Is there any trick to solve the mentioned issues?
Update:
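# Keep only the rows whose first non-NaN value is 'XX'
def calculate_correct_rows(df):
    # Collect the index labels of the rows that meet the criteria
    correct_rows = []
    for index, x in df.iterrows():
        values = x.dropna().tolist()
        # 'in' on a Series tests the index labels, so compare against a plain list
        if values and values[0] == 'XX' and 'XX' not in values[1:] and len(values[1:]) >= 2:
            correct_rows.append(index)
    return correct_rows

# Check only the VD_ columns, then select the matching rows from the full frame
subset2 = df.filter(like='VD_')
correct_rows = calculate_correct_rows(subset2)
final2 = df.loc[correct_rows, :]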
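There may be a more elegant way to do this, but you can simply do the filtering in two steps instead of one. First, build a list of all rows that do not meet the criteria above. Second, call df.drop(rows) to remove the rows collected in step 1. The pandas documentation for drop has examples.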
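A minimal sketch of this two-step approach, assuming the sample data from the question. The helper name fails_criteria and the variables vd and result are made up for illustration; they do not come from the original post. Only the VD_ columns are checked, so TYPE and VAL stay out of the test but remain in the result.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "VD_1": [np.nan, np.nan, "XX", "TT"],
    "VD_2": ["XX", "XX", "MM", "XX"],
    "VD_3": ["VV", "MM", np.nan, "MM"],
    "VD_4": ["DD", "VV", np.nan, np.nan],
    "VD_5": [np.nan, np.nan, np.nan, np.nan],
    "TYPE": ["ABC"] * 4,
    "VAL": [5, 6, 6, 5],
})


def fails_criteria(row):
    """Hypothetical helper: True when a row of VD_ values does NOT qualify."""
    values = row.dropna().tolist()
    # The first non-NaN value must be 'XX'
    if not values or values[0] != "XX":
        return True
    rest = values[1:]
    # At least two following values are needed, and none of them may be 'XX'
    return "XX" in rest or len(rest) < 2


# Step 1: collect the index labels of the rows that fail the criteria
vd = df.filter(like="VD_")
rows = [label for label, row in vd.iterrows() if fails_criteria(row)]

# Step 2: drop those rows from the full frame, keeping every column
result = df.drop(rows)
print(result)
```

Because drop works on index labels, the list is built from the labels that iterrows yields rather than from a separate counter, so the sketch should also work when the index is not a plain 0..n-1 range.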