Python Pandas: compare strings in DataFrame rows (excluding empty strings)

from itertools import combinations
import numpy as np

Col = list(ENTITY.columns.values)
for i in combinations(Col, 2):
    df[i[0] + ' to ' + i[1] + ' dedication'] = df.apply(lambda row: row[i[0]] == row[i[1]], axis=1)
    df[i[0] + ' to ' + i[1] + ' dedication'] = np.where(df[i[0] + ' to ' + i[1] + ' dedication'], 'Y', 'N')
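To see why the row-wise version reports a match for two missing cells, here is a minimal reproduction. The two-column frame and the column names A and B are hypothetical stand-ins for the asker's ENTITY/df. Inside apply(axis=1) each cell is a plain Python object, and in Python None == None evaluates to True, so two empty cells get a 'Y'.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's frame
df = pd.DataFrame({'A': ['x', None, None],
                   'B': ['x', 'y', None]})

for a, b in combinations(list(df.columns), 2):
    name = a + ' to ' + b + ' dedication'
    # Row-wise comparison of Python objects: None == None is True
    df[name] = df.apply(lambda row: row[a] == row[b], axis=1)
    df[name] = np.where(df[name], 'Y', 'N')

print(df)
```

The last row, where both cells are None, ends up marked 'Y', which is the behaviour the question wants to avoid.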

3 answers

User

Answer 1 · edited 2024-05-14 03:26:01

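Null values never compare equal (NaN != NaN), so if the missing values are stored as real nulls, a simple comparison is enough. If they are stored as 'None' (a string), two 'None' strings do compare equal, so they have to be converted to real nulls first.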

import pandas as pd

df = pd.DataFrame(data={'col1': ['a', None, None, 'a', 'a'], 'col2': ['a', 'a', None, None, 'b']})

   col1  col2
0     a     a
1  None     a
2  None  None
3     a  None
4     a     b


   col1  col2  col1_col2
0     a     a       True
1  None     a      False
2  None  None      False
3     a  None      False
4     a     b      False
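The code for this step did not survive in the source. A minimal sketch that reproduces the col1_col2 column above, assuming a plain element-wise column comparison: in an object column pandas treats None as missing, so a missing value never compares equal, not even to another missing value.

```python
import pandas as pd

df = pd.DataFrame(data={'col1': ['a', None, None, 'a', 'a'],
                        'col2': ['a', 'a', None, None, 'b']})

# Vectorized comparison: any position where either side is missing yields False
df['col1_col2'] = df['col1'] == df['col2']
print(df)
```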

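Two small tips:

You can simplify the loop with iteritems (in pandas 2.0 and later use items(), since iteritems was removed), so there is no need to index into tuples.
I try to keep computed results in a DataFrame separate from the initial data and intermediate results. That makes it easier to troubleshoot problems and start over. I only reuse the original DataFrame when memory is an issue.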
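The two tips can be combined roughly as follows. This is a sketch, not the answerer's code: the third column col3 and the result frame name result are hypothetical. It iterates over (name, Series) pairs with DataFrame.items() and writes the comparisons into a separate frame, leaving the input untouched.

```python
from itertools import combinations

import pandas as pd

# Sample data; col3 is a hypothetical extra column to get several pairs
df = pd.DataFrame(data={'col1': ['a', None, None, 'a', 'a'],
                        'col2': ['a', 'a', None, None, 'b'],
                        'col3': ['a', None, 'c', 'a', 'b']})

# Keep computed results apart from the input data
result = pd.DataFrame(index=df.index)

# items() yields (column name, Series) pairs, so no tuple indexing is needed
for (name1, s1), (name2, s2) in combinations(df.items(), 2):
    result[name1 + '_' + name2] = s1 == s2

print(result)
```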

User

Answer 2 · edited 2024-05-14 03:26:01

The basic idea here is to replace the empty ('None') strings with numpy.nan, because

>>> numpy.nan == numpy.nan
False

import numpy as np
ENTITY.replace(to_replace="None",value=np.nan,inplace=True)
# your code below
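A minimal sketch of this approach, assuming the missing values arrive as the literal string 'None' (the sample frame is hypothetical). Before the replacement, two 'None' strings compare equal; afterwards they are NaN, which never equals anything.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where missing values are stored as the string 'None'
ENTITY = pd.DataFrame({'A': ['x', 'None', 'None'],
                       'B': ['x', 'y', 'None']})

before = (ENTITY['A'] == ENTITY['B']).tolist()  # 'None' == 'None' counts as a match

ENTITY.replace(to_replace='None', value=np.nan, inplace=True)
after = (ENTITY['A'] == ENTITY['B']).tolist()   # NaN never matches

print(before)
print(after)
```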

User

Answer 3 · edited 2024-05-14 03:26:01

You need pd.notnull (or its alias pd.notna) to handle None (or NaN) in the comparison:

df.apply(lambda row: (row[i[0]] == row[i[1]]) and
                      pd.notnull(row[i[0]]) and
                      pd.notnull(row[i[1]]), axis=1)
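Wrapped into a runnable form, the row-wise check looks like this. The frame, the SCANNER column names and the result column new are assumed for illustration. It checks for missing values first so the equality test only runs on present values. It is correct but calls a Python lambda once per row, which is why the answer goes on to recommend comparing whole columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'SCANNER A': ['AAA1', None, None, 'AAA1'],
                   'SCANNER B': ['AAA1', 'AAA2', None, 'AAA2']})

a, b = 'SCANNER A', 'SCANNER B'

# A row matches only when both cells are present and hold the same value
matches = df.apply(lambda row: pd.notna(row[a]) and pd.notna(row[b])
                   and row[a] == row[b], axis=1)
df['new'] = np.where(matches, 'Y', 'N')
print(df)
```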

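But a better approach is to compare whole columns. That works correctly because np.nan != np.nan (and pandas treats None as missing in the comparison):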

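The code for this step is missing from the source. Applied to the question's loop, a column-wise version could look like the sketch below; it is an assumption, not the answerer's exact code, and the three-column sample frame is hypothetical. The ' to ' / ' dedication' naming follows the question.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': ['x', None, None, 'x'],
                   'B': ['x', 'y', None, None],
                   'C': ['x', 'y', None, 'x']})

cols = list(df.columns)  # fix the column list before new columns are added
for c1, c2 in combinations(cols, 2):
    # Missing values (None/NaN) never compare equal in a column comparison
    df[c1 + ' to ' + c2 + ' dedication'] = np.where(df[c1] == df[c2], 'Y', 'N')

print(df)
```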

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Key': [1, 2, 3, 4],
                   'SCANNER A': ['AAA1', None, None, 'AAA1'],
                   'SCANNER B': ['AAA1', 'AAA2', None, 'AAA2']})

df['new'] = np.where(df['SCANNER A'] == df['SCANNER B'], 'Y', 'N')
print(df)
   Key SCANNER A SCANNER B new
0    1      AAA1      AAA1   Y
1    2      None      AAA2   N
2    3      None      None   N
3    4      AAA1      AAA2   N
