python/pandas: getting results by searching two columns

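import numpy as np
import pandas as pd

l1 = [1, 2, 3, 4, 5]
l2 = ['UNIVERSITY OF CONN. OF', 'UNIVERSITY OF CONNECTICUT', 'ONTARIO', 'UNIV. OF TORONTO', 'ALASKA DEPT.OF']
l3 = ['US', 'US', 'CA', 'CA', np.nan]  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'some_id': l1, 'org': l2, 'country': l3})
df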

2 answers

User

#1 · edited 2024-04-27 05:18:17

As @Vaishali pointed out, we have to use the bitwise & operator rather than the and keyword.

matches_org = df["org"].str.contains("UNIVERSITY OF CONN", na=False)
matches_country = df["country"] == "US"

matches_org_and_country = df[matches_org & matches_country]

To filter, we always pass a boolean Series to df. When combining two filters, we combine the two boolean Series element-wise.
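To make the element-wise combination concrete, here is a minimal sketch that rebuilds the frame from the question and builds each mask separately before combining them. The names combined and result are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "some_id": [1, 2, 3, 4, 5],
    "org": ["UNIVERSITY OF CONN. OF", "UNIVERSITY OF CONNECTICUT", "ONTARIO",
            "UNIV. OF TORONTO", "ALASKA DEPT.OF"],
    "country": ["US", "US", "CA", "CA", np.nan],
})

# Each filter is a boolean Series aligned with df's index.
matches_org = df["org"].str.contains("UNIVERSITY OF CONN", na=False)
matches_country = df["country"] == "US"  # NaN == "US" evaluates to False

# & combines the masks row by row; a row survives only if both are True.
combined = matches_org & matches_country
result = df[combined]
print(result["some_id"].tolist())  # [1, 2]
```

Keeping the masks in named variables makes it easy to print each one and see which condition excludes a given row.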

If you want to use and and or regularly, take a look at ^{}.

Bitwise AND

>>> pd.Series([True, True, False]) & pd.Series([True, False, True])
0     True
1    False
2    False
dtype: bool

Bitwise OR

>>> pd.Series([True, True, False]) | pd.Series([True, False, True])
0    True
1    True
2    True
dtype: bool

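User

#2 · edited 2024-04-27 05:18:17

When you apply and to the results of DataFrame/Series operations, Python evaluates df1 and df2 by first asking whether df1 as a whole is True. pandas has no notion of an entire DataFrame or Series being True, so it raises an error (a ValueError saying the truth value is ambiguous).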
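The failure is easy to reproduce in isolation. The sketch below uses two small, made-up boolean Series (a and b are illustrative names, not from the question) and catches the exception so the script keeps running.

```python
import pandas as pd

a = pd.Series([True, True, False])
b = pd.Series([True, False, True])

try:
    # `and` makes Python call bool(a); pandas refuses to reduce
    # a whole Series to a single True/False.
    a and b
    raised = False
except ValueError as err:
    raised = True
    message = str(err)
    print(message)
```

The exception comes from converting the Series to a single boolean, so the same thing happens with or, not, and a bare if on a Series.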

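The correct approach is to use the bitwise and operator, &. Here it compares the two masks row by row instead of evaluating the DataFrame/Series as a whole. So your code should be: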

df[['org','country']][(df['org'].str.contains('UNIVERSITY OF CONN', na=False)) & (df['country'] == 'US')]
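As a possible alternative to the chained indexing in the line above, the same rows and columns can be selected in one step with .loc. This is only a sketch under the question's sample data; the names mask, chained and with_loc are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "some_id": [1, 2, 3, 4, 5],
    "org": ["UNIVERSITY OF CONN. OF", "UNIVERSITY OF CONNECTICUT", "ONTARIO",
            "UNIV. OF TORONTO", "ALASKA DEPT.OF"],
    "country": ["US", "US", "CA", "CA", np.nan],
})

# & binds more tightly than ==, so the comparison must be parenthesized.
mask = df["org"].str.contains("UNIVERSITY OF CONN", na=False) & (df["country"] == "US")

# Form used in the answer: select columns first, then filter rows.
chained = df[["org", "country"]][mask]

# Single-step form: rows and columns in one .loc call.
with_loc = df.loc[mask, ["org", "country"]]

print(with_loc.equals(chained))  # True
```

Both forms return the same frame here; .loc simply does the row filter and the column selection in a single indexing operation.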
