Pandas: filtering on multiple columns when selecting data

Posted 2024-05-16 13:07:27


I can filter the data with pd_data = pd_data[pd_data['db_rating']>0], which selects the records where db_rating > 0.

Now I also want to involve other columns, for example selecting records where db_rating > 0 and imdb_ratings_count > 1000 at the same time.

But pd_data = pd_data[pd_data['db_rating']>0 and pd_data['imdb_ratings_count']>1000] gives me an error:

ValueError                                Traceback (most recent call last)
<ipython-input-120-f83883d4bac8> in <module>()
      3 pd_data['imdb_rating'] = pd_data['imdb_rating'].astype(float)
      4 pd_data['imdb_ratings_count'] = pd_data['imdb_ratings_count'].astype(float)
----> 5 pd_data = pd_data[pd_data['db_rating']>0 and pd_data['imdb_ratings_count']>1000]
      6 pd_data.describe()

D:\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    696         raise ValueError("The truth value of a {0} is ambiguous. "
    697                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698                          .format(self.__class__.__name__))
    699 
    700     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What should I do?


2 Answers

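When working with boolean vectors in pandas, use the bitwise operators instead of and/or/not, and wrap each comparison in parentheses: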

pd_data = pd_data[(pd_data['db_rating']>0) & (pd_data['imdb_ratings_count']>1000)]
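For illustration, here is a minimal sketch of the same filter on a small made-up DataFrame. The sample values are an assumption; only the column names db_rating and imdb_ratings_count come from the question. It also shows | (element-wise OR) and ~ (element-wise NOT), which follow the same pattern as &.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pd_data = pd.DataFrame({
    "db_rating": [7.5, 0.0, 8.1, 6.3],
    "imdb_ratings_count": [1500.0, 2000.0, 800.0, 12000.0],
})

# Each comparison yields a boolean Series; & combines them element-wise (AND).
both = pd_data[(pd_data["db_rating"] > 0) & (pd_data["imdb_ratings_count"] > 1000)]

# | is element-wise OR, ~ is element-wise NOT.
either = pd_data[(pd_data["db_rating"] > 0) | (pd_data["imdb_ratings_count"] > 1000)]
not_rated = pd_data[~(pd_data["db_rating"] > 0)]

print(both)
```

With this sample data, both keeps the rows at index 0 and 3, the only rows that satisfy both conditions.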

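Pandas overloads the bitwise & operator for exactly this purpose, so it acts as an element-wise "and" on boolean Series. This should work: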

pd_data = pd_data[(pd_data['db_rating']>0) & (pd_data['imdb_ratings_count']>1000)]
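As a rough sketch of why the original attempt fails: the keyword and asks Python to reduce its left operand to a single True or False, which a Series with several elements cannot do, so pandas raises the ValueError shown in the question. The & operator is instead dispatched to the Series and evaluated row by row. The parentheses matter because & binds more tightly than > in Python. The sample values and the names rated, popular, and error_message below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pd_data = pd.DataFrame({
    "db_rating": [7.5, 0.0, 8.1],
    "imdb_ratings_count": [1500.0, 2000.0, 800.0],
})

rated = pd_data["db_rating"] > 0                  # boolean Series
popular = pd_data["imdb_ratings_count"] > 1000    # boolean Series

# `and` needs one truth value for `rated`, which a multi-element
# Series refuses to provide, so this raises ValueError.
error_message = None
try:
    mask = rated and popular
except ValueError as exc:
    error_message = str(exc)

# & is overloaded by pandas and combines the two Series row by row.
mask = rated & popular
filtered = pd_data[mask]

print(error_message)
print(filtered)
```

Naming the intermediate masks like this is optional, but it sidesteps the precedence issue and keeps longer filters readable.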

http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
