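Pandas DataFrame: programmatically splitting a dataframe's rows on a multi-column condition

Context

I am working with a dataframe df that has many columns filled with numerical values: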

df
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
        0.5 |      -6.2 | ... |               79.8
      -26.1 |    6200.0 | ... |              -65.2
      150.0 |      3.14 | ... |              1.008

In addition, I have a list of columns, list_cols:

# arbitrary length; of course len(list_cols) <= len(df.columns),
# and it only contains valid columns of my df
list_cols = ['lorem ipsum', 'dolor sic', ... ]

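I would like to obtain 2 dataframes:

One containing all rows where value < 0 for at least one of the columns in list_cols (corresponding to an OR). Let's call it negative_values_matches

One corresponding to the remainder of the dataframe. Let's call it positive_values_matches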

Expected result example

For list_cols = ['lorem ipsum', 'dolor sic'], I would obtain the following dataframes, where negative_values_matches holds the rows with at least one strictly negative value in list_cols:

negative_values_matches
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
        0.5 |      -6.2 | ... |               79.8
      -26.1 |    6200.0 | ... |              -65.2

positive_values_matches
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
      150.0 |      3.14 | ... |              1.008

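I do not want to write out one criterion per column by hand and chain them together (where criterion_k is a boolean evaluation for column k, for instance (df[col_k]>=0); the parentheses are there because that is the Pandas syntax).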

The idea is to have a programmatic approach. I am mainly looking for an array of booleans, so that I can use boolean indexing (see the Pandas documentation).
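For readers unfamiliar with boolean indexing, here is a minimal sketch of the mechanism on a single column. It reuses the three sample rows from the question; the column name 'other' is a hypothetical stand-in for the hundreds of remaining columns.

```python
import pandas as pd

# Sample rows from the question; 'other' is a hypothetical placeholder column
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})

# A comparison on one column yields a boolean Series, one entry per row
mask = df["lorem ipsum"] < 0

# Boolean indexing keeps the rows where the mask is True; ~ inverts the mask
selected = df[mask]
rest = df[~mask]
```

The open question is then only how to build one such mask from an arbitrary list of columns.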

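As far as I can tell, the existing posts on the subject do not quite cover what I am asking.

I cannot figure out how to chain the booleans computed on my dataframe with the OR operator and obtain the correct row split.

What can I do?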

1 Answer

Answer 1 · Posted 2024-04-24 10:19:55

After several attempts, I managed to reach my goal.

Here is the code:

import pandas
import numpy
# assume dataframe exists
df = ...
# initialize a 1-D array of False, one entry per row of df
resulting_bools = numpy.zeros(len(df.index), dtype=bool)

for col in list_cols:
    # boolean array for this column: True where the [row, column] value is negative
    # (same condition for each column; different conditions would have been more difficult for me)
    criterion = (df[col] < 0).to_numpy()

    # accumulate the boolean evaluation across columns with OR
    resulting_bools |= criterion

# use the array of booleans to build the required dataframes
# .copy() avoids possible later warnings from Pandas, depending on what you do with the data frames
negative_values_matches = df[resulting_bools].copy()
positive_values_matches = df[~resulting_bools].copy()

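This way, I successfully obtain 2 dataframes:

One with all rows where at least 1 column of list_cols has a value < 0
One with all the other rows (value >= 0 for every column in list_cols)

(Initializing the array to False goes with the choice of boolean operator: False is the neutral element for OR, whereas an AND accumulation would have to start from True.)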
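The same split can probably be written without an explicit loop. The following is a sketch, not the method used in the answer above: it compares all selected columns at once and reduces across columns with DataFrame.any(axis=1). The sample data comes from the question, and the column name 'other' is a hypothetical placeholder.

```python
import pandas as pd

# Sample rows from the question; 'other' is a hypothetical placeholder column
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})
list_cols = ["lorem ipsum", "dolor sic"]

# Element-wise comparison on the selected columns, then OR across each row
mask = (df[list_cols] < 0).any(axis=1)

negative_values_matches = df[mask].copy()
positive_values_matches = df[~mask].copy()
```

For an AND across columns, .all(axis=1) would take the place of .any(axis=1), which mirrors starting the accumulation from True instead of False.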

Note: this approach could probably be combined with multiple conditions on dataframes. To be confirmed.
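As a sketch of how different conditions per column might be combined, one option is to build one boolean mask per column and fold them together with the | operator. The conditions dictionary and its thresholds are purely hypothetical, as is the 'other' column; only the sample values come from the question.

```python
import operator
from functools import reduce

import pandas as pd

# Sample rows from the question; 'other' is a hypothetical placeholder column
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})

# Hypothetical per-column conditions: each maps a column (Series) to a boolean Series
conditions = {
    "lorem ipsum": lambda s: s < 0,
    "dolor sic": lambda s: s > 1000,
}

# One mask per column, then OR them all together
masks = [cond(df[col]) for col, cond in conditions.items()]
mask = reduce(operator.or_, masks)

matches = df[mask].copy()
others = df[~mask].copy()
```

Swapping operator.or_ for operator.and_ would require every condition to hold instead of at least one.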
