DataFrame column based on 4 conditions, nested np.where

Posted 2024-05-29 03:08:17


The DataFrame I am working with has 4 possible combinations across 2 columns, and several hundred groups.

| Group |   Before   |    After   |
|:-----:|:----------:|:----------:|
|   G1  |  Injection |  Injection |
|   G1  |  Injection | Production |
|   G1  | Production |  Injection |
|   G1  | Production | Production |

There are 3 precomputed columns that need to be pulled in according to the Before/After combinations shown below.

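|   Before   |    After   |   Output   |
|:----------:|:----------:|:----------:|
|  Injection |  Injection |     DTI    |
|  Injection | Production | DTWF + DTP |
| Production |  Injection | DTWF + DTI |
| Production | Production |     DTP    |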

I tried multiple nested np.where calls:

np.where(df['Before'] == 'Injection' & df['After'] == 'Injection', df['DTI'],
np.where(....))

which results in:

ValueError: either both or neither of x and y should be given

and nesting multiple np.logical_and calls:

np.where(np.logical_and(df['Before'] == 'Injection' & df['After'] == 'Injection'), df['DTP'])

which results in:

the truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have reached the limit of what I can do and need some ideas!


2 Answers

Before["Injection"] does not do what you think it does. In the code you showed, it is not even defined.

What you probably want is:

import pandas as pd

# df definition, skipping Group because it is not needed here
df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"],
                        "After": ["Injection", "Production", "Injection", "Production"]})

df["Output"] = "DTI"  # Use one of the cases as default
# Parenthesise each comparison (& binds tighter than ==) and assign through .loc
df.loc[(df["Before"] == "Injection") & (df["After"] == "Production"), "Output"] = "DTWF + DTP"
df.loc[(df["Before"] == "Production") & (df["After"] == "Injection"), "Output"] = "DTWF + DTI"
df.loc[(df["Before"] == "Production") & (df["After"] == "Production"), "Output"] = "DTP"
print(df)
#        Before       After      Output
# 0   Injection   Injection         DTI
# 1   Injection  Production  DTWF + DTP
# 2  Production   Injection  DTWF + DTI
# 3  Production  Production         DTP
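
For reference, the nested np.where attempt from the question can also be made to work. Two things appear to be going wrong there: & binds more tightly than ==, so each comparison needs its own parentheses, and every np.where call needs both a "true" and a "false" argument. Below is a minimal sketch; the numeric values in DTI, DTP and DTWF are made up for illustration, and the mask names (inj_inj and so on) are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the numbers in DTI, DTP and DTWF are invented.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTP": [10.0, 20.0, 30.0, 40.0],
    "DTWF": [100.0, 200.0, 300.0, 400.0],
})

# Parenthesise every comparison before combining with &.
inj_inj = (df["Before"] == "Injection") & (df["After"] == "Injection")
inj_prod = (df["Before"] == "Injection") & (df["After"] == "Production")
prod_inj = (df["Before"] == "Production") & (df["After"] == "Injection")

# Each np.where takes (condition, value_if_true, value_if_false);
# the innermost "false" branch covers Production -> Production.
df["Output"] = np.where(
    inj_inj, df["DTI"],
    np.where(
        inj_prod, df["DTWF"] + df["DTP"],
        np.where(prod_inj, df["DTWF"] + df["DTI"], df["DTP"]),
    ),
)
print(df[["Before", "After", "Output"]])
```

Note that the last branch acts as a catch-all, so any row that matches none of the three masks silently receives DTP.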

If you have many such combinations, using apply as suggested in the other answer may be more appropriate.

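If you have many rows, save the boolean indexes (for example df["Before"] == "Production") in variables such as before_prod and after_prod, and reuse them directly instead of recomputing each comparison.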

If you also only have these two states, you get the second state for free by using the unary negation operator ~:

df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
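
Putting the two ideas together, a minimal sketch might look like the following. The names before_prod and after_prod come from the line above, but their definitions here are an assumption, and the ~ shortcut is only valid if Injection and Production really are the only two states.

```python
import pandas as pd

df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
})

# Compute each boolean mask once and reuse it (assumed definitions).
before_prod = df["Before"] == "Production"
after_prod = df["After"] == "Production"

df["Output"] = "DTI"  # default: Injection -> Injection
# With only two states, ~before_prod means "Before is Injection".
df.loc[~before_prod & after_prod, "Output"] = "DTWF + DTP"
df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
df.loc[before_prod & after_prod, "Output"] = "DTP"
print(df)
```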

One way is to use the apply function:

Assuming your DataFrame is in the variable df, you can do the following:

import pandas as pd

df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"],
                        "After": ["Injection", "Production", "Injection", "Production"]})
def get_output(x):
    if x['Before'] == 'Injection' and x['After'] == 'Injection':
        return 'DTI'
    elif x['Before'] == 'Injection' and x['After'] == 'Production':
        return 'DTWF + DTP'
    elif x['Before'] == 'Production' and x['After'] == 'Injection':
        return 'DTWF + DTI'
    elif x['Before'] == 'Production' and x['After'] == 'Production':
        return 'DTP'

df['Output'] = df.apply(get_output, axis=1)
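
Since apply with axis=1 calls the Python function once per row, it may become slow on large frames. As a possible vectorized alternative (not part of the original answer), np.select takes a list of conditions and a matching list of choices. The sketch below assumes the DataFrame also holds numeric DTI, DTP and DTWF columns; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the numbers in DTI, DTP and DTWF are invented.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTP": [10.0, 20.0, 30.0, 40.0],
    "DTWF": [100.0, 200.0, 300.0, 400.0],
})

# One boolean mask per Before/After combination.
conditions = [
    (df["Before"] == "Injection") & (df["After"] == "Injection"),
    (df["Before"] == "Injection") & (df["After"] == "Production"),
    (df["Before"] == "Production") & (df["After"] == "Injection"),
    (df["Before"] == "Production") & (df["After"] == "Production"),
]
# The value to use where the condition at the same position is True.
choices = [
    df["DTI"],
    df["DTWF"] + df["DTP"],
    df["DTWF"] + df["DTI"],
    df["DTP"],
]

# Rows matching none of the conditions get NaN rather than a wrong value.
df["Output"] = np.select(conditions, choices, default=np.nan)
print(df[["Before", "After", "Output"]])
```

Unlike a nested np.where, every case is listed explicitly here, so an unexpected state shows up as NaN instead of falling into the last branch.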
