How can I add a column in Python pandas based on multiple string-contains conditions without using np.where?

Posted 2024-03-28 00:04:58


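I am trying to add a new column by checking a string column against multiple "contains" conditions, using str.contains() together with nested np.where() calls. This does give me the final result I want.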

However, the code is very long. Is there a better way to reimplement this using pandas functions?

df5['new_column'] = np.where(df5['sr_description'].str.contains('gross to net', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross up', case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net to gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-to-net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross-up',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net-to-gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross 2 net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net 2 gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('gross net',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('net gross',case=False).fillna(False),1,
    np.where(df5['sr_description'].str.contains('memo code',case=False).fillna(False),1,0)))))))))))

The expected output: new_column is 1 if any of these strings is contained in 'sr_description', and 0 otherwise.

Perhaps the multiple string conditions could be stored in a list, then read from it and applied in a single function call.

Edit:

Sample data:

sr_description                  new_column
something with gross up.           1
without those words.               0
or with Net to gross               1
if not then we give a '0'          0

1 answer
User
#1 · Posted 2024-03-28 00:04:58

Here is what I came up with.

Code:

import re
import pandas as pd
import numpy as np

# list of the strings we want to check for
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up', 'net-to-gross', 'gross 2 net',
             'net 2 gross', 'gross net', 'net gross', 'memo code']

# From the re.escape() docs: Escape special characters in pattern. 
# This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
check_strs_esc = [re.escape(curr_val) for curr_val in check_strs]

# join all the escaped strings as a single regex
check_strs_re = '|'.join(check_strs_esc)

# np.nan (lowercase) is used because the np.NaN alias was removed in NumPy 2.0
test_col_1 = ['something with gross up.', 'without those words.', np.nan, 'or with Net to gross', 'if not then we give a "0"']
df_1 = pd.DataFrame(data=test_col_1, columns=['sr_description'])

df_1['contains_str'] = df_1['sr_description'].str.contains(check_strs_re, case=False, na=False)

print(df_1)

Result:

              sr_description  contains_str
0   something with gross up.          True
1       without those words.         False
2                        NaN         False
3       or with Net to gross          True
4  if not then we give a "0"         False
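
The question asked for 1/0 rather than True/False. A minimal sketch of that last step, assuming the same pattern list and the question's column names (df5, sr_description, new_column), is to cast the boolean mask with astype(int):

```python
import re
import pandas as pd

# Same literal strings as in the answer above
check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up',
              'net-to-gross', 'gross 2 net', 'net 2 gross', 'gross net', 'net gross', 'memo code']

# Escape each literal and join them into one alternation regex
pattern = '|'.join(re.escape(s) for s in check_strs)

# Small stand-in for the question's df5 (the None row is an assumed missing value)
df5 = pd.DataFrame({'sr_description': ['something with gross up.', 'without those words.', None,
                                       'or with Net to gross', "if not then we give a '0'"]})

# na=False treats missing descriptions as "no match"; astype(int) maps True/False to 1/0
mask = df5['sr_description'].str.contains(pattern, case=False, na=False)
df5['new_column'] = mask.astype(int)

print(df5)
```

This replaces the whole nested np.where chain with one str.contains call plus a cast.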

Note that numpy is not required for the solution to run; I only used it to create a NaN value for testing.
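
If building a regex is not wanted at all, one possible variation (a sketch, not part of the original answer) is to test each literal with regex=False and OR the resulting masks together. The lowercasing step is an assumption used here to keep the match case-insensitive:

```python
from functools import reduce
import operator

import pandas as pd

check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net', 'gross-up',
              'net-to-gross', 'gross 2 net', 'net 2 gross', 'gross net', 'net gross', 'memo code']

# Illustrative data; the None row stands in for a missing description
df_2 = pd.DataFrame({'sr_description': ['something with gross up.', 'without those words.', None,
                                        'or with Net to gross', "if not then we give a '0'"]})

# Lowercase once so every literal comparison is case-insensitive
lowered = df_2['sr_description'].str.lower()

# One boolean mask per literal substring; regex=False means no escaping is needed
masks = [lowered.str.contains(s, regex=False, na=False) for s in check_strs]

# Combine the masks with element-wise OR, then cast to 1/0
df_2['new_column'] = reduce(operator.or_, masks).astype(int)

print(df_2)
```

This scans the column once per string, so the single joined regex is likely the better choice for long lists; the loop mainly avoids any regex escaping concerns.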

If anything is unclear or you have any questions, let me know! :)
