Create a new column of 0s and 1s based on a regex result

Posted 2024-05-14 01:23:11

My DataFrame has the following values:

data_df

0         student
1         sample text
2         student
3         no students
4         sample texting
5         random sample

I used a regex to extract the rows whose text is the word 'student', with the following result:

regexdf
0         student
2         student

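My goal is to create a new column of 0s and 1s in the main DataFrame, i.e. row 0 should be 1 and row 5 should be 0 (because 'regexdf' has 'student' at rows 0 and 2). How can I match the indices of the two frames and create the column?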

2 Answers

You can also do it this way:

df['bool'] = df[1].eq('student').astype(int)

Or:

df['bool'] = df[1].str.match(r'(student)\b').astype(int)

                1  bool
0         student     1
1     sample text     0
2         student     1
3     no students     0
4  sample texting     0
5   random sample     0
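
For reference, a self-contained sketch of the two variants above. How `df` is built is an assumption, since the question does not show it; the column label `1` is chosen to mirror the printed output. The `bool_*` column names and the `str.contains` variant are added for illustration and are not part of the original answer.

```python
import pandas as pd

# Assumed reconstruction of the question's data; column label 1 mirrors the output above.
df = pd.DataFrame({1: ["student", "sample text", "student",
                       "no students", "sample texting", "random sample"]})

# Exact equality: 1 only where the whole cell is 'student'.
df["bool_eq"] = df[1].eq("student").astype(int)

# str.match anchors at the start of the string; \b rejects 'students'.
df["bool_match"] = df[1].str.match(r"student\b").astype(int)

# str.contains searches anywhere in the cell (illustrative addition).
df["bool_contains"] = df[1].str.contains(r"\bstudent\b").astype(int)

print(df)
```

On this data all three columns agree. They would differ on a cell such as 'a student here', where only `str.contains` reports a match.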

If you want a new DataFrame instead:

ndf = df[df[1].eq('student')].copy()

Using a regular expression:

data_df = data_df.assign(regexdf = data_df[1].str.extract(r'(student)\b', expand=False))
data_df['student'] = data_df['regexdf'].notnull().mul(1)
print(data_df)

Output:

                 1  regexdf  student
0         student  student        1
1     sample text      NaN        0
2         student  student        1
3     no students      NaN        0
4  sample texting      NaN        0
5   random sample      NaN        0
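
If the intermediate 'regexdf' column is not needed, the extraction result can be turned into the flag directly. A minimal sketch, assuming the same data as in the question (the frame construction and the name `matched` are assumptions for illustration):

```python
import pandas as pd

# Assumed reconstruction of the question's data.
data_df = pd.DataFrame({1: ["student", "sample text", "student",
                            "no students", "sample texting", "random sample"]})

# extract returns the captured text, or NaN where the pattern is not found.
matched = data_df[1].str.extract(r"(student)\b", expand=False)

# notna() gives True/False; astype(int) turns that into 1/0.
data_df["student"] = matched.notna().astype(int)

print(data_df)
```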

Edit:

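# Join on the index; the overlapping column 1 from regexdf is renamed to '1regex'.
df_out = data_df.join(regexdf, rsuffix='regex')
# 1 where the row also exists in regexdf, 0 otherwise.
df_out['pattern'] = df_out['1regex'].notnull().mul(1)
# Running count of matches so far.
df_out['Count_Pattern'] = df_out['pattern'].cumsum()
print(df_out)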

Output:

                1   1regex  pattern  Count_Pattern
0         student  student        1              1
1     sample text      NaN        0              1
2         student  student        1              2
3     no students      NaN        0              2
4  sample texting      NaN        0              2
5   random sample      NaN        0              2
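
The index matching asked about in the question can also be sketched without a join, by testing which index labels of the main frame appear in 'regexdf'. This is an alternative to the join above, not the answerer's method. The way `regexdf` is produced here is an assumption, since the question does not show the original regex.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
data_df = pd.DataFrame({1: ["student", "sample text", "student",
                            "no students", "sample texting", "random sample"]})

# Assumed filter; keeps rows 0 and 2 and preserves their original index labels.
regexdf = data_df[data_df[1].str.match(r"student\b")]

# Index.isin returns a boolean array: True where the label also exists in regexdf.
data_df["pattern"] = data_df.index.isin(regexdf.index).astype(int)

# Running count of matches, as in the edit above.
data_df["Count_Pattern"] = data_df["pattern"].cumsum()

print(data_df)
```

This relies on 'regexdf' keeping the index of the frame it was filtered from. If the index was reset after filtering, the labels no longer line up.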