Python Pandas Regex: search a column for strings with wildcards and return the matches

# DataFrame column (or Series/list) of keys to search for:
dfKeys = pd.DataFrame()
dfKeys['SearchFor'] = ['this', 'Something', 'Second', 'Keyword1.*Keyword2', 'Stuff', 'One']

# column next to the SearchFor column:
dfKeys['AdjacentCol'] = ['this other string', 'SomeString Else', 'Second String Player',
                         'Keyword1 Keyword2', 'More String Stuff', 'One More String Example']

# DataFrame column to search in:
df1['Description'] = ['Something Here', 'Second Item 7', 'Something There',
                      'strng KEYWORD1 moreJARGON 06/0 010 KEYWORD2 andMORE b4END',
                      'Second Item 7', 'Even More Stuff']

# I've tried:
df1['Matched'] = df1['Description'].str.extract('(%s)' % '|'.join(dfKeys['SearchFor']), flags=re.IGNORECASE, expand=False)

1 Answer

User

#1 · Posted on 2024-06-16 09:27:04

Solution

You are close to the solution: you only need to change * to .*. Quoting the docs:

. (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
* Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
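The two quoted rules can be checked quickly with the standard `re` module. This is only a small sketch; the sample strings are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical two-line sample text, invented for this illustration.
text = "Keyword1 first line\nsecond line Keyword2"

# By default '.' stops at a newline, so the pattern cannot span both lines.
default_match = re.search(r"Keyword1.*Keyword2", text)

# With DOTALL, '.' also matches the newline, so the whole text matches.
dotall_match = re.search(r"Keyword1.*Keyword2", text, flags=re.DOTALL)

# '*' repeats the preceding item: 'ab*' is 'a' followed by zero or more 'b'.
ab_results = [re.fullmatch(r"ab*", s) is not None for s in ["a", "ab", "abbb", "b"]]

print(default_match)
print(dotall_match.group(0) == text)
print(ab_results)
```

If the descriptions in your column can contain line breaks, passing `re.DOTALL` alongside `re.IGNORECASE` is worth considering.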

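In a regular expression, an asterisk * means nothing on its own. Its meaning differs from that of the glob operator * commonly used in Unix/Windows file systems.

The asterisk is a quantifier (specifically, a greedy quantifier); it has to be attached to some pattern (here ., which matches any character) to mean anything.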
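To make the difference from glob concrete, here is a small sketch using only the standard library. It assumes the original key looked like `Keyword1*Keyword2` (my guess at what the question first contained), and the `sample` string is invented to show greediness.

```python
import fnmatch
import re

text = "strng KEYWORD1 moreJARGON 06/0 010 KEYWORD2 andMORE b4END"

# A '*' with nothing before it is rejected by the regex compiler.
try:
    re.compile("*Keyword2")
    bare_star_compiles = True
except re.error:
    bare_star_compiles = False

# Assumed original key: here '*' only repeats the '1', so this does not match.
star_only = re.search("Keyword1*Keyword2", text, flags=re.IGNORECASE)

# '.*' means "any run of characters", which is what glob's '*' means.
dot_star = re.search("Keyword1.*Keyword2", text, flags=re.IGNORECASE)
glob_hit = fnmatch.fnmatchcase(text, "*KEYWORD1*KEYWORD2*")

# Greedy '.*' runs to the last Keyword2; lazy '.*?' stops at the first one.
sample = "Keyword1 a Keyword2 b Keyword2"
greedy = re.search("Keyword1.*Keyword2", sample).group(0)
lazy = re.search("Keyword1.*?Keyword2", sample).group(0)

print(bare_star_compiles, star_only, dot_star.group(0), glob_hit)
print(greedy)
print(lazy)
```

If a description may contain the second keyword more than once and you want the shortest span, the lazy form `.*?` is probably the one to use.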

MCVE

Reworking your MCVE:

import re
import pandas as pd

keys = ['this', 'Something', 'Second', 'Keyword1.*Keyword2', 'Stuff', 'One' ]

df1 = pd.DataFrame()
df1['Description'] = ['Something Here','Second Item 7', 'Something There',
                      'strng KEYWORD1 moreJARGON 06/0 010 KEYWORD2 andMORE b4END',
                      'Second Item 7', 'Even More Stuff']


regstr = '(%s)' % '|'.join(keys)

df1['Matched'] = df1['Description'].str.extract(regstr, flags=re.IGNORECASE, expand=False)

The regexp is now:

(this|Something|Second|Keyword1.*Keyword2|Stuff|One)

which matches the missing case:

                                         Description                                Matched
0                                     Something Here                              Something
1                                      Second Item 7                                 Second
2                                    Something There                              Something
3  strng KEYWORD1 moreJARGON 06/0 010 KEYWORD2 an...  KEYWORD1 moreJARGON 06/0 010 KEYWORD2
4                                      Second Item 7                                 Second
5                                    Even More Stuff                                  Stuff
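If you also need the value from AdjacentCol for whichever key matched, one possible approach (not part of the answer above) is to give each key its own capture group and use the group number to look up the neighbour. The helper name `adjacent_for` is hypothetical, and the sketch assumes the keys themselves contain no capture groups.

```python
import re

keys = ['this', 'Something', 'Second', 'Keyword1.*Keyword2', 'Stuff', 'One']
adjacent = ['this other string', 'SomeString Else', 'Second String Player',
            'Keyword1 Keyword2', 'More String Stuff', 'One More String Example']

descriptions = ['Something Here', 'Second Item 7', 'Something There',
                'strng KEYWORD1 moreJARGON 06/0 010 KEYWORD2 andMORE b4END',
                'Second Item 7', 'Even More Stuff']

# One capture group per key: group i corresponds to keys[i - 1].
# Assumption: no key contains parentheses that would add extra groups.
pattern = re.compile('|'.join('(%s)' % k for k in keys), flags=re.IGNORECASE)

def adjacent_for(text):
    """Hypothetical helper: return the AdjacentCol value of the key that matched."""
    m = pattern.search(text)
    # Only one alternative matches, so lastindex is that alternative's group number.
    return adjacent[m.lastindex - 1] if m else None

results = [adjacent_for(d) for d in descriptions]
for description, value in zip(descriptions, results):
    print(description, '->', value)
```

On a DataFrame the same function could be applied with `df1['Description'].map(adjacent_for)`.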
