How to build a name detector in Pandas

Posted 2024-05-16 19:30:36

This is my dataset:

Id.   Text
1     Dear Mr. Alpha Terra, your food is delivered
2     Dear Mrs. Betta Irina Viruva, your drink is delivered
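To make the answers below easy to try, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the columns are literally named "Id." and "Text", as in the table above.

```python
import pandas as pd

# Rebuild the sample data from the question; the column names
# "Id." and "Text" are copied from the table above.
df = pd.DataFrame({
    "Id.": [1, 2],
    "Text": [
        "Dear Mr. Alpha Terra, your food is delivered",
        "Dear Mrs. Betta Irina Viruva, your drink is delivered",
    ],
})
print(df)
```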

What I want is to detect the words that come after "Mr." or "Mrs." and before the comma, so that I can get the name. This is the result I want:

Id.   Text                                                       Name
1     Dear Mr. Alpha Terra, your food is delivered               Alpha Terra 
2     Dear Mrs. Betta Irina Viruva, your drink is delivered      Betta Irina Viruva

3 answers

One option is to match with the following pattern:

.*Mrs?\.\s+([^,]+).*

This captures everything after "Mr." or "Mrs." up to, but not including, the first comma that follows.
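To see what each piece of the pattern does, it can be rewritten with re.VERBOSE so every token carries a comment. This is only an annotated restatement of the same regex; the name NAME_PATTERN is made up for illustration.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each piece can be commented.
NAME_PATTERN = re.compile(
    r"""
    .*          # anything before the title (greedy, so it backtracks)
    Mrs?        # "Mr" followed by an optional "s"
    \.          # a literal dot after the title
    \s+         # one or more whitespace characters
    ([^,]+)     # group 1: one or more characters that are not commas
    .*          # the rest of the line
    """,
    re.VERBOSE | re.IGNORECASE,
)

m = NAME_PATTERN.match("Dear Mr. Alpha Terra, your food is delivered")
print(m.group(1))  # Alpha Terra
```

Because [^,]+ cannot cross a comma, the captured group ends right before the first comma after the title.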

import re

line = "Dear Mrs. Betta Irina Viruva, your drink is delivered"
# Group 1 holds everything after "Mr."/"Mrs." up to the first comma
matches = re.match(r'.*Mrs?\.\s+([^,]+).*', line, re.M | re.I)

if matches:
    print("Name:", matches.group(1))
else:
    print("No match!!")
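Since the question is about a DataFrame, the same regex can be applied row by row with Series.apply. This is a sketch; find_name is a hypothetical helper, and re.M is dropped because the pattern has no ^ or $ anchors, so it has no effect here.

```python
import re
import pandas as pd

# Same pattern as the answer, compiled once and reused for every row.
NAME_RE = re.compile(r'.*Mrs?\.\s+([^,]+).*', re.I)

def find_name(text):
    """Return the text between 'Mr.'/'Mrs.' and the next comma, or None."""
    m = NAME_RE.match(text)
    return m.group(1) if m else None

df = pd.DataFrame({"Text": [
    "Dear Mr. Alpha Terra, your food is delivered",
    "Dear Mrs. Betta Irina Viruva, your drink is delivered",
]})
df["Name"] = df["Text"].apply(find_name)
print(df["Name"].tolist())  # ['Alpha Terra', 'Betta Irina Viruva']
```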

Use str.extract:

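import pandas as pd  # df is the DataFrame from the question

# Non-greedy (.*?) stops at the first comma after "Mr." or "Mrs."
df['Name'] = df['Text'].str.extract(r'Mrs?\.\s+(.*?),', expand=False)
print(df)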
   Id.                                               Text                Name
0    1       Dear Mr. Alpha Terra, your food is delivered         Alpha Terra
1    2  Dear Mrs. Betta Irina Viruva, your drink is de...  Betta Irina Viruva
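A variation on the str.extract answer: a named group sets the output column name when expand=True, and rows without a title come back as NaN instead of raising. The third row below is hypothetical, added only to show the no-match case.

```python
import pandas as pd

df = pd.DataFrame({"Text": [
    "Dear Mr. Alpha Terra, your food is delivered",
    "Dear Mrs. Betta Irina Viruva, your drink is delivered",
    "Your order has shipped",  # hypothetical row with no title
]})

# A named group becomes the column name when expand=True returns a DataFrame.
names = df["Text"].str.extract(r'Mrs?\.\s+(?P<Name>.*?),', expand=True)
df = df.join(names)

print(df["Name"].tolist())
```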

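Try this (note that the resulting values keep the space that follows "Mr." or "Mrs."; chain .str.strip() to remove it):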

In [134]: df.Text.str.split('.',expand=True)[1].str.split(',',expand=True)[0]
Out[134]: 
0            Alpha Terra
1     Betta Irina Viruva
Name: 0, dtype: object
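The split approach relies on the first "." in the text being the one after the title. The sketch below compares it with the str.extract regex on a hypothetical second row that has an extra sentence before "Dear", showing where splitting on the first dot goes wrong.

```python
import pandas as pd

s = pd.Series([
    "Dear Mr. Alpha Terra, your food is delivered",
    "Hi. Dear Mr. Alpha Terra, your food is delivered",  # hypothetical: extra dot earlier
])

# Split-based answer, plus .str.strip() to drop the space left after "Mr."
by_split = s.str.split('.', expand=True)[1].str.split(',', expand=True)[0].str.strip()

# Regex-based answer anchored on the title instead of the first dot
by_regex = s.str.extract(r'Mrs?\.\s+(.*?),', expand=False)

print(by_split.tolist())  # ['Alpha Terra', 'Dear Mr']
print(by_regex.tolist())  # ['Alpha Terra', 'Alpha Terra']
```

Anchoring on the title keeps the match correct regardless of what precedes it, which is why the regex answers are the safer choice for free text.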