How to split a string on a "letters/letters" pattern, but not on a "digits/digits" pattern in the same string

Posted 2024-06-11 17:43:06


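I am trying to separate a first name from a second name based on a pattern. However, I do not want to split when the same pattern occurs between numbers.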

Input:

name
john 6/1
park/avenue 34/45
eela 21/22
shaun 21/22
shaun/paul 77/78

Code:


import pandas as pd
import re


df1=pd.read_csv('bg.txt',sep='\t')
df1['split?']=df1['name1'].apply(lambda a: 'yes' if  (re.search('[^\d+\/d+]',a) and re.search('[\u0061-\u007A]',a))  else 'no')
df1['name_2'] = df1[df1['split?']=='yes']['name1'].apply (lambda b: b.split('/')[1])
print(df1)

Expected output:

name1                 split?    name2
john 6/1              no        null
park/avenue 34/45     yes       avenue
eela 21/22            no        null
shaun 21/22           no        null
shaun/paul 77/78      yes       paul
mark/tyson            yes       tyson


3 Answers

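You can use a pattern like [^\W\d_]+/([^\W\d_]+) to match one or more Unicode letters, then a /, and then capture one or more Unicode letters into group 1. You will probably want to use it with word boundaries so that it only matches whole words: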

df['name2'] = df['name'].str.extract(r'\b[^\W\d_]+/([^\W\d_]+)\b', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})

To get null instead of NaN, you can add df['name2'] = df['name2'].fillna('null')
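A minimal sketch of that fillna step, assuming a two-row sample frame made up for illustration. The point it shows is ordering: the split? column is derived from notna(), so it has to be computed before the NaN values are replaced with the string 'null'.

```python
import pandas as pd

# Two illustrative rows taken from the question's input
df = pd.DataFrame({'name': ['john 6/1', 'park/avenue 34/45']})

# Capture the letters that follow a letters/letters slash
df['name2'] = df['name'].str.extract(r'\b[^\W\d_]+/([^\W\d_]+)\b', expand=False)

# Derive the flag while the misses are still NaN
df['split?'] = df['name2'].notna().map({False: 'no', True: 'yes'})

# Only now replace NaN with the literal string 'null'
df['name2'] = df['name2'].fillna('null')

print(df)
```

If fillna ran first, every row would count as non-missing and split? would be 'yes' everywhere.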

Python demo:

import pandas as pd

cols = {'name':['john 6/1','park/avenue 34/45','eela 21/22','shaun 21/22','shaun/paul 77/78','mark/tyson']}
df = pd.DataFrame(cols)
df['name2'] = df['name'].str.extract(r'[^\W\d_]+/([^\W\d_]+)', expand=False)
df['split?'] = df['name2'].notna().map({False:'no', True:'yes'})

Output:

>>> df
                name   name2 split?
0           john 6/1     NaN     no
1  park/avenue 34/45  avenue    yes
2         eela 21/22     NaN     no
3        shaun 21/22     NaN     no
4   shaun/paul 77/78    paul    yes
5         mark/tyson   tyson    yes
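The letter class used in this answer can be read as follows: \w matches letters, digits and the underscore, so the negated class [^\W\d_] keeps only word characters that are neither digits nor underscores, which leaves letters. The sketch below checks this with the standard re module. The helper name second_name and the sample strings are illustrative and not part of the original answer.

```python
import re

# [^\W\d_] = a word character that is not a digit and not an underscore, i.e. a letter
LETTERS_SLASH_LETTERS = re.compile(r'\b[^\W\d_]+/([^\W\d_]+)\b')

def second_name(text):
    """Return the letters after the first letters/letters slash, or None."""
    match = LETTERS_SLASH_LETTERS.search(text)
    return match.group(1) if match else None

print(second_name('park/avenue 34/45'))  # avenue
print(second_name('john 6/1'))           # None (the only slash sits between digits)
print(second_name('анна/мария 1/2'))     # мария (non-ASCII letters match too)
```

Because Python 3 str patterns are Unicode-aware by default, this class is not limited to A-Z, unlike an explicit [A-Za-z] range.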

You can use Series.str.extract with the following pattern:

df['name2'] = df.name.str.extract(r'/(\w+)\s\d+/')
df['split'] = df.name2.notna().map({False:'No', True:'Yes'})

print(df)

                name   name2 split
0           john 6/1     NaN    No
1  park/avenue 34/45  avenue   Yes
2         eela 21/22     NaN    No
3        shaun 21/22     NaN    No
4   shaun/paul 77/78    paul   Yes
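One thing to be aware of with this pattern: it requires whitespace and a digits/ group immediately after the captured word, so a value without trailing numbers, such as the mark/tyson row in the question's expected output, would not match. A small sketch with the standard re module that shows this behaviour (the helper name extract_second is hypothetical):

```python
import re

# Same pattern as in the answer: slash, word, whitespace, digits, slash
PATTERN = re.compile(r'/(\w+)\s\d+/')

def extract_second(name):
    """Return the captured word, or None when the pattern does not match."""
    match = PATTERN.search(name)
    return match.group(1) if match else None

for name in ['park/avenue 34/45', 'john 6/1', 'mark/tyson']:
    print(name, '->', extract_second(name))
# park/avenue 34/45 -> avenue
# john 6/1 -> None
# mark/tyson -> None
```

If rows without a trailing number pair need to be handled, a pattern that only looks at the letters around the slash may be a better fit.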

Use str.extract

Ex:

import pandas as pd

df = pd.DataFrame({"Col": ['john 6/1', 'park/avenue 34/45', 'eela 21/22', 'shaun 21/22', 'shaun/paul 77/78']})
# Capture the ASCII letters that directly follow a slash
df['New'] = df['Col'].str.extract(r"/([A-Za-z]+)", expand=False)
print(df)

Output:

                 Col     New
0           john 6/1     NaN
1  park/avenue 34/45  avenue
2         eela 21/22     NaN
3        shaun 21/22     NaN
4   shaun/paul 77/78    paul
