How to replace the commas in a comma-separated list of names

Input:

1. FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3 Description with FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3
2. FirstName3 LastName3, FirstName4 LastName4 Description with FirstName3 and FirstName4 LastName4.
3. FirstName3 LastName3, FirstName6 LastName6 Description with FirstName3 and FirstName6.

Desired output:

1. [[FirstName1 LastName1]], [[FirstName2 LastName2]], [[FirstName3 LastName3]] Description with FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3
2. [[FirstName3 LastName3]], [[FirstName4 LastName4]] Description with FirstName3 and FirstName4 LastName4.
3. [[FirstName3 LastName3]], [[FirstName6 LastName6]] Description with FirstName3 and FirstName6.
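A minimal sketch of the whole transformation, under some loud assumptions: each item starts with "<number>. ", every name is exactly two space-free words (like FirstName1 LastName1), names are separated by ", ", and the description begins right after the last name. Under those assumptions the name list ends where the two-word-name pattern stops matching, so names repeated inside the description are left alone. The helper name wrap_names is made up for this example.

```python
import re

# ASSUMPTION: names are exactly two space-free words separated by ", ",
# and the description starts right after the last name on each line.
ITEM = re.compile(r"^(\d+\.\s+)((?:\S+ \S+, )*\S+ \S+)", re.MULTILINE)

def wrap_names(match):
    """Hypothetical helper: wrap each name of the leading list in [[...]]."""
    prefix, names = match.group(1), match.group(2)
    wrapped = ", ".join(f"[[{name}]]" for name in names.split(", "))
    return prefix + wrapped

text = (
    "1. FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3 "
    "Description with FirstName1 LastName1, FirstName2 LastName2, FirstName3 LastName3\n"
    "2. FirstName3 LastName3, FirstName4 LastName4 "
    "Description with FirstName3 and FirstName4 LastName4.\n"
    "3. FirstName3 LastName3, FirstName6 LastName6 "
    "Description with FirstName3 and FirstName6."
)

# ^ with re.MULTILINE anchors at the start of every line, so each item is handled once
print(ITEM.sub(wrap_names, text))
```

A one-word or three-word name breaks the two-word assumption; in that case the end of the name list would have to be found some other way.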

3 answers

User
Answer 1 · edited 2024-04-25 13:21:17

Perhaps an expression such as
(?<=\.\s|,\s)([^,\r\n]+)\s*(?= |,)
with the replacement
[[\1]]
might also be an option.
Test

import re

regex = r"(?<=\.\s|,\s)([^,\r\n]+)\s*(?= |,)"
test_str = ("1. John Smith1, John Smith2, John Smith3, etc. \n"
            "12. John Smith1, John Smith2, John Smith3, etc. ")
subst = "[[\\1]]"

print(re.sub(regex, subst, test_str))

Output

1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]
12. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]
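If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.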
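To see how the pieces fit together without regex101, the same expression can be written in verbose mode with a comment on each part. This is a restatement of the answer's pattern, not a change to it; the only adjustment is that the literal space in the lookahead becomes [ ], because re.VERBOSE ignores bare spaces outside character classes.

```python
import re

pattern = re.compile(r"""
    (?<=\.\s|,\s)   # preceded by ". " (after the item number) or ", " (after a name)
    ([^,\r\n]+)     # group 1: characters up to the next comma or line break
    \s*             # optional trailing whitespace, kept outside the group
    (?=[ ]|,)       # followed by a space or a comma, which is not consumed
""", re.VERBOSE)

test_str = ("1. John Smith1, John Smith2, John Smith3, etc. \n"
            "12. John Smith1, John Smith2, John Smith3, etc. ")

print(pattern.sub(r"[[\1]]", test_str))
```

Both alternatives in the lookbehind are two characters wide, which is why Python's re accepts it; lookbehinds of differing widths would be rejected.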

User
Answer 2 · edited 2024-04-25 13:21:17

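You could do it like this:

import re

st = "1. John Smith1, John Smith2, John Smith3, etc. "
# [^,]+? keeps the space inside each name; an entry ends at ", " or at the end of the string
re.findall(r"(?:\d+\.\s)?([^,]+?)(?:,\s|\s*$)", st)

This returns ['John Smith1', 'John Smith2', 'John Smith3', 'etc.'].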
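To get the bracketed form, the matched entries still have to be wrapped and joined back together. A minimal sketch, assuming entries are separated by ", " and the item number should stay outside the brackets:

```python
import re

st = "1. John Smith1, John Smith2, John Smith3, etc. "

# keep the leading "1. " aside so it is not wrapped
number = re.match(r"\d+\.\s", st).group()

# one entry per comma-separated name; [^,]+? cannot match the empty string at the end
names = re.findall(r"(?:\d+\.\s)?([^,]+?)(?:,\s|\s*$)", st)

# wrap every entry in [[...]] and glue the pieces back together
result = number + ", ".join(f"[[{name}]]" for name in names)
print(result)
```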

User
Answer 3 · edited 2024-04-25 13:21:17

As usual, there are a couple of ways to do this, but a regex substitution alone may not be enough. Here are two options:

Regex + string manipulation

Extending your original regex, you can use this one to better capture and skip over the leading number/dot/space group:

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re1 = r"(\d\.\s)*(.+?)(?:, |(<br>)$)"
new_st = re.sub(re1, r"\1[[\2]], \3", st)
print(new_st)

This gives us the value:

new_st = '1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]], <br>'

Note the trailing comma at the end. We can remove it with:

new_st = ''.join(new_st.rsplit(", ", 1))

Which gives us:

'1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>'
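Why rsplit(", ", 1) removes only the trailing separator can be seen by printing the intermediate steps; the comparison with str.replace shows what would happen without the maxsplit limit.

```python
s = '1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]], <br>'

# maxsplit=1 makes rsplit cut only at the LAST ", " (counting from the right)
parts = s.rsplit(", ", 1)
print(parts)
# ['1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]', '<br>']

# joining with "" glues the two halves back together without that separator
print("".join(parts))
# 1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>

# by contrast, str.replace removes every ", ", including the ones between names
print(s.replace(", ", ""))
# 1. [[John Smith1]][[John Smith2]][[John Smith3]][[etc.]]<br>
```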

So, all together:

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re1 = r"(\d\.\s)*(.+?)(?:, |(<br>)$)"
new_st = re.sub(re1, r"\1[[\2]], \3", st)  # notice I do capture the first group
new_st = ''.join(new_st.rsplit(", ", 1))
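The combined snippet handles one line at a time. A sketch that applies it to several lines, wrapped in a helper whose name wrap_line is made up for this example and assuming every line ends in <br>. One deliberate change: \d+ instead of \d, because with \d a two-digit item such as "12. John Smith1" ends up entirely inside [[...]].

```python
import re

# The answer's pattern, with \d+ instead of \d so "12. " is skipped as well.
RE1 = re.compile(r"(\d+\.\s)*(.+?)(?:, |(<br>)$)")

def wrap_line(line):
    """Hypothetical helper: wrap every name, then drop the trailing ", "."""
    # groups that did not participate (\1 or \3) expand to "" (Python 3.5+)
    new_line = RE1.sub(r"\1[[\2]], \3", line)
    return "".join(new_line.rsplit(", ", 1))

lines = [
    "1. John Smith1, John Smith2, John Smith3, etc.<br>",
    "12. John Smith3, John Smith4<br>",
]
for line in lines:
    print(wrap_line(line))
```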

Extract the core, then use split/join

This also uses a regex, but only to extract the core of the string. A combination of split and join then produces the desired result:

import re
st = '1. John Smith1, John Smith2, John Smith3, etc.<br>'
re2 = r"(\d+\.\s+)(.+)(<br>)$"
sections = re.findall(re2, st)

# just to make it clearer I'll split the sections
the_number, the_core, the_end = sections[0]

# rework the core: split on commas and strip the space each comma leaves behind
the_core = ']], [['.join(name.strip() for name in the_core.split(','))

# glue all the pieces together adding what's missing
new_st = the_number + '[[' + the_core + ']]' + the_end

The result is:

'1. [[John Smith1]], [[John Smith2]], [[John Smith3]], [[etc.]]<br>'
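If some lines might not have the expected "number, names, <br>" shape, sections[0] raises IndexError because findall returns an empty list. A variant sketch using re.fullmatch with named groups, where wrap_core is a hypothetical helper name and lines that do not match are returned unchanged:

```python
import re

# Named groups make the three parts self-describing.
LINE = re.compile(r"(?P<number>\d+\.\s+)(?P<core>.+)(?P<end><br>)")

def wrap_core(line):
    """Hypothetical helper: wrap each comma-separated name in [[...]]."""
    m = LINE.fullmatch(line)
    if m is None:
        return line  # leave lines without the expected shape untouched
    names = (name.strip() for name in m["core"].split(","))
    return m["number"] + ", ".join(f"[[{n}]]" for n in names) + m["end"]

print(wrap_core("1. John Smith1, John Smith2, John Smith3, etc.<br>"))
print(wrap_core("no number here<br>"))
```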
