Python: splitting strings with overlapping matches on a middle character using regex

import re

dictionary = ('room', 'door', 'window', 'desk', 'for')
regex = re.compile(r'^(\w{0,2})o(\w{0,2})$')
halves = []
for word in dictionary:
    matches = regex.findall(word)
    if matches:
        halves.append(matches)

2 answers

User

Reply #1 · edited 2024-04-24 07:22:21

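I am posting this as an answer mainly so that anyone who stumbles in here later finds one. I have also managed to get the intended behaviour, although probably not in a very Pythonic way, so it may be a useful starting point for others. Comments on how to improve this answer (for example making it more Pythonic, or simply more efficient) are very welcome.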

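The only way I found to get all possible splits, for words whose length falls within a given range and around a character that sits within a given range of positions, with either re or the newer regex module, is to use several regexes. This snippet builds a suitable regular expression at runtime from the word length range, the character to look for, and the range of positions where that character may appear.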

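import re

dictionary = ('room', 'roam', 'flow', 'door', 'window',
              'desk', 'for', 'fo', 'foo', 'of', 'sorrow')
char = 'o'
word_len = (3, 6)   # allowed word length, inclusive
char_pos = (2, 3)   # allowed 1-based positions of char, inclusive
# Filter: word length within range and char at one of the allowed positions.
regex_str = (r'(?=^\w{' + str(word_len[0]) + ',' + str(word_len[1]) + r'}$)(?=\w{'
             + str(char_pos[0] - 1) + ',' + str(char_pos[1] - 1) + '}' + char + ')')
halves = []
for word in dictionary:
    matches = re.match(regex_str, word)
    if matches:
        matched_halves = []
        # Try each allowed 0-based position in turn.
        for pos in range(char_pos[0] - 1, char_pos[1]):
            # Split only where char sits exactly pos characters from the start.
            split_regex_str = r'(?<=^\w{' + str(pos) + '})' + char
            split_word = re.split(split_regex_str, word)
            if len(split_word) == 2:
                matched_halves.append(split_word)
        halves.append(matched_halves)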

The output is:

^{pr2}$

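At this point I might start thinking about using the regex only to find the words to split, and doing the split itself the "dumb way", simply checking whether the character at each position in the range equals char. In any case, any comments are much appreciated.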
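The "dumb way" mentioned above could look roughly like the sketch below. It is not the poster's code: the function name split_on_char is made up for illustration, it assumes char is a single character, and it checks only the word length rather than requiring \w characters as the regex does.

```python
def split_on_char(word, char='o', word_len=(3, 6), char_pos=(2, 3)):
    """Return [left, right] for every allowed 1-based position holding char.

    Hypothetical helper: plain indexing instead of regexes.
    """
    # Length filter (inclusive bounds), replacing the first lookahead.
    if not word_len[0] <= len(word) <= word_len[1]:
        return []
    splits = []
    # Walk the allowed positions, converted to 0-based indexes.
    for pos in range(char_pos[0] - 1, char_pos[1]):
        if pos < len(word) and word[pos] == char:
            splits.append([word[:pos], word[pos + 1:]])
    return splits


dictionary = ('room', 'roam', 'flow', 'door', 'window',
              'desk', 'for', 'fo', 'foo', 'of', 'sorrow')
# Keep only the words that produced at least one split.
halves = [s for s in map(split_on_char, dictionary) if s]
print(halves)
```

For example, 'room' gives [['r', 'om'], ['ro', 'm']], while 'window', 'desk', 'fo' and 'of' give nothing, either because of their length or because there is no 'o' in position 2 or 3.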

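User

Reply #2 · edited 2024-04-24 07:22:21

Edit: fixed.

Would a simple while loop work?

What you want is search(), called in a loop that moves the start position forward by 1 each time: https://docs.python.org/2/library/re.html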

>>> import re
>>> dictionary = ('room', 'door', 'window', 'desk', 'for')
>>> regex = re.compile(r'(\w{0,2})o(\w{0,2})')
>>> halves = []
>>> for word in dictionary:
...     start = 0
...     while start < len(word):
...         match = regex.search(word, start)
...         if match:
...             # record this split, then retry one character further on
...             start = match.start() + 1
...             halves.append([match.group(1), match.group(2)])
...         else:
...             # no matches left
...             break
...
>>> print(halves)
[['ro', 'm'], ['o', 'm'], ['', 'm'], ['do', 'r'], ['o', 'r'], ['', 'r'], ['nd', 'w'], ['d', 'w'], ['', 'w'], ['f', 'r'], ['', 'r']]
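As a possible alternative to advancing the start position by hand, the same scan can be sketched with a zero-width lookahead and finditer(). This is not part of the original answer; the names OVERLAPPING and overlapping_halves are invented for the example, and the pattern is the answer's own wrapped in (?=...).

```python
import re

# The lookahead consumes nothing, so finditer() retries at every start
# position and the captured groups are free to overlap between matches.
OVERLAPPING = re.compile(r'(?=(\w{0,2})o(\w{0,2}))')


def overlapping_halves(words):
    """Collect [left, right] around 'o', one pair per matching start position."""
    halves = []
    for word in words:
        for match in OVERLAPPING.finditer(word):
            halves.append([match.group(1), match.group(2)])
    return halves


dictionary = ('room', 'door', 'window', 'desk', 'for')
print(overlapping_halves(dictionary))
```

For this dictionary it should yield the same list as the while loop. Like the loop, it is not anchored with ^ and $, so 'window' still produces splits around its 'o'.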
