How to split a string at specific characters and build the different string combinations

Posted 2024-05-14 21:34:25

I want to process some strings from a text file. I have tried many regular expression patterns, but none of them worked for me.

someone can tell/figure
a/the squeaky wheel gets the grease/oil
accounts for (someone or something)
that's/there's (something/someone) for you

I need the following string combinations:

someone can tell
someone can figure
a squeaky wheel gets the grease
a squeaky wheel gets the oil
the squeaky wheel gets the grease
the squeaky wheel gets the oil
accounts for someone
accounts for something
that's something for you
that's someone for you
there's something for you
there's someone for you

3 answers

Edit: corrected the handling of parentheses and "or", which I missed in the previous version.

A simple recursive solution that also works with multiple slashes (he/she/it/anything):

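import re

def explode_versions(s):
    # Lazy head, alternatives separated by "/" or " or ", then the tail
    match = re.search(r'^(.*?)(\S+)(?:(?:(?: or )|/)(\S+))+(.*?)$', s)

    if match:
        head, *versions, tail = match.groups()

        # Strip the parentheses that may wrap the group of alternatives
        versions[0] = re.sub(r'^\(', '', versions[0])
        versions[-1] = re.sub(r'\)$', '', versions[-1])

        # Recurse so that any remaining alternatives are expanded too
        return [line for v in versions for line in explode_versions(''.join([head, v, tail]))]
    else:
        return [s]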

texts = ["someone can tell/figure",
"a/the squeaky wheel gets the grease/oil",
"accounts for (someone or something)",
"that's/there's (something/someone) for you"]

[explode_versions(text) for text in texts]

Result:

[['someone can tell', 'someone can figure'],
 ['a squeaky wheel gets the grease',
  'a squeaky wheel gets the oil',
  'the squeaky wheel gets the grease',
  'the squeaky wheel gets the oil'],
 ['accounts for someone', 'accounts for something'],
 ["that's something for you",
  "that's someone for you",
  "there's something for you",
  "there's someone for you"]]

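This is a bit tricky, but the main idea is to duplicate the options built so far whenever you reach a "/" and to continue with both alternatives. Take a look at the following: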

m_str = ['someone can tell/figure',
         'a/the squeaky wheel gets the grease/oil',
         'accounts for (someone or something)',
         'that\'s/there\'s (something/someone) for you']

lines = []
for line in m_str:
    options = [[]]  # every partial word list built so far for this line
    for word in line.split(" "):
        if "/" in word:
            # Fork every option, one copy per side of the slash
            # (only the first two alternatives are used)
            first, second = word.split("/")[:2]
            new_options = []
            for option in options:
                new_options.append(option + [first])
                new_options.append(option + [second])
            options = new_options
        else:
            for option in options:
                option.append(word)
    lines.append(options)
print(lines)

Output:

[[['someone', 'can', 'tell'], ['someone', 'can', 'figure']],
 [['a', 'squeaky', 'wheel', 'gets', 'the', 'grease'],
  ['a', 'squeaky', 'wheel', 'gets', 'the', 'oil'],
  ['the', 'squeaky', 'wheel', 'gets', 'the', 'grease'],
  ['the', 'squeaky', 'wheel', 'gets', 'the', 'oil']],
 [['accounts', 'for', '(someone', 'or', 'something)']],
 [["that's", '(something', 'for', 'you'],
  ["that's", 'someone)', 'for', 'you'],
  ["there's", '(something', 'for', 'you'],
  ["there's", 'someone)', 'for', 'you']]]
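
The output above still contains the parentheses and does not expand "(someone or something)". A minimal sketch of one way to extend the same word-by-word idea is shown here. It assumes that "(x or y)" means the same as "x/y", that parentheses only ever wrap alternatives, and that each alternative is a single word; `expand_line` is a hypothetical helper name, not part of the answer above.

```python
import re

def expand_line(line):
    # Assumption: "(x or y)" is equivalent to "x/y", so rewrite the
    # parenthesized group into slash form and drop the parentheses.
    line = re.sub(r"\(([^()]*)\)",
                  lambda m: m.group(1).replace(" or ", "/"),
                  line)

    options = [[]]  # every partial word list built so far
    for word in line.split(" "):
        choices = word.split("/")  # one entry if the word has no slash
        # Fork each option once per choice; this also covers he/she/it
        options = [option + [choice]
                   for option in options
                   for choice in choices]
    return [" ".join(option) for option in options]

for text in ["accounts for (someone or something)",
             "that's/there's (something/someone) for you"]:
    print(expand_line(text))
```

Because a word without a slash yields a single choice, the separate if/else branches are no longer needed, and words with more than two alternatives are handled by the same comprehension.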

You can use a Cartesian product:

from itertools import product
import re

s = 'a/the squeaky wheel gets the grease/oil'

lst = [i.split('/') for i in re.split(r'(\w+(?:/\w+)+)', s) if i]
# [['a', 'the'], [' squeaky wheel gets the '], ['grease', 'oil']]

[''.join(i) for i in product(*lst)]

Output:

['a squeaky wheel gets the grease',
 'a squeaky wheel gets the oil',
 'the squeaky wheel gets the grease',
 'the squeaky wheel gets the oil']
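
The snippet above handles one string with plain word characters. To cover all four strings from the question, the same product idea might be sketched as follows. The `ALTERNATION` pattern and the `expand` helper are assumptions made for illustration: the pattern treats either a parenthesized group or a run of words joined by "/" as a set of alternatives, and it allows apostrophes so that "that's/there's" stays together.

```python
import re
from itertools import product

# Hypothetical pattern: group 1 is the content of "( ... )",
# group 2 is a run of words (apostrophes allowed) joined by "/".
ALTERNATION = re.compile(r"\(([^()]+)\)|([\w']+(?:/[\w']+)+)")

def expand(s):
    parts = []  # one list of choices per segment of the string
    pos = 0
    for m in ALTERNATION.finditer(s):
        if m.start() > pos:
            parts.append([s[pos:m.start()]])  # literal text, single choice
        group = m.group(1) or m.group(2)
        parts.append(re.split(r"/| or ", group))  # the alternatives
        pos = m.end()
    if pos < len(s):
        parts.append([s[pos:]])  # trailing literal text
    return ["".join(combo) for combo in product(*parts)]

texts = ["someone can tell/figure",
         "a/the squeaky wheel gets the grease/oil",
         "accounts for (someone or something)",
         "that's/there's (something/someone) for you"]

for text in texts:
    print(expand(text))
```

Since the parenthesized content is split only on "/" and " or ", this sketch would also keep multi-word alternatives inside parentheses intact, at the cost of assuming that parentheses never appear for any other purpose.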
