How can I split a string with commas after a vowel or space and append the pieces to an array?

3 answers

User

Answer 1 · edited 2024-05-16 20:54:19

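Although the accepted answer is very good, I thought I'd offer one that uses regex (the re module) and a list comprehension, since I believe it gives an easier-to-understand solution.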

import re

def to_sylables(text):
    match_pattern = r"([aeiou ])"
    replace_pattern = r"\1\t"  ## replace the match with itself and a tab
    return [
        x for x
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()
    ]

text = "a mambo jambo"
print(f"\"{text}\" ==> {to_sylables(text)}")

This gives you:

"a mambo jambo" ==> ['a', 'ma', 'mbo', 'ja', 'mbo']

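To make the regex approach easier to follow, here is a small sketch that prints each intermediate step for the same input. The variable names marked and pieces are mine, added only for illustration.

```python
import re

text = "a mambo jambo"

# Step 1: tag every vowel or space with a trailing tab.
marked = re.sub(r"([aeiou ])", r"\1\t", text)
print(repr(marked))  # 'a\t \tma\tmbo\t \tja\tmbo\t'

# Step 2: cut at the tabs; this leaves space-only and empty pieces behind.
pieces = marked.split("\t")
print(pieces)  # ['a', ' ', 'ma', 'mbo', ' ', 'ja', 'mbo', '']

# Step 3: keep only the pieces that contain non-whitespace characters.
print([x for x in pieces if x.strip()])  # ['a', 'ma', 'mbo', 'ja', 'mbo']
```

The final filter is what removes the lone spaces and the trailing empty string.
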
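@theherk timed the various answers to this question, and I thought it would be interesting to see how that was done. My main motivation was that they reported my answer as the slowest by a wide margin. While I didn't expect to "win" a speed race with regex, I was surprised it would cause such a large slowdown.

The good news (for me) is that while my answer is still the slowest, it is nowhere near as slow as reported. I suspect @theherk may have included the time cost of import re, which may or may not be fair.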
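One way to see how much the placement of the import matters is to time the same call twice, once with import re in setup and once with it inside the timed statement. This is only a sketch: the re.split call is a stand-in I picked for brevity, not one of the answers, and the iteration count is arbitrary.

```python
import timeit

text_setup = 'text = "a mambo jambo"'
call = 're.split("[aeiou ]", text)'

# Import done once in setup: not part of the timed statement.
import_in_setup = timeit.timeit(
    call, setup="import re; " + text_setup, number=100_000
)

# Import repeated inside the timed statement: counted on every iteration.
import_in_stmt = timeit.timeit(
    "import re; " + call, setup=text_setup, number=100_000
)

print(f"import in setup: {import_in_setup:.3f}")
print(f"import in stmt:  {import_in_stmt:.3f}")
```

The actual numbers depend on the machine, so none are shown here.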

If you want to run timeit on the various answers, try:

import timeit

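# Raw string: in a plain ''' literal the outer string would turn \1 and \t
# into control characters, so the backreference would never reach re.sub.
setup_jonsg = r'''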
import re
text = "a mambo jambo"
def to_sylables(text):
    match_pattern = r"([aeiou ])"
    replace_pattern = r"\1\t"
    return [
        x for x
        in re.sub(match_pattern, replace_pattern, text).split("\t")
        if x.strip()
    ]
'''

setup_trcka = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    out_text = text
    for vowel in vowels:
        out_text = f'{vowel}|'.join(out_text.split(vowel))
    out_text = out_text if out_text[-1] != '|' else out_text[:-1]
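    # Match the standalone answer: drop spaces and return a list, not a string.
    return out_text.replace(" ", "").split("|")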
'''

setup_theherk = '''
text = "a mambo jambo"
def to_sylables(text):
    vowels = ["a", "e", "i", "o", "u"]
    cur = ""
    new = []
    for c in text:
        if c == " ":
            if cur != "":
                new.append(cur)
            cur = ""
        elif c in vowels:
            cur += c
            new.append(cur)
            cur = ""
        else:
            cur += c
    return new
'''

print(f"jonsg: {timeit.timeit('to_sylables(text)', setup=setup_jonsg, number=1_000_000):.2f}")
print(f"trcka: {timeit.timeit('to_sylables(text)', setup=setup_trcka, number=1_000_000):.2f}")
print(f"theherk: {timeit.timeit('to_sylables(text)', setup=setup_theherk, number=1_000_000):.2f}")

For me, this reports (total seconds for 1,000,000 calls):

jonsg: 2.57
trcka: 1.33
theherk: 2.05

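So mine is still the slowest, but I'd argue that the most understandable solution (which may not be mine) should be implemented first, before optimizing parts where the performance gain is negligible.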
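Since a single timeit run can be noisy, a possible refinement is timeit.repeat, taking the minimum of several runs as the timeit documentation suggests. This is a sketch only: the repeat and number values are arbitrary, and the function is defined inline and passed through the globals argument instead of a setup string.

```python
import re
import timeit

def to_sylables(text):
    # Same approach as above: tag each vowel or space with a tab, then split.
    marked = re.sub(r"([aeiou ])", r"\1\t", text)
    return [x for x in marked.split("\t") if x.strip()]

text = "a mambo jambo"

# Five independent runs of 10,000 calls each; both counts are arbitrary.
runs = timeit.repeat(
    "to_sylables(text)",
    globals={"to_sylables": to_sylables, "text": text},
    repeat=5,
    number=10_000,
)
best = min(runs)
print(f"best of {len(runs)}: {best:.4f} s per 10,000 calls")
```

The minimum is the run least disturbed by other activity on the machine, which makes it a fairer basis for comparing answers.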

User

Answer 2 · edited 2024-05-16 20:54:19

Although it's far from optimal, I hope it helps:

text = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

out_text = text
for vowel in vowels:
    out_text = f'{vowel}|'.join(out_text.split(vowel))

out_text = out_text if out_text[-1] != '|' else out_text[:-1]
print(out_text.replace(" ", '').split('|'))

Output:

['a', 'ma', 'mbo', 'ja', 'mbo']

If it works, don't forget to accept the answer.

User

Answer 3 · edited 2024-05-16 20:54:19

Peter Trcka provided an interesting answer, but here is another approach. It isn't necessarily better, but it may be clearer.

s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

cur = ""
new = []
for c in s:
    if c == " ":
        if cur != "":
            new.append(cur)
        cur = ""
    elif c in vowels:
        cur += c
        new.append(cur)
        cur = ""
    else:
        cur += c

print(new)
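One edge case worth noting: if the input ends in consonants, the loop above finishes with those characters still sitting in cur, and they never reach the list. Whether that matters depends on your data. A minimal sketch of a variant that flushes the remainder, wrapped in a function whose name is my own invention:

```python
def split_after_vowel_or_space(s):
    # Hypothetical helper: same loop as above, plus a final flush.
    vowels = "aeiou"
    cur = ""
    new = []
    for c in s:
        if c == " ":
            # A space ends the current piece without being kept itself.
            if cur:
                new.append(cur)
            cur = ""
        elif c in vowels:
            # A vowel closes the current piece and is kept at its end.
            new.append(cur + c)
            cur = ""
        else:
            cur += c
    # Flush trailing consonants that no vowel or space closed off.
    if cur:
        new.append(cur)
    return new

print(split_after_vowel_or_space("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
print(split_after_vowel_or_space("ab cd"))          # ['a', 'b', 'cd']
```

For the example string the result is unchanged, since it ends in a vowel.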

Here is another approach. It is a bit slower.

s = "a mambo jambo"
vowels = ["a", "e", "i", "o", "u"]

new = []
i = 0
for c in s:
    if len(new) == i:
        if c == " ":
            continue
        new.append("")
    new[i] += c if c != " " else ""
    if c in vowels + [" "]:
        i += 1

print(new)

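For posterity, I converted both Peter Trcka's answer and JonSG's answer into functions and ran them through timeit. The results:

Peter Trcka's answer: 1.96 (fastest)
My first approach: 2.15
My second approach: 4.24
JonSG's answer: 8.9 (slowest, as is often the case with regular expressions that include backreferences)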
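If the backreference is the concern, one possible regex variant avoids it by splitting on a lookbehind instead of substituting and re-splitting. This is a sketch, not something timed in this thread: the function name is made up, and splitting on an empty match requires Python 3.7 or later.

```python
import re

def to_sylables_lookbehind(text):
    # Hypothetical variant: split at every position preceded by a vowel or space.
    # re.split only supports empty matches like this from Python 3.7 onward.
    return [x for x in re.split(r"(?<=[aeiou ])", text) if x.strip()]

print(to_sylables_lookbehind("a mambo jambo"))  # ['a', 'ma', 'mbo', 'ja', 'mbo']
```

I have not measured whether this is actually faster, so treat it as something to benchmark, not as a known improvement.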
