How do I move strings that share a defined substring from one list into a new list?

regex = re.compile('.*({[a-z]+}).*')
matches=[]
for element in word_list:
    m = re.search(regex, element)
    if m:
        root = m.group(1)
        matches.append(root)
while counter < len(word_list)/2:
    randroot = random.choice(matches) #select a random {root}
    indices = [i for i, e in enumerate(matches) if e == randroot] #get indices of all words with given root
    for index in indices: #for each index of root-aligned words, appends corresponding word
        new_list = word_list.pop(index)

2 Answers

User

Answer 1 · edited 2024-05-29 00:08:06

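So, for starters, the regex does not actually match all of the bracketed words shown. .*({[a-z]+}).* does not match {a==meliorate}. I would almost assume the equals signs are a typo, but if they are not, consider replacing {[a-z]+} with something like {.+}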
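A quick way to see the difference is to run both patterns over a few of the sample words. This is only a sketch: the pattern {[^}]+} is my own assumption (a slightly tighter variant of {.+} that stops at the first closing brace), and find_root is a hypothetical helper name.

```python
import re

words = ['{a==meliorate}>ed>', '{anvil}>ing>', '<re<{write}']

strict = re.compile(r'\{[a-z]+\}')   # the character class from the question
relaxed = re.compile(r'\{[^}]+\}')   # assumption: anything up to the closing brace

def find_root(pattern, word):
    """Return the first {root} found in word, or None if nothing matches."""
    m = pattern.search(word)
    return m.group(0) if m else None

strict_roots = [find_root(strict, w) for w in words]
relaxed_roots = [find_root(relaxed, w) for w in words]

print(strict_roots)   # [None, '{anvil}', '{write}']
print(relaxed_roots)  # ['{a==meliorate}', '{anvil}', '{write}']
```

The None in the first result is the word that silently drops out of matches, which also leaves matches shorter than word_list.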

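Beyond that, your list comprehension also has a problem. i for i, e in enumerate(matches) if e == randroot does not actually check whether a word contains the root, because it tests whether the element is the root and nothing but the root. That is, e = {write}, so e != re{write}. Instead, you should run a regex check on the words you pull to see whether they contain the root, rather than whether they are the root.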
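The difference between "is the root" and "contains the root" can be shown in a few lines. This is a minimal sketch using words from the question; a plain substring test stands in for the regex check here, which is my simplification.

```python
root = '{write}'
word = '<re<{write}'

is_equal = (word == root)      # equality: the word must be exactly the root
is_contained = (root in word)  # containment: the root appears somewhere in the word

print(is_equal)      # False
print(is_contained)  # True

# Collecting every word that contains a given root
word_list = ['{anvil}>ed>', '{anvil}', '<re<{write}']
with_root = [w for w in word_list if '{anvil}' in w]
print(with_root)     # ['{anvil}>ed>', '{anvil}']
```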

User

Answer 2 · edited 2024-05-29 00:08:06

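The other answer already covers that the regex will not match any string containing '=', and that the equality comparison does not find the words you are after.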

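Possibly the biggest problem is that when you pop an element from a list, you change the list's length and therefore the indices of all elements after it. That is why your output is more random than you expected. If you pop an early element and then try to pop the last element, you will also run into an IndexError.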
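The index shift is easy to reproduce on a toy list. The sketch below uses made-up data rather than the word list from the question, and also shows one common workaround (popping from the highest index down), which is my suggestion and not part of the original code.

```python
letters = ['a', 'b', 'c', 'd', 'e']
indices = [0, 2, 4]  # positions computed before anything is removed

# Popping in ascending order: every pop shifts the later elements left
shifted = list(letters)
popped_wrong = []
error = None
try:
    for i in indices:
        popped_wrong.append(shifted.pop(i))
except IndexError as exc:
    error = exc

print(popped_wrong)  # ['a', 'd']  ('d' instead of 'c')
print(repr(error))   # IndexError: index 4 no longer exists

# Popping from the highest index down leaves the lower indices valid
safe = list(letters)
popped_right = [safe.pop(i) for i in sorted(indices, reverse=True)]
popped_right.reverse()  # restore the original order

print(popped_right)  # ['a', 'c', 'e']
print(safe)          # ['b', 'd']
```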

I have adapted the code so that it does not rely on indices. That is probably the best way to deal with iterables whose length keeps changing.

#!/usr/bin/env python3
import re
import random

word_list = ['{a==meliorate}>ed>','{a==meliorate}>s>','{a==meliorate}','{anew}','{annex}>ing>','{anvil}>ed>','{anvil}>ing>','{anvil}','<un<{ban}>ed>','<re<{write}']

new_list = []

# '=' is part of the character class so that '{a==meliorate}' matches too
regex = re.compile(r".*({[a-z=]+}).*")
matches = []

# collect the {root} of every word
for element in word_list:
    m = regex.search(element)
    if m:
        root = m.group(1)
        matches.append(root)

# fixed up front, because word_list shrinks inside the loop
target = len(word_list) / 2
while len(new_list) < target:
    randroot = random.choice(matches)  # select a random {root}
    found_words = [w for w in word_list if randroot in w]  # get all words with given root in them

    # skip roots whose group would overshoot the target
    if len(found_words) > target - len(new_list):
        continue

    new_list.extend(found_words)
    word_list = [w for w in word_list if w not in new_list]  # remove all the words we just added

print(word_list)
print(new_list)

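Explanation of the changes: I only added '=' to the regex so that it catches 'a==meliorate'. I stored the target in a variable because the length of word_list changes during the loop.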

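I now only check whether the match is contained in the strings of word_list, instead of looking for strings that are exactly equal to it. This is not a completely foolproof approach, but looking at your input data, I think it is safe to use here.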

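The if check makes sure that both lists end up with the same length. For example, we will not add 'a==meliorate', which occurs 3 times, if only two slots are left before we reach the target. But be aware that this results in an infinite loop if the list cannot be split evenly, or if the random picks leave only groups that are too large for the remaining slots (for instance, once all four single-word roots in the sample data have been moved, one slot is left and only three-word groups remain).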
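One way to avoid hanging is to cap the number of random picks and fail loudly instead. The following is only a sketch under my own assumptions: split_by_root, rng and max_attempts are hypothetical names, the target is rounded down for odd-length input, and used-up roots are dropped from the candidate list.

```python
import random
import re

ROOT = re.compile(r'\{[a-z=]+\}')  # same character class as the adjusted regex

def split_by_root(words, rng, max_attempts=1000):
    """Move whole root groups into a new list until it holds half the words.

    Hypothetical helper: raises ValueError after max_attempts random picks
    instead of looping forever.
    """
    remaining = list(words)
    moved = []
    target = len(words) // 2  # assumption: round down for odd-length input
    # Unique roots, sorted so that a seeded rng gives reproducible picks
    roots = sorted({m.group(0) for m in map(ROOT.search, words) if m})
    attempts = 0
    while len(moved) < target:
        if attempts >= max_attempts or not roots:
            raise ValueError('no even split found')
        attempts += 1
        root = rng.choice(roots)
        group = [w for w in remaining if root in w]
        if len(group) > target - len(moved):
            continue  # group too large for the remaining slots; try another root
        moved.extend(group)
        remaining = [w for w in remaining if root not in w]
        roots.remove(root)  # this root is used up
    return remaining, moved

words = ['{anvil}>ed>', '{anvil}', '{anew}', '<re<{write}']
rest, picked = split_by_root(words, random.Random())
print(rest)
print(picked)
```

The cap does not make an unlucky sequence of picks succeed; it only turns a hang into an error that the caller can catch and retry.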

We use extend to add the found words to new_list. Then we rebuild word_list, leaving out any value that is already in new_list.

Result (one possible run, since the roots are picked at random):

['{a==meliorate}>ed>', '{a==meliorate}>s>', '{a==meliorate}', '{anew}', '<un<{ban}>ed>']
['{annex}>ing>', '{anvil}>ed>', '{anvil}>ing>', '{anvil}', '<re<{write}']
