Identifying patterns in a word list using a pattern threshold

User

Answer 1 · edited 2024-05-16 21:25:07

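import collections

m = ['ABA', 'ABB', 'ABC', 'BCA', 'BCB', 'BCC', 'ABBC', 'ABBA', 'ABBC']  # sample word list, as used in the other answers
counter = collections.Counter()
min_length = 2
max_length = len(max(m, key=len))
# range() stops before max_length, so prefix lengths min_length..max_length-1 are counted
for length in range(min_length, max_length):
    counter.update(word[:length] for word in m if len(word) >= length)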
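The title asks for patterns that pass a threshold. A minimal sketch of applying one to the prefix counter, assuming "threshold" means "keep only prefixes that occur at least k times"; `threshold = 3` is a hypothetical value chosen for illustration.

```python
import collections

m = ['ABA', 'ABB', 'ABC', 'BCA', 'BCB', 'BCC', 'ABBC', 'ABBA', 'ABBC']

# Build the prefix counter exactly as in the loop above (lengths 2..longest-1)
counter = collections.Counter()
min_length = 2
max_length = len(max(m, key=len))
for length in range(min_length, max_length):
    counter.update(word[:length] for word in m if len(word) >= length)

threshold = 3  # hypothetical: minimum number of occurrences to count as a pattern

# most_common() sorts by count (descending), so the result stays ordered
frequent = [(prefix, count) for prefix, count in counter.most_common() if count >= threshold]
print(frequent)  # [('AB', 6), ('ABB', 4), ('BC', 3)]
```

Filtering `most_common()` rather than the raw counter keeps the output sorted by frequency, which is usually what you want when reporting patterns.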

User

Answer 2 · edited 2024-05-16 21:25:07

You can use accumulate() to generate the cumulative strings (the prefixes of each word) and islice() to skip those shorter than the minimum length:

from itertools import accumulate, islice
from collections import Counter

m = ['ABA','ABB', 'ABC','BCA','BCB','BCC','ABBC', 'ABBA', 'ABBC']

c = Counter()
for i in map(accumulate, m):
    c.update(islice(i, 1, None)) # get strings with a minimal length of 2

print(c.most_common(3))
# [('AB', 6), ('ABB', 4), ('BC', 3)]
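To see what `accumulate()` and `islice()` contribute, here is a small sketch on a single word. With no function argument, `accumulate()` combines items with `+`, which for strings means concatenation, so it yields every prefix.

```python
from itertools import accumulate, islice

word = 'ABBC'

# accumulate() uses + by default, so 'A', 'A'+'B', 'AB'+'B', ... i.e. all prefixes
prefixes = list(accumulate(word))
print(prefixes)  # ['A', 'AB', 'ABB', 'ABBC']

# islice(..., 1, None) skips the first item (the 1-character prefix),
# leaving only prefixes of length 2 or more
long_prefixes = list(islice(accumulate(word), 1, None))
print(long_prefixes)  # ['AB', 'ABB', 'ABBC']
```

Note that this approach also counts the complete word (here `'ABBC'`) as a prefix, whereas the `range(min_length, max_length)` loop stops one character short of the longest word. For a different minimum length k, the start index would be `k - 1`.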

User

Answer 3 · edited 2024-05-16 21:25:07

First, let's define the strings:

>>> m = ['ABA','ABB', 'ABC','BCA','BCB','BCC','ABBC', 'ABBA', 'ABBC']

Now, let's count all prefixes of length 2 or 3:

>>> from collections import Counter
>>> c = Counter([s[:2] for s in m] + [s[:3] for s in m if len(s)>=3])

To compare with your table, here are the three most common prefixes:

>>> c.most_common(3)
[('AB', 6), ('ABB', 4), ('BC', 3)]

Update

To include all keys with lengths up to len(max(m, key=len))-1:

>>> n = len(max(m, key=len))
>>> c = Counter(s[:i] for s in m for i in range(2, min(n, 1+len(s))))
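The same expression can be wrapped in a reusable function. This is a sketch only; `count_prefixes` and its `min_len` parameter are hypothetical names, not part of any library. For each word `s`, `i` runs from `min_len` to `min(n - 1, len(s))`, so a prefix is never longer than its word and never as long as the longest word.

```python
from collections import Counter

def count_prefixes(words, min_len=2):
    """Count prefixes of length min_len .. (longest word length - 1).

    Hypothetical helper mirroring the generator expression above.
    Raises ValueError on an empty list, because max() has nothing to compare.
    """
    n = len(max(words, key=len))
    return Counter(s[:i] for s in words for i in range(min_len, min(n, 1 + len(s))))

m = ['ABA', 'ABB', 'ABC', 'BCA', 'BCB', 'BCC', 'ABBC', 'ABBA', 'ABBC']
top3 = count_prefixes(m).most_common(3)
print(top3)  # [('AB', 6), ('ABB', 4), ('BC', 3)]
```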

Additional test

To show that longer strings are handled correctly, let's consider a different input:

>>> m = ['ab', 'abc', 'abcdef']
>>> n = len(max(m, key=len))
>>> c = Counter(s[:i] for s in m for i in range(2, min(n, 1+len(s))))
>>> c.most_common()
[('ab', 3), ('abc', 2), ('abcd', 1), ('abcde', 1)]
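The accumulate-based answer and this bounded version differ on one point: the bound `len(max(m, key=len)) - 1` excludes only words of maximum length, so shorter complete words such as `'abc'` are counted by both. A small sketch comparing the two on the same input:

```python
from collections import Counter
from itertools import accumulate, islice

m = ['ab', 'abc', 'abcdef']

# Bounded: prefixes of length 2 .. len(longest word) - 1
n = len(max(m, key=len))
bounded = Counter(s[:i] for s in m for i in range(2, min(n, 1 + len(s))))

# accumulate-based: every prefix of length >= 2, including each complete word
unbounded = Counter()
for prefixes in map(accumulate, m):
    unbounded.update(islice(prefixes, 1, None))

# Counter subtraction keeps only positive differences
print(unbounded - bounded)  # Counter({'abcdef': 1})
```

Which behaviour is right depends on whether a full word should count as its own pattern.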
