How do I remove duplicates from a list and add a duplicate count to each list item?

3 Answers

User
#1 · edited 2024-05-23 20:25:49

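In your original question, you were probably (I only glanced at it) using a set of case-folded strings to tell whether each string was new or a duplicate, and building up a list of new strings as you went.
You could replace that set with a Counter. But then you would need to build the list first, and go back afterwards to edit it with the counts.
So instead, replace both the set/Counter and the output list with an OrderedDict that stores an [item, count] pair for each case-folded item: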

import collections

# Map each lowercased key to [first spelling seen, number of occurrences].
d = collections.OrderedDict()
for item in myList:
    caseless = item.lower()
    try:
        d[caseless][1] += 1
    except KeyError:
        d[caseless] = [item, 1]

…and then make a pass over that dict to generate the output list:

myList = []
for item, count in d.values():
    if count > 1:
        item = '{} ({})'.format(item, count)
    myList.append(item)

You could make this more concise (e.g., myList = ['{} ({})'.format(item, count) if count > 1 else item for item, count in d.values()]), which would also make it a little faster by a small constant factor.
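Putting the two passes together, a minimal end-to-end sketch might look like the following. The function name `dedupe_with_counts` is only illustrative, and the sample list is borrowed from the other answers in this thread rather than from the original question.

```python
from collections import OrderedDict


def dedupe_with_counts(items):
    """Drop case-insensitive duplicates, keeping the first spelling seen.

    Items that occurred more than once get a ' (N)' suffix.
    (Hypothetical helper name, used here for illustration only.)
    """
    seen = OrderedDict()
    for item in items:
        key = item.lower()
        try:
            seen[key][1] += 1          # duplicate: bump the count
        except KeyError:
            seen[key] = [item, 1]      # first occurrence: remember spelling
    return ['{} ({})'.format(item, count) if count > 1 else item
            for item, count in seen.values()]


words = ["paper", "Plastic", "aluminum", "PAPer", "TIN", " paper",
         "glass", "tin", "PAPER", "Polypropylene Plastic"]
print(dedupe_with_counts(words))
# ['paper (3)', 'Plastic', 'aluminum', 'TIN (2)', ' paper', 'glass', 'Polypropylene Plastic']
```

Note that " paper" (with a leading space) is treated as a distinct item, because only case is normalized, not whitespace.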
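You can shave off a few nanoseconds by using % instead of format, and possibly a few more by using %d instead of %s (although I think that last part no longer holds as of 2.7).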
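If you want to check the formatting claim on your own interpreter, a rough timing sketch along these lines may help. The function names and the iteration count are arbitrary choices for illustration, and the numbers will vary by Python version and machine.

```python
import timeit

item, count = 'paper', 3


def with_format():
    return '{} ({})'.format(item, count)


def with_percent_s():
    return '%s (%s)' % (item, count)


def with_percent_d():
    return '%s (%d)' % (item, count)


# All three build the same string; only the formatting mechanism differs.
for fn in (with_format, with_percent_s, with_percent_d):
    elapsed = timeit.timeit(fn, number=100000)
    print('{:<16} {:.4f}s'.format(fn.__name__, elapsed))
```

Differences at this scale are tiny, so they are only worth chasing if the formatting step shows up in a profile.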
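Depending on your platform, a[0] += 1 may be faster or slower than a[1] += 1. So try both, and if a[0] is faster, use [count, item] pairs instead of [item, count]. If you have a lot of dups, you may want to consider a class with __slots__, which is actually a bit faster to update than a list, but much slower to create.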
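As a rough sketch of the `__slots__` variant, the list pair can be swapped for a small class. The names `Entry` and `dedupe_with_slots` are made up for this example, and whether it is actually faster depends on your data, so measure before adopting it.

```python
from collections import OrderedDict


class Entry(object):
    """Holds the first spelling of an item and how often it occurred."""
    __slots__ = ('item', 'count')   # no per-instance __dict__

    def __init__(self, item):
        self.item = item
        self.count = 1


def dedupe_with_slots(items):
    seen = OrderedDict()
    for item in items:
        key = item.lower()
        entry = seen.get(key)
        if entry is None:
            seen[key] = Entry(item)   # creation is the slower path
        else:
            entry.count += 1          # updates are plain attribute writes
    return ['{} ({})'.format(e.item, e.count) if e.count > 1 else e.item
            for e in seen.values()]


print(dedupe_with_slots(['paper', 'PAPer', 'TIN', 'tin', 'glass']))
# ['paper (2)', 'TIN (2)', 'glass']
```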
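Also, using an in test, or storing d.__contains__ in a local variable and calling that, may be faster or slower than try, depending on how many duplicates you expect. So try all three approaches on your real data rather than on a toy dataset.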
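The three lookup strategies mentioned above could be compared with something like this sketch. The function names and the synthetic `sample` list are assumptions for illustration; substitute your real data before drawing conclusions.

```python
import timeit
from collections import OrderedDict


def count_with_try(items):
    d = OrderedDict()
    for item in items:
        key = item.lower()
        try:
            d[key][1] += 1
        except KeyError:
            d[key] = [item, 1]
    return d


def count_with_in(items):
    d = OrderedDict()
    for item in items:
        key = item.lower()
        if key in d:
            d[key][1] += 1
        else:
            d[key] = [item, 1]
    return d


def count_with_bound_contains(items):
    d = OrderedDict()
    contains = d.__contains__   # bound method cached in a local name
    for item in items:
        key = item.lower()
        if contains(key):
            d[key][1] += 1
        else:
            d[key] = [item, 1]
    return d


# Synthetic, duplicate-heavy input; replace with real data when measuring.
sample = ['paper', 'Plastic', 'PAPer', 'TIN', 'tin', 'PAPER'] * 100

for fn in (count_with_try, count_with_in, count_with_bound_contains):
    elapsed = timeit.timeit(lambda: fn(sample), number=100)
    print('{:<28} {:.4f}s'.format(fn.__name__, elapsed))
```

In general, try/except tends to win when most items are duplicates (the exception is rarely raised), while an explicit membership test tends to win when most items are unique.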

User
#2 · edited 2024-05-23 20:25:49

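You could also use a Counter object to keep track of the counts and, at the same time, of the words already seen, using the case-folded word as the key. Then, once you have finished iterating over the input list, update the result list so that any word with a count greater than 1 takes the form %s (%d).
Code: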

from collections import Counter

words = ["paper", "Plastic", "aluminum", "PAPer", "TIN", " paper", "glass", "tin", "PAPER", "Polypropylene Plastic"]

counts = Counter()
result = []
for word in words:
    caseless = word.casefold()
    # Keep only the first spelling of each word, but count every occurrence.
    if caseless not in counts:
        result.append(word)
    counts[caseless] += 1

# Append the count to any word that occurred more than once.
result = ['%s (%d)' % (w, counts[w.casefold()]) if counts[w.casefold()] > 1 else w for w in result]

print(result)

Output:
['paper (3)', 'Plastic', 'aluminum', 'TIN (2)', ' paper', 'glass', 'Polypropylene Plastic']

User
#3 · edited 2024-05-23 20:25:49

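Here is a version that uses a single Counter and avoids the extra set, like in @RoadRunner's solution, by popping each key from the Counter once it has been handled. If there are many duplicates, this may be slightly slower than the OrderedDict solution, but it uses less memory: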

from collections import Counter

words = ["paper", "Plastic", "aluminum", "PAPer", "TIN", " paper", "glass", "tin", "PAPER", "Polypropylene Plastic"]

counter = Counter(w.lower() for w in words)

result = []
for word in words:
    key = word.lower()
    if key in counter:
        count = counter[key]
        if count == 1:
            result.append(word)
        else:
            result.append('{} ({})'.format(word, count))
        counter.pop(key)

Note that for Python >= 3.3, you should use casefold instead of lower.
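To illustrate why this matters, here is a small sketch (Python 3 only) using the German "ß" as the example; the sample words are chosen for illustration and do not come from the question.

```python
words = ['Straße', 'STRASSE']

# lower() leaves 'ß' untouched, so the two spellings stay distinct.
print([w.lower() for w in words])      # ['straße', 'strasse']

# casefold() maps 'ß' to 'ss', so both spellings compare equal.
print([w.casefold() for w in words])   # ['strasse', 'strasse']
```

For plain ASCII input such as the list above, lower and casefold give the same keys.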
