In Python, how do I split a list into sublists of items that share a repeated value?

Posted 2024-06-16 10:50:06


I have a list of tide readings taken at specific times each day. It looks something like this:

tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
...
['Friday 2 February',23.52,0.04]
]

I want to split this list into sublists, one per date. For the data above, the list would become:

tideData = [
[['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73]],
[['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Friday 5 January',17.92,0.75]],
[['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85]],
...
['Friday 2 February',23.52,0.04]]
]

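If every date appeared the same number of times, this would not be a problem. However, a date sometimes appears twice and sometimes three times, so I want to group the entries into sublists by their repeated date. How can I do this?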


3 answers

I think you should use groupby from the itertools module:

from itertools import groupby

tideData = [
['Thursday 4 January',11.58,0.38],
['Thursday 4 January',16.95,0.73],
['Friday 5 January',6.48,0.83],
['Friday 5 January',12.42,0.33],
['Saturday 6 January',0.5,0.02],
['Saturday 6 January',7.18,0.85],
['Friday 2 February',23.52,0.04]
]

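groupby only merges consecutive items that share the same key, so if the data is not already ordered by date, sort it first:

tideData = sorted(tideData, key=lambda x: x[0])

Then group the rows: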

[list(g) for _,g in groupby(tideData, key=lambda x: x[0])]
# returns:
[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
 [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
 [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
 [['Friday 2 February', 23.52, 0.04]]]
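
As a small illustration of why the sort matters, here is a minimal sketch using a shortened, reordered copy of the data (the `rows` list and the `first` helper are made up for this example). Without sorting, a date that reappears later starts a new group:

```python
from itertools import groupby

# Hypothetical sample: the two 'Friday 5 January' rows are NOT adjacent.
rows = [
    ['Friday 5 January', 6.48, 0.83],
    ['Saturday 6 January', 0.5, 0.02],
    ['Friday 5 January', 12.42, 0.33],
]

def first(row):
    """Return the date string used as the grouping key."""
    return row[0]

# groupby starts a new group every time the key changes.
unsorted_groups = [list(g) for _, g in groupby(rows, key=first)]

# Sorting first makes equal keys adjacent, so each date ends up in one group.
sorted_groups = [list(g) for _, g in groupby(sorted(rows, key=first), key=first)]

print(len(unsorted_groups))  # 3
print(len(sorted_groups))    # 2
```

Note that sorting on the date string orders the groups alphabetically ('Friday ...' before 'Saturday ...'), not chronologically.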

You can use collections.defaultdict for an O(n) solution.

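On Python 3.7+, you get the added benefit that the groups come out in the order in which each date first appears in the input. This also works on Python 3.6, but there it is considered an implementation detail.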

from collections import defaultdict

d = defaultdict(list)

for item in tideData:
    d[item[0]].append(item)

res = list(d.values())

Result:

[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]],
 [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]],
 [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]],
 [['Friday 2 February', 23.52, 0.04]]]
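
To make the ordering point concrete, here is a minimal sketch with a reordered sample (the `rows` list is made up for this example). It assumes Python 3.7+, where dictionaries keep insertion order, and shows that no sorting is needed even when equal dates are not adjacent:

```python
from collections import defaultdict

# Hypothetical sample: the two Saturday rows are separated by a Friday row.
rows = [
    ['Saturday 6 January', 0.5, 0.02],
    ['Friday 5 January', 6.48, 0.83],
    ['Saturday 6 January', 7.18, 0.85],
]

groups = defaultdict(list)
for row in rows:
    # A missing key is created with an empty list on first access.
    groups[row[0]].append(row)

result = list(groups.values())

# Groups appear in the order each date was first seen (Python 3.7+).
print([group[0][0] for group in result])
# ['Saturday 6 January', 'Friday 5 January']
```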

For those interested in the performance difference between the O(n) and O(n log n) solutions:

from collections import defaultdict
from itertools import groupby

tideData = tideData * 10000

def jp(tideData):
    d = defaultdict(list)
    for item in tideData:
        d[item[0]].append(item)
    return list(d.values())

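def grp(tideData):
    # No sort here: only adjacent rows with the same date are grouped,
    # and the timing below does not include the O(n log n) sort.
    return [list(g) for _,g in groupby(tideData, key=lambda x: x[0])]

# %timeit is an IPython/Jupyter magic command.
%timeit jp(tideData)   # 12.2 ms per loop
%timeit grp(tideData)  # 33.1 ms per loop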
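
Outside IPython, a similar comparison can be sketched with the standard timeit module. This is only an illustration: the function names `by_dict` and `by_groupby_sorted`, the multiplier of 1000, and the repeat count of 20 are arbitrary choices for this example, and the groupby variant here includes the sort so that it reflects the O(n log n) path. No timings are quoted because they depend on the machine.

```python
import timeit
from collections import defaultdict
from itertools import groupby

base = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 6.48, 0.83],
    ['Friday 5 January', 12.42, 0.33],
    ['Saturday 6 January', 0.5, 0.02],
    ['Saturday 6 January', 7.18, 0.85],
    ['Friday 2 February', 23.52, 0.04],
]
data = base * 1000  # arbitrary size for the sketch

def by_dict(rows):
    """O(n): one pass, appending each row to its date's list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return list(groups.values())

def by_groupby_sorted(rows):
    """O(n log n): sort by date string, then group adjacent rows."""
    key = lambda row: row[0]
    return [list(g) for _, g in groupby(sorted(rows, key=key), key=key)]

# Both approaches produce the same groups; only the group order differs.
group_key = lambda group: group[0][0]
same = sorted(by_dict(data), key=group_key) == sorted(by_groupby_sorted(data), key=group_key)
print(same)

for fn in (by_dict, by_groupby_sorted):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1000:.2f} ms per call")
```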

Here is a simple approach that needs no imports:

group_by = {}
for row in tideData:
    # Use the date string (first element) as the dictionary key.
    if row[0] not in group_by:
        group_by[row[0]] = [row]
    else:
        group_by[row[0]].append(row)
print(list(group_by.values()))

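Output (the groups are not in input order here; dictionaries only guarantee insertion order from Python 3.7 on):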

[[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73]], [['Saturday 6 January', 0.5, 0.02], ['Saturday 6 January', 7.18, 0.85]], [['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]], [['Friday 2 February', 23.52, 0.04]]]
