How do I group rows by one column?

Posted 2024-05-20 23:25:32


I want to write a function for input like this:

  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57

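The output should give me two lists or files grouped by the first column: rows that have the same number in the first column go into the same list. The result should look like this: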

List one:

  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72

List two:

  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57

I don't know which approach to use.


3 answers
  1. Read the input as a CSV file
  2. Use the first column as the dictionary key
  3. Print the dictionary

Python code:

import csv

groups = {}

with open("data.csv") as data:
    reader = csv.reader(data)
    for row in reader:
        if len(row) > 0:
            col1 = row[0].strip()
            group = groups.get(col1, [])
            group.append(row)
            groups[col1] = group

for key in groups:
    print("=== {0} ===".format(key))
    print("\n".join(",".join(row) for row in groups[key]))

Output:

=== 1405684433 ===
1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
=== 1405684432 ===
1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
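
Since the question also mentions files, here is a minimal sketch of writing each group to its own file. The helper names `group_rows` and `write_groups` and the `<key>.csv` file naming are assumptions made for illustration; they are not part of the answer above.

```python
import csv
import os
from collections import defaultdict


def group_rows(lines):
    """Group CSV rows by their first column, keeping first-seen key order."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if row:  # csv.reader yields [] for blank lines; skip them
            groups[row[0].strip()].append(row)
    return groups


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<key>.csv and return the paths written."""
    paths = []
    for key, rows in groups.items():
        path = os.path.join(out_dir, "{0}.csv".format(key))
        # newline="" lets the csv module control line endings itself
        with open(path, "w", newline="") as out:
            csv.writer(out).writerows(rows)
        paths.append(path)
    return paths
```

With the sample input this would produce two files, one per timestamp. `group_rows` accepts any iterable of lines, so an open file object works as well as a list of strings.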

I would use a dictionary keyed on the first column. One solution is the following:

def split_on_first_column(data):
    # Map each first-column value to the list of lines that share it.
    result = dict()
    for line in data:
        key = line.split(',')[0]
        if key not in result:
            result[key] = [line]
        else:
            result[key].append(line)

    return result.values()

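This returns a list of lists in Python 2 and a dict_values view of lists in Python 3; wrap the result in list() if you need an actual list.

Note that the rows are stored as whole strings, not split further into lists.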
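
If you would rather have each row split into fields, a variant along these lines might work. It is only a sketch: the function name `split_rows_on_first_column` is made up here, and it swaps in `collections.defaultdict` and strips the whitespace around every field, which the function above does not do.

```python
from collections import defaultdict


def split_rows_on_first_column(data):
    """Group lines by first column, storing each row as a list of stripped fields."""
    result = defaultdict(list)
    for line in data:
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        result[fields[0]].append(fields)
    # Return a real list so the result is the same type in Python 2 and 3
    return list(result.values())
```

Stripping the fields also means the keys compare equal even if the amount of leading whitespace differs from row to row.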

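You can use itertools.groupby(). (This assumes the input is already sorted by that column, because groupby only groups consecutive rows with equal keys.)

Example: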

import itertools

data = """\
  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
"""

data = data.splitlines()
keyfunc = lambda x: x.split(',')[0]
#data.sort(key=keyfunc) # if input is not sorted by first column

for k, l in itertools.groupby(data, key=keyfunc):
    print("group:", k)
    for x in l:
        print(x)

Output:

group:   1405684432
  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
group:   1405684433
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
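
To see why the sorting caveat matters, here is a small sketch with deliberately unsorted rows. The helper names `first_column` and `group_sizes` and the shortened sample rows are assumptions for illustration only.

```python
import itertools

# Rows deliberately out of order: the key 1405684433 is not contiguous
rows = [
    "1405684433, d8:c7:c8:5e:7c:29, SUTD_ILP2, 71",
    "1405684432, d8:c7:c8:5e:7c:2d, SUTD_GLAB, 72",
    "1405684433, d8:c7:c8:5e:7d:eb, SUTD_Student, 57",
]


def first_column(line):
    """Return the first comma-separated field without surrounding whitespace."""
    return line.split(",")[0].strip()


def group_sizes(lines):
    """Return (key, number of rows) for each run that groupby yields."""
    return [(key, len(list(group)))
            for key, group in itertools.groupby(lines, key=first_column)]


unsorted_sizes = group_sizes(rows)
sorted_sizes = group_sizes(sorted(rows, key=first_column))

print(unsorted_sizes)  # three runs: the key 1405684433 shows up twice
print(sorted_sizes)    # two groups, one per key
```

Because `sorted` is stable, rows that share a key keep their original relative order after sorting.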

For reference:
