Working with a list of lists in Python and selecting sublists

0 votes

4 answers

3417 views

Asked 2025-04-18 12:50

Note: I'm using Python to read this file.

I have a data file that looks roughly like this:

1 0.1803 233.650000 101.52010 37.95730 96.41869
0.462300 1.425000e+12 1.811000e+12 1.710841e+10
0.456300 1.811000e+12 1.811000e+12 1.711282e+10
0.450300 9.443000e+11 9.443000e+11 9.842220e+09
0.444300 7.089000e+11 7.089000e+11 6.764462e+09

0 0.2523 462.060000 96.47176 48.58004 84.13097
0.456300 1.325000e+13 1.325000e+13 7.735244e+10
0.450300 1.283000e+13 1.283000e+13 7.684167e+10
0.444300 1.182000e+13 1.182000e+13 7.571757e+10
0.438300 1.002000e+13 1.002000e+13 7.352358e+10
0.432300 8.971000e+12 8.971000e+12 7.196254e+10

1 0.0000 74.230000 81.10059 46.28531 95.17891
0.342300 2.862000e+10 3.803000e+10 9.795136e+06

0 0.9493 776.060000 98.65339 41.54604 94.64194
1.000300 1.467000e+14 1.674000e+14 1.279873e+11
0.997300 1.467000e+14 1.674000e+14 1.280501e+11
0.994300 1.476000e+14 1.674000e+14 1.281122e+11

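In short, the data is one big list containing many smaller lists, separated by blank lines. The first line of each small list has 6 columns and the following lines have 4. The small lists vary in length. I want to select the small lists that meet a given condition. For example, I only want those whose first line starts with 0, which in the example above selects only the second and fourth lists.

The approach I came up with: take the first line of each small list and store those values in a separate array. Then I can use where() to find the indices whose first element is 0, and select the small lists at those indices.

The problem is that I don't know how to handle the blank lines in the data. I'm not sure how to index lists separated by blank lines, or how to select only the data lines that follow a blank line. Does anyone have suggestions or an alternative solution? Thanks in advance.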

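Tags: conditional filtering, data structures, data analysis, data indexing, list processing, array operations, blank-line handling, sublist selection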

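4 Answers

I suggest turning each list of lists into an actual Python list, which makes it much easier to work with. You can then handle the various cases by iterating over those lists rather than working on the file directly.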

lists = []  # your list of lists of lists (redundant enough for you?)
current = []  # rows of the group currently being read
with open("whateverfilename.dat") as f:
    for line in f:
        if not line.strip():  # blank line: the current group is finished
            if current:  # ignore repeated blank lines
                lists.append(current)
            current = []  # start a fresh group for the next batch of data
        else:
            current.append(line.split())  # split the row into a list of strings
if current:  # the file may not end with a blank line
    lists.append(current)

You can then iterate over the data like any ordinary list, which I find much simpler than iterating over the file directly.
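As an illustration of iterating over the resulting structure, here is a minimal sketch that keeps only the groups whose first row starts with 0. It assumes `lists` has the shape built by the loop above (groups of rows of string tokens); the shortened sample and the name `selected` are illustrative only.

```python
# Hypothetical sample in the shape produced by the loop above:
# a list of groups, each group a list of rows, each row a list of strings.
lists = [
    [["1", "0.1803", "233.650000"], ["0.462300", "1.425000e+12"]],
    [["0", "0.2523", "462.060000"], ["0.456300", "1.325000e+13"]],
]

# line.split() yields strings, so convert before comparing with 0.
selected = [group for group in lists if group and float(group[0][0]) == 0]

for group in selected:
    header, rows = group[0], group[1:]
    print(header, len(rows))
```

Comparing `float(group[0][0]) == 0` rather than the string `"0"` also matches a header written as `0.0`.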

Answered 2025-04-18 by Python大师


You need to read the file and handle each line differently depending on where you are in it.

Here is some commented code for reference:

def read_data(f):
    first, rest = None, []  # Reset data
    for line in f:  # Run over lines in the file
        if not line.strip():  # Empty line (or only whitespace)
            if first is not None:  # Skip leading or repeated blank lines
                yield first, rest  # Yield the currently held values
            first, rest = None, []  # Reset data
            continue  # Skip this line
        if first is None:  # We're at the beginning of a new set
            first = [float(x) for x in line.split()]  # Read it into "first"
            continue  # And go on
        # Otherwise, we're inside a list, so read that into rest
        rest.append([float(x) for x in line.split()])
    # The file is done; if it did not end with an empty line,
    # the last entry has not been yielded yet, so yield it now
    if first is not None:
        yield first, rest
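
A possible way to use such a generator for the selection asked about is sketched below. To keep the block runnable on its own it restates the generator in compact form (with a guard against empty groups), and `io.StringIO` stands in for an open file; both are assumptions for illustration, not part of the answer.

```python
import io

def read_data(f):
    """Compact restatement of the generator above so this sketch runs alone."""
    first, rest = None, []
    for line in f:
        if not line.strip():          # blank line closes the current group
            if first is not None:
                yield first, rest
            first, rest = None, []
        elif first is None:           # first non-blank line is the header row
            first = [float(x) for x in line.split()]
        else:                         # remaining lines are data rows
            rest.append([float(x) for x in line.split()])
    if first is not None:             # last group when no trailing blank line
        yield first, rest

# io.StringIO stands in for the real file; rows are taken from the question.
sample = io.StringIO(
    "1 0.0000 74.230000 81.10059 46.28531 95.17891\n"
    "0.342300 2.862000e+10 3.803000e+10 9.795136e+06\n"
    "\n"
    "0 0.9493 776.060000 98.65339 41.54604 94.64194\n"
    "1.000300 1.467000e+14 1.674000e+14 1.279873e+11\n"
    "0.997300 1.467000e+14 1.674000e+14 1.280501e+11\n"
)

# Keep only the groups whose header row starts with 0.
selected = [(first, rest) for first, rest in read_data(sample) if first[0] == 0]
```

Because the generator yields one group at a time, the filter never needs the whole file in memory at once.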

Answered 2025-04-18 by Python大师


A list comprehension makes this very simple:

>>> s = open(yourfile).read()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.strip().split('\n\n'))]
>>> result = [group for group in data if group[0][0] == 0]

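First, we need to parse the data into a format that is easy to access.

A list containing several lists seems like a good choice to me; something like this would be ideal: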

[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
  [0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
  [0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
  [0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
  [0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
 [[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
  [0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]

To get there, we can use a list comprehension:

>>> s = open(yourfile).read()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.strip().split('\n\n'))]

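The comprehension is easier to follow when read from right to left:

First, split('\n\n') splits the input at consecutive newlines, which gives a list of group strings. This also takes care of the "blank line" problem you mentioned. If the file ends with a newline, call s.strip() first so the last group does not end in an empty row.
Next, each group is split on '\n', which gives a sublist of lines.
Then, for each row in each sublist:
1. map(str.split, sublist) splits each line on whitespace, giving a list of str.
2. map(float, row) converts that list into floats (on Python 3, wrap it in list(), because map returns an iterator there).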
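
The same steps can also be written as explicit loops, which may be easier to read and debug than the one-liner. This is a sketch under the same assumption that groups are separated by exactly one empty line; the function name `parse_groups` is made up for illustration.

```python
def parse_groups(text):
    """Parse blank-line-separated blocks into nested lists of floats."""
    data = []
    for group in text.strip().split("\n\n"):  # step 1: split on blank lines
        rows = []
        for line in group.split("\n"):        # step 2: split the block into lines
            tokens = line.split()             # step 3.1: split on whitespace
            rows.append([float(t) for t in tokens])  # step 3.2: convert to float
        data.append(rows)
    return data

# Two groups copied from the question's sample data.
sample = (
    "1 0.0000 74.230000 81.10059 46.28531 95.17891\n"
    "0.342300 2.862000e+10 3.803000e+10 9.795136e+06\n"
    "\n"
    "0 0.9493 776.060000 98.65339 41.54604 94.64194\n"
    "1.000300 1.467000e+14 1.674000e+14 1.279873e+11\n"
)
data = parse_groups(sample)
```

Note that a separator line containing spaces or tabs is not matched by `'\n\n'`, so files with such lines would need a line-by-line approach instead.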

Now let's see how to select data by a given condition...

Again, a list comprehension works. For example, to select the groups whose first row starts with 0:

>>> result = [group for group in data if group[0][0] == 0]

This gives:

[[[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
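
The question describes finding the matching indices first, in the style of where(). A plain-Python equivalent might look like the sketch below; the `data` literal is a shortened stand-in containing only the first row of each group from the question, and `indices` is an illustrative name.

```python
# Shortened stand-in for `data`: only the first row of each group is kept,
# since that is all the condition looks at.
data = [
    [[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869]],
    [[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097]],
    [[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891]],
    [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194]],
]

# Indices of the groups whose first row starts with 0 (the where() step).
indices = [i for i, group in enumerate(data) if group[0][0] == 0]

# Pick the matching groups by index.
result = [data[i] for i in indices]
print(indices)  # [1, 3]
```

Keeping the indices around is useful when the position of each selected group in the original file matters later.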

All of this is done with Python built-ins alone, without importing any module!

Answered 2025-04-18 by Python大师


Grouping by row length greater than 0

Assuming you want a list of lists:

>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [list(items) for group, items in groupby(reader, key=grouper) if group]
...
>>> res
[[['1', '0.1803', '233.650000', '101.52010', '37.95730', '96.41869'],
  ['0.462300', '1.425000e+12', '1.811000e+12', '1.710841e+10'],
  ['0.456300', '1.811000e+12', '1.811000e+12', '1.711282e+10'],
  ['0.450300', '9.443000e+11', '9.443000e+11', '9.842220e+09'],
  ['0.444300', '7.089000e+11', '7.089000e+11', '6.764462e+09']],
 [['0', '0.2523', '462.060000', '96.47176', '48.58004', '84.13097'],
  ['0.456300', '1.325000e+13', '1.325000e+13', '7.735244e+10'],
  ['0.450300', '1.283000e+13', '1.283000e+13', '7.684167e+10'],
  ['0.444300', '1.182000e+13', '1.182000e+13', '7.571757e+10'],
  ['0.438300', '1.002000e+13', '1.002000e+13', '7.352358e+10'],
  ['0.432300', '8.971000e+12', '8.971000e+12', '7.196254e+10']],
 [['1', '0.0000', '74.230000', '81.10059', '46.28531', '95.17891'],
  ['0.342300', '2.862000e+10', '3.803000e+10', '9.795136e+06']],
 [['0', '0.9493', '776.060000', '98.65339', '41.54604', '94.64194'],
  ['1.000300', '1.467000e+14', '1.674000e+14', '1.279873e+11'],
  ['0.997300', '1.467000e+14', '1.674000e+14', '1.280501e+11'],
  ['0.994300', '1.476000e+14', '1.674000e+14', '1.281122e+11']]]

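The function called grouper takes one record as its argument (csv.reader yields each line as a list of strings) and returns True if that list is non-empty and False if it has no items.

If you group by this value, you get the groups delimited by the blank lines.

The only remaining step is to drop the groups produced by the blank lines themselves. A list comprehension can filter through a trailing if <condition> clause. Here we can reuse the True or False supplied by groupby.

groupby comes from itertools. Its first argument is an iterable, and the key argument is a callable that computes the grouping value for each item. Whenever the grouping value changes, a new group starts.

groupby yields tuples: the first item is the value that defines the group (True or False) and the second is an iterable over the items in that group.

Converting strings to floats

If you want the numbers read as floats, we can define a function called floater that takes one item of res (a group) and applies float to every value in each of its sublists: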
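
To make the (key, group) pairs concrete, here is a small sketch that runs groupby over an in-memory list instead of a file. The `rows` literal is invented for illustration; an empty list plays the role of the record csv.reader returns for a blank line.

```python
from itertools import groupby

# Rows as csv.reader would produce them; [] stands for a blank line.
rows = [["1", "a"], ["b"], [], ["0", "c"], [], [], ["1", "d"]]

# Materialize each group, because the group iterators are consumed lazily.
pairs = [(key, list(items))
         for key, items in groupby(rows, key=lambda rec: len(rec) > 0)]
for key, items in pairs:
    print(key, items)

# Keep only the groups whose key is True, i.e. the runs of non-empty rows.
groups = [items for key, items in pairs if key]
```

Two consecutive blank lines end up in a single False group, so repeated blank lines do not create empty data groups.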

def floater(lstlst):
    # list() is needed on Python 3, where map() returns a lazy iterator
    return [list(map(float, items)) for items in lstlst]

The solution then looks like this:

>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [floater(items) for group, items in groupby(reader, key=grouper) if group]
>>> res
[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
  [0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
  [0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
  [0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
  [0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
 [[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
  [0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
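
One caveat: with delimiter=" ", csv.reader treats every single space as a separator, so columns padded with several spaces or tabs produce empty or merged fields that float cannot parse. If the real file is formatted that way, a variant that groups raw lines and uses str.split may be more forgiving. The sketch below is one such variant; `read_groups` and the irregularly spaced sample are assumptions for illustration.

```python
import io
from itertools import groupby

def read_groups(f):
    """Group non-blank lines and convert whitespace-separated tokens to float."""
    # line.strip() is '' (falsy) for blank lines, so its truth value is the key.
    return [
        [[float(tok) for tok in line.split()] for line in lines]
        for nonblank, lines in groupby(f, key=lambda line: bool(line.strip()))
        if nonblank
    ]

# io.StringIO stands in for the file; spacing is deliberately irregular.
sample = io.StringIO(
    "1  0.0000\t74.230000\n"
    "0.342300   2.862000e+10\n"
    "\n"
    "0 0.9493 776.060000\n"
)
res = read_groups(sample)

# Select the groups whose first row starts with 0, as asked in the question.
selected = [group for group in res if group[0][0] == 0]
```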

Answered 2025-04-18 by Python大师

