Getting information from a table with a loop (python)

f = open("999A.txt")
text_in_file = f.read().strip().split('+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+')
f.close()
newlist = []
for item in text_in_file:
    newlist.append(item)
matching = [s for s in newlist if ".. image::" in s]
for item in newlist:
    if newlist.index(item) >= newlist.index(matching[0]):
        newlist.remove(item)
num_rows = len(newlist) - 1

def row(i):
    row_i = newlist[i+1]
    list_i = list(row_i.strip().split('|'))
    return list_i[1:17]

def column(i):
    list_i = []
    for z in range(num_rows):
        list_i.append(row(z)[i])
    return list_i[1:]

for i in range(30):
    print(row(i))
print("columns:")
for i in range(16):
    print(column(i))

1 answer

Anonymous user

Answer #1 · posted 2024-04-23 07:36:40

The table will always have 16 columns

Not true: you only have 8 headers, so you will get an IndexError on that row.

|  *L1 barcodes*  |  *L2 barcodes*  |  *L3 barcodes*  |  *L4 barcodes*  |  *L5 barcodes*  |  *L6 barcodes*  |  *L7 barcodes*  |  *L8 barcodes*  |
| CTCTCT | 27.66% | GTTTCG | 9.04%  | NNNNNN | 3.67%  | ATTCGG | 7.41%  | GACGAT | 6.90%  | GAACCC | 13.29% | GTAACA | 9.50%  | ATCGCC | 56.24% |

Sample code to see this:

with open("999A.txt") as f:
    for line in f:
        line = line.strip()
        if line.startswith("|"):
            print(line)
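To see why the header row breaks the "16 columns" assumption, here is a small check that splits the header line (copied from the excerpt above) on `|`. It assumes the header reaches `row()` as that single line; the variable names are only for illustration. Note that the question's `[1:17]` slice keeps the trailing empty string on short rows, so the header row yields 9 cells, and asking for cell 15 fails:

```python
# Header line copied from the table excerpt above
header = "|  *L1 barcodes*  |  *L2 barcodes*  |  *L3 barcodes*  |  *L4 barcodes*  |  *L5 barcodes*  |  *L6 barcodes*  |  *L7 barcodes*  |  *L8 barcodes*  |"

parts = header.strip().split('|')
headers = parts[1:-1]  # drop the empty strings outside the outer pipes
print(len(headers))  # 8

# row() slices with [1:17]; on this line that is the 8 headers plus the trailing ''
cells = parts[1:17]
print(len(cells))  # 9

try:
    cells[15]  # the cell column(15) would ask for on this row
except IndexError as exc:
    print("IndexError:", exc)
```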

If you only want rows with the required number of columns, you need to check the length of the split line, like this:

data = []
with open("999A.txt") as f:
    for line in f:
        line = line.strip()
        if line.startswith("|"):
            cols = line.split("|")[1:-1] # remove outside empty strings
            cols = list(map(str.strip, cols)) # strip the remaining strings
            if len(cols) == 16 and not all(x == '' for x in cols):
                # keep rows with 16 columns that are not entirely blank
                data.append(cols)

for row in data:
    # do something
    print(row)

Sample output

['CTCTCT', '27.66%', 'GTTTCG', '9.04%', 'NNNNNN', '3.67%', 'ATTCGG', '7.41%', 'GACGAT', '6.90%', 'GAACCC', '13.29%', 'GTAACA', '9.50%', 'ATCGCC', '56.24%']
['TGTGTG', '27.54%', 'ATTCCT', '5.78%', 'TTCAGA', '3.11%', 'CGAATC', '6.70%', 'ATTCGG', '6.45%', 'TGCTGT', '13.18%', 'TGCTGT', '8.64%', 'GCTATT', '9.98%']
['ACACAC', '22.70%', 'ATGTCA', '4.47%', 'AGGTTT', '3.01%', 'GACGAT', '6.36%', 'CCATTA', '6.37%', 'TTCAGA', '12.19%', 'CCTGAG', '7.82%', 'CCGAGT', '8.79%']
['GAGAGA', '16.18%', 'GTGGCC', '4.06%', 'CCTGAG', '2.71%', 'GCTATT', '6.26%', 'TTGCCG', '6.23%', 'CCTGAG', '11.42%', 'AAGCTC', '7.77%', 'TAATAG', '5.72%']
['', '', 'GNNTNG', '3.96%', 'GAACCC', '2.47%', 'AGTAGC', '6.11%', 'TAGGCT', '6.14%', 'AGGTTT', '11.39%', 'GAACCC', '7.62%', 'CCATTA', '3.70%']
['', '', 'GTGAAA', '3.47%', '', '', 'CCATTA', '6.10%', 'GCCTAA', '6.07%', 'GTAACA', '11.36%', 'CTTAAA', '7.56%', '', '']
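If you want to try the filter without `999A.txt`, here is a sketch of the same logic wrapped in a function that accepts any iterable of lines (a file object or `io.StringIO`). The name `parse_rows` is hypothetical. In the sample text, the header and first data row are copied from the excerpt above, the row with blank cells is rebuilt from the last line of the sample output, and the all-blank row is a made-up example of what the "not entirely blank" check is for. `any(cols)` is the same test as `not all(x == '' for x in cols)`:

```python
import io

def parse_rows(lines, ncols=16):
    """Return the cell lists of table lines that have exactly ncols cells
    and are not entirely blank."""
    data = []
    for line in lines:
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+--------+' borders and any non-table text
        cols = [c.strip() for c in line.split("|")[1:-1]]
        if len(cols) == ncols and any(cols):
            data.append(cols)
    return data

border = "+" + "--------+" * 16
sample = "\n".join([
    border,
    "|  *L1 barcodes*  |  *L2 barcodes*  |  *L3 barcodes*  |  *L4 barcodes*  |  *L5 barcodes*  |  *L6 barcodes*  |  *L7 barcodes*  |  *L8 barcodes*  |",
    border,
    "| CTCTCT | 27.66% | GTTTCG | 9.04%  | NNNNNN | 3.67%  | ATTCGG | 7.41%  | GACGAT | 6.90%  | GAACCC | 13.29% | GTAACA | 9.50%  | ATCGCC | 56.24% |",
    "|        |        | GTGAAA | 3.47%  |        |        | CCATTA | 6.10%  | GCCTAA | 6.07%  | GTAACA | 11.36% | CTTAAA | 7.56%  |        |        |",
    "|" + "        |" * 16,  # hypothetical all-blank row: filtered out
    border,
])

rows = parse_rows(io.StringIO(sample))
for r in rows:
    print(r)
```

The 8-cell header row and the all-blank row are dropped; the row with some blank cells is kept, which matches the sample output above.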

You may also want to group each pair of elements in the list, so that you keep the original 8 columns.

That would look like this:

...
# keep rows with 16 columns that are not entirely blank
cols_iter = iter(cols)
data.append(list(zip(cols_iter, cols_iter)))  # pair each barcode with its percentage

which gives output like this:

[('CTCTCT', '27.66%'), ('GTTTCG', '9.04%'), ('NNNNNN', '3.67%'), ('ATTCGG', '7.41%'), ('GACGAT', '6.90%'), ('GAACCC', '13.29%'), ('GTAACA', '9.50%'), ('ATCGCC', '56.24%')]
[('TGTGTG', '27.54%'), ('ATTCCT', '5.78%'), ('TTCAGA', '3.11%'), ('CGAATC', '6.70%'), ('ATTCGG', '6.45%'), ('TGCTGT', '13.18%'), ('TGCTGT', '8.64%'), ('GCTATT', '9.98%')]
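The `zip(cols_iter, cols_iter)` trick works because both arguments are the same iterator: each tuple `zip` builds takes one item for the barcode and the next item for its percentage. A standalone check on the first row of the sample output, with a slicing-based version for comparison (the slicing form is an alternative, not what the answer uses):

```python
first_row = ['CTCTCT', '27.66%', 'GTTTCG', '9.04%', 'NNNNNN', '3.67%', 'ATTCGG', '7.41%',
             'GACGAT', '6.90%', 'GAACCC', '13.29%', 'GTAACA', '9.50%', 'ATCGCC', '56.24%']

# One shared iterator: zip pulls two consecutive items per tuple
it = iter(first_row)
pairs = list(zip(it, it))
print(pairs[:2])  # [('CTCTCT', '27.66%'), ('GTTTCG', '9.04%')]

# Same result from even-index and odd-index slices
pairs_sliced = list(zip(first_row[0::2], first_row[1::2]))
```

`zip` stops at the shortest input, so an odd number of cells would silently lose the last one; the 16-column check rules that out here.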

Building on that, you can print each element:

for row in data:
    # do something
    for seq, percent in row:
        if '' not in (seq, percent):  # skip empty cells
            print(seq, percent)

Output

CTCTCT 27.66%
GTTTCG 9.04%
NNNNNN 3.67%
ATTCGG 7.41%
GACGAT 6.90%
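The question also builds columns with `column(i)`. Once the rows are in `data` as lists of 16 strings, transposing with `zip(*data)` gives the columns directly. This is a sketch that goes beyond the answer above; `data` here is just the first three rows of the sample output:

```python
# First three rows of the sample output above
data = [
    ['CTCTCT', '27.66%', 'GTTTCG', '9.04%', 'NNNNNN', '3.67%', 'ATTCGG', '7.41%',
     'GACGAT', '6.90%', 'GAACCC', '13.29%', 'GTAACA', '9.50%', 'ATCGCC', '56.24%'],
    ['TGTGTG', '27.54%', 'ATTCCT', '5.78%', 'TTCAGA', '3.11%', 'CGAATC', '6.70%',
     'ATTCGG', '6.45%', 'TGCTGT', '13.18%', 'TGCTGT', '8.64%', 'GCTATT', '9.98%'],
    ['ACACAC', '22.70%', 'ATGTCA', '4.47%', 'AGGTTT', '3.01%', 'GACGAT', '6.36%',
     'CCATTA', '6.37%', 'TTCAGA', '12.19%', 'CCTGAG', '7.82%', 'CCGAGT', '8.79%'],
]

# zip(*data) transposes: the i-th tuple holds cell i of every row
columns = [list(col) for col in zip(*data)]

for i, col in enumerate(columns):
    # In the full data, lanes with fewer barcodes leave '' cells at the
    # bottom (see the last sample rows); this drops them before printing
    print(i, [cell for cell in col if cell])
```

Because every kept row has exactly 16 cells, `zip(*data)` yields exactly 16 columns and never needs an index into a shorter row.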
