I'm working on this exercise with three files, but I can't figure out what's wrong with my code: it prints things it shouldn't

Posted 2024-06-06 19:10:31


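The goal is to write a function get_mapping(map_file) whose only parameter, map_file, is the name of a file containing IDs. In our example, map_file will be one of "mapping/rno.map", "mapping/mmu.map", and "mapping/hsa.map". These files have four, two, and three columns respectively. The first column always holds the Ensembl ID we use. get_mapping should return a list of dictionaries. Each dictionary should have a non-Ensembl ID as its key and the corresponding Ensembl ID as its value. The number of dictionaries in the list should equal the number of columns minus one.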
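To make the target concrete, here is a hand-written sketch of how I understand the return value for the first two rows of the hsa.map sample shown further down. The name expected_hsa is only for illustration, and the literal is typed by hand, not produced by any code.

```python
# Hand-written illustration of the expected return shape for hsa.map,
# using only the first two data rows of the sample in this post.
# hsa.map has three columns, so the list holds 3 - 1 = 2 dictionaries.
expected_hsa = [
    # Column 2: UniProt/SwissProt_Accession -> Ensembl_Protein_ID
    {"P84085": "ENSP00000000233", "P20645": "ENSP00000000412"},
    # Column 3: UniProt/TrEMBL_Accession -> Ensembl_Protein_ID
    {"A4D0Z3": "ENSP00000000233", "Q96AH2": "ENSP00000000412"},
]

print(len(expected_hsa))
```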

def get_mapping(map_file):
    f = open(map_file, "r")
    # Result is a list of dictionaries.
    mapping_list = []
    # Read the header line; it is only used to count the columns.
    header = f.readline()
    header = header.split()
    # One dict in mapping_list per non-Ensembl column.
    col = len(header) - 1
    for i in range(col):
        d = {}
        mapping_list.append(d)
    for line in f:
        line = line.strip('\n').split('\t')
        for dic in range(len(mapping_list)):
            # Key: the non-Ensembl ID in this column; value: the Ensembl ID.
            mapping_list[dic][line[dic + 1]] = line[0]
    print(mapping_list)
    f.close()
    return mapping_list


get_mapping('rno.map')
get_mapping('mmu.map')
get_mapping('hsa.map')

Entries that should not be in the output:

['ENSRNOP00000058792', '', '', '']
['ENSMUSP00000100465', 'MGI:3645509']
['ENSP00000375105', 'P63162', 'Q6LBS1']

How do I remove these entries?
This is the output I should get:

rno mapping
1302936 ENSRNOP00000015679
1302939 ENSRNOP00000027305
1302944 ENSRNOP00000025813
1302945 ENSRNOP00000010637
1302952 ENSRNOP00000003046
1302957 ENSRNOP00000020169
1302959 ENSRNOP00000006804
...
hsa.map
a0a2g6  ensp00000327895
a0a5b6  ensp00000374923
a0a962  ensp00000315112
a0aul9  ensp00000227459
a0av47  ensp00000260810
a0av56  ensp00000292123
a0avg4  ensp00000305200
a0avk6  ensp00000250024
a0avt1  ensp00000313454
a0ejg6  ensp00000321606
mmu.map
mgi:101761  ensmusp00000072556
mgi:101762  ensmusp00000008542
mgi:101763  ensmusp00000077262
mgi:101764  ensmusp00000099514
mgi:101765  ensmusp00000030814
mgi:101766  ensmusp00000035142
mgi:101769  ensmusp00000044048
mgi:101770  ensmusp00000028907
mgi:101771  ensmusp00000037324
mgi:101772  ensmusp00000028991
mgi:101773  ensmusp00000023618
mgi:101774  ensmusp00000003469
mgi:101775  ensmusp00000097404
mgi:101776  ensmusp00000027740

This is the output I actually get:
rno mapping
    ENSRNOP00000058792
1302936 ENSRNOP00000015679
1302939 ENSRNOP00000027305
1302944 ENSRNOP00000025813
1302945 ENSRNOP00000010637
1302952 ENSRNOP00000003046
1302957 ENSRNOP00000020169
1302959 ENSRNOP00000006804
1302965 ENSRNOP00000012050
1302972 ENSRNOP00000042145
1302973 ENSRNOP00000033541
...
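The blank key paired with ENSRNOP00000058792 at the top of this output is what I would expect when a column is empty on some rows, as in the rno.map sample below: the loop stores '' as a key, and every later row with an empty field in that column overwrites its value. Below is a minimal sketch of one way to avoid that, assuming the files are tab-separated and that an empty field just means "no ID in this column". It skips empty fields and pads rows that have lost their trailing tabs. The sample rows are copied from the hsa.map excerpt, and the temporary file is there only to make the sketch runnable. It leaves the case of the IDs unchanged.

```python
import os
import tempfile


def get_mapping(map_file):
    """Return one {other_id: ensembl_id} dict per non-Ensembl column."""
    with open(map_file, "r") as f:
        # The header is only used to count the columns (assumes tabs).
        header = f.readline().rstrip("\n").split("\t")
        n_cols = len(header)
        mapping_list = [{} for _ in range(n_cols - 1)]
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # Pad short rows so that every column index exists.
            fields += [""] * (n_cols - len(fields))
            ensembl_id = fields[0]
            for i, other_id in enumerate(fields[1:n_cols]):
                other_id = other_id.strip()
                # An empty field means there is no ID in this column:
                # skip it instead of storing '' as a key.
                if other_id:
                    mapping_list[i][other_id] = ensembl_id
    return mapping_list


# Three rows from the hsa.map excerpt, written with tabs (an assumption).
# The last row has no trailing tab, to exercise the padding.
sample = (
    "Ensembl_Protein_ID\tUniProt/SwissProt_Accession\tUniProt/TrEMBL_Accession\n"
    "ENSP00000000233\tP84085\tA4D0Z3\n"
    "ENSP00000001008\tQ02790\t\n"
    "ENSP00000002829\tQ13275\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".map", delete=False) as tmp:
    tmp.write(sample)
result = get_mapping(tmp.name)
os.remove(tmp.name)
print(result)
```

With the real files, the call would stay get_mapping('rno.map') as before. If an ID occurs on several rows, the last row still wins, as in the original loop.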

Sample of rno.map:

Ensembl_Protein_ID  UniProt/SwissProt_Accession UniProt/TrEMBL_Accession    RGD_ID
ENSRNOP00000000008  P18088  C9E895  2652
ENSRNOP00000000008  P18088  B3VQJ0  2652
ENSRNOP00000000009      D3ZEM1  1310201
ENSRNOP00000000025      B4F7C7  
ENSRNOP00000000029  Q9ES39      620038
ENSRNOP00000000037  Q7TQM3      735156
ENSRNOP00000000052  O70352  Q6IN14  69070
ENSRNOP00000000053  Q9JLM2      68400
ENSRNOP00000000064  P97874      621589
ENSRNOP00000000072  P29419      621377
ENSRNOP00000000074      B2RZ28  1304584

Sample of mmu.map:

Ensembl_Protein_ID  MGI_ID
ENSMUSP00000000001  MGI:95773
ENSMUSP00000000028  MGI:1338073
ENSMUSP00000000033  MGI:96434
ENSMUSP00000000049  MGI:88058
ENSMUSP00000000058  MGI:107571
ENSMUSP00000000090  MGI:88474
ENSMUSP00000000094  MGI:2138865
ENSMUSP00000000095  MGI:98494
ENSMUSP00000000122  MGI:97323
ENSMUSP00000000127  MGI:98955
ENSMUSP00000000129  MGI:105917
ENSMUSP00000000137  MGI:1913963
ENSMUSP00000000153  MGI:95767
ENSMUSP00000000161  MGI:1277979
ENSMUSP00000000163  MGI:1919308
ENSMUSP00000000175  MGI:1914175
ENSMUSP00000000186  MGI:1891427
ENSMUSP00000000187  MGI:95520

Sample of hsa.map:

Ensembl_Protein_ID  UniProt/SwissProt_Accession UniProt/TrEMBL_Accession
ENSP00000000233 P84085  A4D0Z3
ENSP00000000412 P20645  Q96AH2
ENSP00000000442 P11474  Q96I02
ENSP00000000442 P11474  Q96F89
ENSP00000000442 P11474  Q569H8
ENSP00000001008 Q02790  
ENSP00000002829 Q13275  
ENSP00000003084 P13569  Q9UML7
ENSP00000003084 P13569  Q9UJ19
ENSP00000003084 P13569  Q99989
ENSP00000003084 P13569  Q6KEJ7
ENSP00000003084 P13569  Q6KEJ4
