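The goal is to write a function get_mapping(map_file) whose only argument, map_file, is the name of a file containing the IDs. In our example, map_file will therefore be one of "mapping/rno.map", "mapping/mmu.map", and "mapping/hsa.map". These files have four, two, and three columns respectively. The first column always holds the Ensembl ID we use. get_mapping should return a list of dictionaries. Each dictionary should have a non-Ensembl ID as its key and the corresponding Ensembl ID as its value. The number of dictionaries in the list should equal the number of columns minus 1.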
def get_mapping(map_file):
    f = open(map_file, "r")
    # Result is a list of dictionaries.
    mapping_list = []
    # Read the header on the first line so it is not treated as data.
    header = f.readline()
    header = header.split()
    # One dictionary per non-Ensembl column.
    col = len(header) - 1
    for i in range(col):
        d = {}
        mapping_list.append(d)
    for line in f:
        line = line.strip('\n').split('\t')
        for dic in range(len(mapping_list)):
            # Key: the ID in column dic + 1; value: the Ensembl ID in column 0.
            mapping_list[dic][line[dic + 1]] = line[0]
    print(mapping_list)
    f.close()
    return mapping_list

get_mapping('rno.map')
get_mapping('mmu.map')
get_mapping('hsa.map')
Entries that should not appear in the output:
['ENSRNOP00000058792', '', '', '']
['ENSMUSP00000100465', 'MGI:3645509']
['ENSP00000375105', 'P63162', 'Q6LBS1']
How can I remove these entries?
This is what I should get in the output:
rno mapping
1302936 ENSRNOP00000015679
1302939 ENSRNOP00000027305
1302944 ENSRNOP00000025813
1302945 ENSRNOP00000010637
1302952 ENSRNOP00000003046
1302957 ENSRNOP00000020169
1302959 ENSRNOP00000006804
...
hsa.map
a0a2g6 ensp00000327895
a0a5b6 ensp00000374923
a0a962 ensp00000315112
a0aul9 ensp00000227459
a0av47 ensp00000260810
a0av56 ensp00000292123
a0avg4 ensp00000305200
a0avk6 ensp00000250024
a0avt1 ensp00000313454
a0ejg6 ensp00000321606
mmu.map
mgi:101761 ensmusp00000072556
mgi:101762 ensmusp00000008542
mgi:101763 ensmusp00000077262
mgi:101764 ensmusp00000099514
mgi:101765 ensmusp00000030814
mgi:101766 ensmusp00000035142
mgi:101769 ensmusp00000044048
mgi:101770 ensmusp00000028907
mgi:101771 ensmusp00000037324
mgi:101772 ensmusp00000028991
mgi:101773 ensmusp00000023618
mgi:101774 ensmusp00000003469
mgi:101775 ensmusp00000097404
mgi:101776 ensmusp00000027740
This is what I actually get in the output:
rno mapping
ENSRNOP00000058792
1302936 ENSRNOP00000015679
1302939 ENSRNOP00000027305
1302944 ENSRNOP00000025813
1302945 ENSRNOP00000010637
1302952 ENSRNOP00000003046
1302957 ENSRNOP00000020169
1302959 ENSRNOP00000006804
1302965 ENSRNOP00000012050
1302972 ENSRNOP00000042145
1302973 ENSRNOP00000033541
...
Sample of rno.map:
Ensembl_Protein_ID UniProt/SwissProt_Accession UniProt/TrEMBL_Accession RGD_ID
ENSRNOP00000000008 P18088 C9E895 2652
ENSRNOP00000000008 P18088 B3VQJ0 2652
ENSRNOP00000000009 D3ZEM1 1310201
ENSRNOP00000000025 B4F7C7
ENSRNOP00000000029 Q9ES39 620038
ENSRNOP00000000037 Q7TQM3 735156
ENSRNOP00000000052 O70352 Q6IN14 69070
ENSRNOP00000000053 Q9JLM2 68400
ENSRNOP00000000064 P97874 621589
ENSRNOP00000000072 P29419 621377
ENSRNOP00000000074 B2RZ28 1304584
Sample of mmu.map:
Ensembl_Protein_ID MGI_ID
ENSMUSP00000000001 MGI:95773
ENSMUSP00000000028 MGI:1338073
ENSMUSP00000000033 MGI:96434
ENSMUSP00000000049 MGI:88058
ENSMUSP00000000058 MGI:107571
ENSMUSP00000000090 MGI:88474
ENSMUSP00000000094 MGI:2138865
ENSMUSP00000000095 MGI:98494
ENSMUSP00000000122 MGI:97323
ENSMUSP00000000127 MGI:98955
ENSMUSP00000000129 MGI:105917
ENSMUSP00000000137 MGI:1913963
ENSMUSP00000000153 MGI:95767
ENSMUSP00000000161 MGI:1277979
ENSMUSP00000000163 MGI:1919308
ENSMUSP00000000175 MGI:1914175
ENSMUSP00000000186 MGI:1891427
ENSMUSP00000000187 MGI:95520
Sample of hsa.map:
Ensembl_Protein_ID UniProt/SwissProt_Accession UniProt/TrEMBL_Accession
ENSP00000000233 P84085 A4D0Z3
ENSP00000000412 P20645 Q96AH2
ENSP00000000442 P11474 Q96I02
ENSP00000000442 P11474 Q96F89
ENSP00000000442 P11474 Q569H8
ENSP00000001008 Q02790
ENSP00000002829 Q13275
ENSP00000003084 P13569 Q9UML7
ENSP00000003084 P13569 Q9UJ19
ENSP00000003084 P13569 Q99989
ENSP00000003084 P13569 Q6KEJ7
ENSP00000003084 P13569 Q6KEJ4
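One possible way to keep the unwanted entries out is to skip any field that is empty before storing it. The version below is only a minimal sketch, not a confirmed solution: it assumes the files are tab-separated (as the split('\t') in the code above suggests) and that a missing ID shows up as an empty string between two tabs. The names fields, ensembl_id and other_id are mine and purely illustrative.

```python
def get_mapping(map_file):
    """Return one {other_id: ensembl_id} dict per non-Ensembl column."""
    with open(map_file, "r") as f:
        # Assumption: the header is tab-separated like the data rows.
        header = f.readline().rstrip("\r\n").split("\t")
        # One dictionary for every column after the first.
        mapping_list = [{} for _ in header[1:]]
        for line in f:
            fields = line.rstrip("\r\n").split("\t")
            ensembl_id = fields[0]
            # zip stops at the shorter sequence, so short rows are tolerated.
            for d, other_id in zip(mapping_list, fields[1:]):
                if other_id:  # skip blank fields so '' never becomes a key
                    d[other_id] = ensembl_id
    return mapping_list
```

With this check, a row such as ['ENSRNOP00000058792', '', '', ''] adds nothing to any dictionary, and a row with only some IDs filled in updates only the dictionaries for those columns. If the same non-Ensembl ID appears on several rows, the last row still wins, as in the original code.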