Looping through a dictionary while creating tuples

    def parse_matrix(matrix_line):
        matrixFields = matrix_line.rstrip("\n").split("\t")
        protein = matrixFields[0]
        if matrixFields[0] in transcript_to_protein:
            protein = transcript_to_protein.get(transcript)
            matrixFields[0] = protein
        return(tuple(matrixFields))

Expected output:

    Q09748.1       4.00    0.07   16.84   26.37
    O60164.1      24.55  116.87  220.53   28.82
    C5161_G1_I1  107.49   89.39   26.95  698.97
    P36614.1      27.91   72.57    5.56   36.58
    P37818.1      82.57   19.03   48.55  258.22

Actual output:

    O94423.1       4.00    0.07   16.84   26.37
    O94423.1      24.55  116.87  220.53   28.82
    C5161_G1_I1  107.49   89.39   26.95  698.97
    O94423.1      27.91   72.57    5.56   36.58
    O94423.1      82.57   19.03   48.55  258.22

    transcript_to_protein = {}

    def parse_blast(blast_line="NA"):
        fields = blast_line.rstrip("\n").split("\t")
        queryIdString = fields[0]
        subjectIdString = fields[1]
        identity = fields[2]
        queryIds = queryIdString.split("|")
        subjectIds = subjectIdString.split("|")
        transcript = queryIds[0].upper()
        swissProt = subjectIds[3]
        base = swissProt.split(".")[0]
        return(transcript, swissProt, identity)

    blast_output = open("/scratch/RNASeq/blastp.outfmt6")
    blast_lines = blast_output.readlines()
    for line in blast_lines:
        (transcript, swissProt, identity) = parse_blast(blast_line=line)
        transcript_to_protein[transcript] = swissProt

    def parse_matrix(matrix_line):
        matrixFields = matrix_line.rstrip("\n").split("\t")
        matrixFields[0] = matrixFields[0].upper()
        protein = matrixFields[0]
        if matrixFields[0] in transcript_to_protein:
            protein = transcript_to_protein.get(transcript)
            matrixFields[0] = protein
        return(tuple(matrixFields))

    def tuple_to_tab_sep(one_tuple):
        tab = "\t"
        return tab.join(one_tuple)

    matrix = open("/scratch/RNASeq/diffExpr.P1e-3_C2.matrix")
    newline = "\n"
    list_of_de_tuples = map(parse_matrix, matrix.readlines())
    list_of_tab_sep_lines = map(tuple_to_tab_sep, list_of_de_tuples)
    print(newline.join(list_of_tab_sep_lines))

3 answers

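User

Answer 1 · edited 2024-04-24 04:00:45

First, there is a bug in parse_blast(): it doesn't return the tuple (transcript, swissProt, identity), it returns (transcript, base, identity), and base has the version suffix (such as ".1") stripped off, so that information is lost.

Update

Second, there is another bug in parse_matrix(). The first field read from the matrix file does not carry the missing information; the function only puts it into the returned tuple when matrixFields[0] is found in the transcript_to_protein dictionary.

Fixing just one of these on its own won't solve the problem.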
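The actual output above, where every matched ID turns into O94423.1 while the unmatched C5161_G1_I1 row is left alone, is consistent with parse_matrix() looking up the module-level name transcript, which after the BLAST loop still holds the last transcript read. A minimal sketch with hypothetical in-memory data (the transcript names and hits below are made up, not taken from the real BLAST file) reproduces that symptom and the fix:

```python
# Hypothetical stand-in for the BLAST hits: (transcript, protein) pairs.
blast_hits = [("C1_G1_I1", "Q09748.1"), ("C2_G1_I1", "O60164.1"), ("C3_G1_I1", "O94423.1")]

transcript_to_protein = {}
for transcript, protein in blast_hits:
    transcript_to_protein[transcript] = protein
# After the loop, the global name `transcript` is still bound to "C3_G1_I1".

def parse_matrix_buggy(matrix_line):
    fields = matrix_line.rstrip("\n").split("\t")
    if fields[0] in transcript_to_protein:
        # Bug: looks up the leftover global `transcript`, not fields[0].
        fields[0] = transcript_to_protein.get(transcript)
    return tuple(fields)

def parse_matrix_fixed(matrix_line):
    fields = matrix_line.rstrip("\n").split("\t")
    if fields[0] in transcript_to_protein:
        # Look up the ID that was actually read from this row.
        fields[0] = transcript_to_protein[fields[0]]
    return tuple(fields)

rows = ["C1_G1_I1\t4.00\t0.07", "C2_G1_I1\t24.55\t116.87", "C9_G1_I1\t107.49\t89.39"]
print([parse_matrix_buggy(r)[0] for r in rows])  # ['O94423.1', 'O94423.1', 'C9_G1_I1']
print([parse_matrix_fixed(r)[0] for r in rows])  # ['Q09748.1', 'O60164.1', 'C9_G1_I1']
```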

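User
Answer 2 · edited 2024-04-24 04:00:45

The error was in my dictionary lookup. I wanted to match matrixFields[0] against the transcripts in the dictionary, so I searched it with if matrixFields[0] in transcript_to_protein:, but the lookup itself used transcript, which at that point still held the last value left over from the BLAST loop. I needed to assign the field to that name first:

    transcript = matrixFields[0]
    if transcript in transcript_to_protein:
        protein = transcript_to_protein.get(transcript)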
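A minimal sketch of the same fix, assuming the lookup table is passed to parse_matrix() as a parameter instead of being read from a global (a design choice not in the original code), so the function cannot pick up a stale transcript. The dictionary and matrix rows are hypothetical; with real data you would iterate over the opened matrix file instead of a list.

```python
def parse_matrix(matrix_line, transcript_to_protein):
    """Return the row as a tuple, with column 0 mapped to a protein ID if known."""
    fields = matrix_line.rstrip("\n").split("\t")
    transcript = fields[0].upper()  # same normalisation as the question's code
    # dict.get(key, default) falls back to the transcript ID when there is no hit.
    fields[0] = transcript_to_protein.get(transcript, transcript)
    return tuple(fields)

def tuple_to_tab_sep(one_tuple):
    return "\t".join(one_tuple)

# Hypothetical lookup table and matrix rows, for illustration only.
transcript_to_protein = {"C1_G1_I1": "Q09748.1", "C2_G1_I1": "O60164.1"}
matrix_lines = ["c1_g1_i1\t4.00\t0.07\n", "C5161_G1_I1\t107.49\t89.39\n"]

tuples = [parse_matrix(line, transcript_to_protein) for line in matrix_lines]
print("\n".join(tuple_to_tab_sep(t) for t in tuples))
```

Because the dictionary is an explicit argument, a typo in the lookup key shows up as a NameError or a wrong-but-local value rather than silently reusing whatever the last loop left behind.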

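User
Answer 3 · edited 2024-04-24 04:00:45

It looks like the problem may be in the parse_blast function. For the line

    c1000_g1_i1|m.799   gi|48474761|sp|O94288.1|NOC3_SCHPO  100.00  747 0   0   5   751 1   747 0.0  1506

the statement

    subjectIdString = fields[1]

gives the subject ID string gi|48474761|sp|O94288.1|NOC3_SCHPO.

Then

    swissProt = subjectIds[3]

makes swissProt O94288.1, which the function splits further on the line

    base = swissProt.split(".")[0]

The end result is O94288 (in base) rather than O94288.1, which seems to be what you are expecting. I suggest testing the function on a single line of input until you get the output you want.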
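Following that suggestion, a quick single-line check might look like this. This parse_blast() variant is hypothetical: it follows the question's splitting logic but also returns base so both values can be compared side by side. The sample line is rebuilt with tab separators, on the assumption that the file is standard BLAST tabular output (outfmt 6), which is tab-delimited.

```python
def parse_blast(blast_line):
    """Split one BLAST tabular (outfmt 6) line the same way as the question."""
    fields = blast_line.rstrip("\n").split("\t")
    query_ids = fields[0].split("|")
    subject_ids = fields[1].split("|")
    transcript = query_ids[0].upper()
    swiss_prot = subject_ids[3]        # accession with version, e.g. O94288.1
    base = swiss_prot.split(".")[0]    # version stripped, e.g. O94288
    return transcript, swiss_prot, base, fields[2]

# The example hit from the answer, rebuilt with tabs between columns.
line = "\t".join([
    "c1000_g1_i1|m.799", "gi|48474761|sp|O94288.1|NOC3_SCHPO", "100.00",
    "747", "0", "0", "5", "751", "1", "747", "0.0", "1506",
]) + "\n"

print(parse_blast(line))  # ('C1000_G1_I1', 'O94288.1', 'O94288', '100.00')
```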
