Answers to the question: Matching files based on one column


Category: Python Q&A


1 answer

Anonymous, 1 day ago

I think this is a fairly efficient way to do what you want, so hopefully it will scale well. You didn't say which version of Python you're using, so it's written for version 2.x. The field delimiter used when creating the output file is a variable, so it can easily be changed. The number of matches isn't limited to 5000 (it finds all of them), but a limit could be added if it's really necessary.

<pre><code>from collections import defaultdict

TOLERANCE = 0.05
DELIM = '\t'

# build a lookup of rsID to MAF from the second (reference) file
ref_dict = {}
with open('second_file.txt', 'rt') as inf:
    next(inf)  # skip header row
    for line in inf:
        fields = line.split()
        ref_dict[fields[0]] = float(fields[3])  # rsID to MAF

# for each rsID in the first file, collect every reference rsID
# whose MAF is within TOLERANCE of its own
matches = defaultdict(list)
with open('first_file.txt', 'rt') as inf:
    next(inf)  # skip header row
    for line in inf:
        fields = line.split()
        rsID, MAF = fields[0], float(fields[1])
        for ref_id, ref_value in ref_dict.iteritems():
            if abs(MAF-ref_value) <= TOLERANCE:
                matches[rsID].append(ref_id)

# determine maximum number of matches for output file header row
longest = max(map(len, (v for v in matches.itervalues())))

with open("output.txt", "wt") as outf:
    outf.write('rsId' + DELIM + DELIM.join('match%d' % i for i in xrange(1, longest+1)) + '\n')
    fmt_str = '{}' + DELIM + '{}\n'
    for k,v in matches.iteritems():
        outf.write(fmt_str.format(k, (DELIM.join(v))))
</code></pre>

Contents of <code>output.txt</code> generated from the sample data shown in the question (<code>»</code> represents a tab character):

<pre class="lang-none prettyprint-override"><code>rsId» match1» match2» match3» match4
rs870123» rs908341» rs090321» rs701234» rs101098
rs9038241» rs100981
rs1234123» rs512341
rs1293048» rs090321» rs701234» rs101098
rs723904» rs100981
rs1980123» rs512341
rs3801423» rs090321» rs701234» rs101098» rs100981
rs8041239» rs512341
rs239401» rs512341
rs314234» rs090321» rs701234» rs101098
</code></pre>
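For anyone on Python 3, here is a minimal sketch of the same approach. It is a port for illustration, not the answerer's original code: the function names <code>load_reference</code>, <code>find_matches</code> and <code>write_matches</code>, and the optional <code>limit</code> parameter (the cap on matches mentioned above), are my own assumptions. It keeps the column layout used in the answer (rsID in column 1 of both files, MAF in column 2 of the first file and column 4 of the second) and relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7.

```python
from collections import defaultdict

TOLERANCE = 0.05
DELIM = '\t'


def load_reference(path):
    """Return a dict of rsID -> MAF read from the reference file.

    Assumes a header row, rsID in column 1 and MAF in column 4.
    """
    ref = {}
    with open(path, 'rt') as inf:
        next(inf, None)  # skip header row (tolerates an empty file)
        for line in inf:
            fields = line.split()
            if len(fields) >= 4:  # ignore blank or short lines
                ref[fields[0]] = float(fields[3])
    return ref


def find_matches(path, ref, tolerance=TOLERANCE, limit=None):
    """Return rsID -> list of reference rsIDs whose MAF is within tolerance.

    `limit` is a hypothetical cap on matches per rsID; None means no cap.
    """
    matches = defaultdict(list)
    with open(path, 'rt') as inf:
        next(inf, None)  # skip header row
        for line in inf:
            fields = line.split()
            if len(fields) < 2:
                continue
            rs_id, maf = fields[0], float(fields[1])
            found = []
            for ref_id, ref_value in ref.items():  # items() replaces iteritems()
                if limit is not None and len(found) >= limit:
                    break
                if abs(maf - ref_value) <= tolerance:
                    found.append(ref_id)
            if found:  # only rsIDs with at least one match get a row
                matches[rs_id].extend(found)
    return matches


def write_matches(path, matches, delim=DELIM):
    """Write one row per rsID: the rsID followed by its matches."""
    # default=0 avoids a ValueError when nothing matched at all
    longest = max(map(len, matches.values()), default=0)
    with open(path, 'wt') as outf:
        header = ['rsId'] + ['match%d' % i for i in range(1, longest + 1)]
        outf.write(delim.join(header) + '\n')
        for rs_id, ref_ids in matches.items():
            outf.write(delim.join([rs_id] + ref_ids) + '\n')
```

With the file names from the answer, it could be run as <code>write_matches('output.txt', find_matches('first_file.txt', load_reference('second_file.txt')))</code>. Like the original, it compares every row of the first file against every reference entry, so the running time grows with the product of the two file sizes.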
