Python and MySQL performance: writing a large number of SQL query results to file

Example input line and output line:

INPUT
XM_006557349.1 1 - exon XM_006557349.1_exon_2 10316 10534 {1: '10509:10534', 2: '10488:10508', 3: '10467:10487', 4: '10446:10466', 5: '10425:10445', 6: '10404:10424', 7: '10383:10403', 8: '10362:10382', 9: '10341:10361', 10: '10316:10340'}

OUTPUT
XM_006557349.1 1 - exon XM_006557349.1_exon_2 10316 10534 0.7083 0.2945 0.2 0.2931 0.125 0.1154 0.2095 0.5833 0.0569 0.0508

CODE

import ast

def array_2_meth(sample,bin_type,type,cur_meth):
    bins_in = open('bin_dicts/'+bin_type,'r')
    meth_out = open('meth_data/'+bin_type+'_'+sample+'_plus_'+type+'_meth.tsv','w')
    for line in bins_in.readlines():
        meth_dict = {}
        # build array of data from each line
        array = line.strip('\n').split('\t')
        mrna_id = array[0]
        assembly = array[1]
        strand = array[2]
        bin_dict = ast.literal_eval(array[7])
        for bin in bin_dict:
            coords = bin_dict[bin].split(':')
            start = int(coords[0]) -1
            end = int(coords[1]) +1
            cur_meth.execute('select sum(mc)/sum(h) from allc_'+str(sample)+'_'+str(assembly) + ' where strand = \'' +str(strand) +'\' and class = \''+str(type)+'\' and position between '+str(start)+' and ' +str(end) + ' and h >= 5')
            for row in cur_meth.fetchall():
                if str(row[0]) == 'None':
                    meth_dict[bin] = 'no_cov'
                else:
                    meth_dict[bin] = float(row[0])
        meth_out.write('\t'.join(array[:7]))
        for k in sorted(meth_dict.keys()):
            meth_out.write('\t'+str(meth_dict[k]))
        meth_out.write('\n')
    meth_out.close()

Answer 1 · posted 2024-04-20 13:37:11

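I don't think file I/O should take very long; the main bottleneck is probably the number of queries you are running. But from the example you gave, I can't see any pattern in the start and end positions, so I don't know how you could reduce the number of queries.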
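Before changing anything, it is worth checking where the time actually goes by timing the query calls and the write calls separately. A minimal standard-library sketch; `StageTimer` and the stage labels are hypothetical names used only for illustration:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time and call counts per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # label -> total seconds spent
        self.counts = defaultdict(int)    # label -> number of timed blocks

    @contextmanager
    def stage(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed block raised.
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def report(self):
        """Return (label, total_seconds, count) tuples, slowest stage first."""
        return sorted(
            ((label, self.totals[label], self.counts[label]) for label in self.totals),
            key=lambda item: item[1],
            reverse=True,
        )
```

In the question's loop you would wrap `cur_meth.execute(...)` and `cur_meth.fetchall()` in `with timer.stage('query'):` and each `meth_out.write(...)` in `with timer.stage('write'):`, then print `timer.report()` at the end. If the guess above is right, 'query' should dominate by a wide margin.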

I may have an idea that is either brilliant or stupid, depending on how it does in your tests.

It looks like each query returns only a single value? If so, you could try something like this:

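# Build one SELECT per bin, tag it with its bin key, and combine them with UNION ALL
parts = []
for bin_key in sorted(bin_dict):
    coords = bin_dict[bin_key].split(':')
    start = int(coords[0]) - 1
    end = int(coords[1]) + 1
    parts.append('select ' + str(bin_key) + ' as bin_id, sum(mc)/sum(h) from allc_' + str(sample) + '_' + str(assembly)
                 + " where strand = '" + str(strand) + "' and class = '" + str(type) + "'"
                 + ' and position between ' + str(start) + ' and ' + str(end) + ' and h >= 5')
# join() puts UNION ALL only between the queries, so there is no trailing UNION ALL to remove
SQL = ' UNION ALL '.join(parts)

cur_meth.execute(SQL)
# one row per bin: (bin_id, value); value is None when the bin has no coverage
for bin_key, value in cur_meth.fetchall():
    meth_dict[bin_key] = 'no_cov' if value is None else float(value)
# write the whole output line with a single call
meth_out.write('\t'.join(array[:7] + [str(meth_dict[k]) for k in sorted(meth_dict)]) + '\n')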

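The core idea is to use UNION ALL to combine all the queries into one, so you send a single query (one round trip to the server) instead of the 10 in your example. You can also cut the 10 write-to-file calls down to 1. A possible downside is that UNION ALL might be slow, but as far as I know, as long as the SQL keeps the form shown in my example, it should not take longer than the 10 separate queries did.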
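A variant of the same idea, sketched with bound parameters instead of string concatenation so the strand and class values need no manual quoting. This is not the answer's code: `build_union_query` and `fetch_bin_meth` are hypothetical names, and it assumes a DB-API driver whose placeholder is `%s` (as with MySQLdb/mysqlclient and PyMySQL). The table name is still concatenated, because identifiers cannot be bound, so it must come from trusted input. The demo uses the standard-library `sqlite3` module, whose placeholder is `?`, only so the example runs without a MySQL server.

```python
import sqlite3


def build_union_query(table, strand, cls, bins, placeholder="%s"):
    """Combine one aggregate SELECT per bin into a single UNION ALL query.

    `bins` maps a bin key to a 'start:end' string, like the question's bin_dict.
    Each SELECT is tagged with its bin key, so rows can be matched back to bins
    without relying on the order in which the server returns them.
    """
    p = placeholder
    parts, params = [], []
    for key in sorted(bins):
        start, end = (int(x) for x in bins[key].split(":"))
        parts.append(
            f"SELECT {p} AS bin_id, SUM(mc) / SUM(h) AS meth FROM {table} "
            f"WHERE strand = {p} AND class = {p} "
            f"AND position BETWEEN {p} AND {p} AND h >= 5"
        )
        # Same -1/+1 padding of the bin edges as in the question's code.
        params.extend([key, strand, cls, start - 1, end + 1])
    # join() puts UNION ALL only between the parts, never after the last one.
    return " UNION ALL ".join(parts), params


def fetch_bin_meth(cursor, table, strand, cls, bins, placeholder="%s"):
    """Run the combined query once and return {bin_key: value or 'no_cov'}."""
    sql, params = build_union_query(table, strand, cls, bins, placeholder)
    cursor.execute(sql, params)
    return {
        bin_id: ("no_cov" if meth is None else float(meth))
        for bin_id, meth in cursor.fetchall()
    }


if __name__ == "__main__":
    # Demo against an in-memory SQLite table; REAL columns keep the division non-integer.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE allc_demo (strand TEXT, class TEXT, position INTEGER, mc REAL, h REAL)")
    cur.executemany(
        "INSERT INTO allc_demo VALUES (?, ?, ?, ?, ?)",
        [("-", "CG", 10320, 3, 6), ("-", "CG", 10330, 1, 10), ("-", "CG", 10500, 2, 4)],
    )
    bins = {1: "10509:10534", 2: "10488:10508", 10: "10316:10340"}
    print(fetch_bin_meth(cur, "allc_demo", "-", "CG", bins, placeholder="?"))
    conn.close()
```

In the demo, bin 10 comes out as 0.25 (4/16), while bins 1 and 2 come out as 'no_cov' (bin 2's only row has h < 5). With MySQL, the call inside the question's loop would look like `meth_dict = fetch_bin_meth(cur_meth, 'allc_' + str(sample) + '_' + str(assembly), strand, type, bin_dict)`.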

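The second obvious approach is multithreading, or parallelism more generally. If you are not using all of your machine's processing power, you could try running several scripts/programs at the same time, since all you are doing is reading data, not modifying anything. Each individual script will run a little slower, but the whole job will finish sooner because less time is lost waiting between queries.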
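A minimal sketch of the parallel approach inside a single script, using the standard library's `concurrent.futures`. It assumes the work can be split into independent units (for example one unit per sample); `run_in_parallel` and the `job` passed to it are hypothetical. Threads are a reasonable first try here because each job should spend most of its time waiting on the MySQL server rather than running Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(units, job, max_workers=4):
    """Run job(unit) for every unit on a thread pool and return {unit: result}.

    Each job should open its own database connection and write its own output
    file: common MySQL drivers (MySQLdb, PyMySQL) declare DB-API
    threadsafety = 1, meaning connections must not be shared between threads.
    If a job raises, the exception is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {unit: pool.submit(job, unit) for unit in units}
        return {unit: future.result() for unit, future in futures.items()}
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (same `submit`/`result` interface) is closer to the answer's "several programs at once" suggestion; in that case `job` must be a module-level function so it can be pickled.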
