Compare consecutive columns of a file and return the number of mismatched elements

# sampleID HGDP00511 HGDP00511 HGDP00512 HGDP00512 HGDP00513 HGDP00513
M rs4124251 0 0 A G 0 A
M rs6650104 0 A C T 0 0
M rs12184279 0 0 G A T 0

2 Answers

User

Answer 1 · edited 2024-05-23 14:55:24

I strongly recommend using pandas rather than writing your own code:

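import numpy as np
import pandas as pd

# Assumes a whitespace- or tab-delimited file, as in the sample; the default
# comma separator would not split it. dtype=str keeps "0" as text.
df = pd.read_csv("phased.txt", sep=r"\s+", dtype=str)

# Despite its name, match_counts holds the number of MISMATCHED rows for each
# ordered pair of column positions (i, j). The range starts at 3, so column 2
# (the first sample column) is not compared.
match_counts = {(i, j):
                   int(np.sum(df[df.columns[i]] != df[df.columns[j]]))
                           for i in range(3, len(df.columns))
                           for j in range(3, len(df.columns))}

match_counts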
{(6, 4): 3,
 (4, 7): 2,
 (4, 4): 0,
 (4, 3): 3,
 (6, 6): 0,
 (4, 5): 3,
 (5, 4): 3,
 (3, 5): 3,
 (7, 7): 0,
 (7, 5): 3,
 (3, 7): 2,
 (6, 5): 3,
 (5, 5): 0,
 (7, 4): 2,
 (5, 3): 3,
 (6, 7): 2,
 (4, 6): 3,
 (7, 6): 2,
 (5, 7): 3,
 (6, 3): 2,
 (5, 6): 3,
 (3, 6): 2,
 (3, 3): 0,
 (7, 3): 2,
 (3, 4): 3}
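
The dictionary above covers every ordered pair of columns, while the question title asks only about consecutive columns. The following is a minimal sketch of that narrower case, assuming "consecutive" means each column compared with its right-hand neighbour and that the file is whitespace-delimited. The helper name `consecutive_mismatches`, the `first_col` parameter, and the inline `SAMPLE` string are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical inline copy of the sample data; a real run would pass the
# file path (for example "phased.txt") to read_csv instead.
SAMPLE = """\
# sampleID HGDP00511 HGDP00511 HGDP00512 HGDP00512 HGDP00513 HGDP00513
M rs4124251 0 0 A G 0 A
M rs6650104 0 A C T 0 0
M rs12184279 0 0 G A T 0
"""


def consecutive_mismatches(df, first_col=2):
    """Count row-wise mismatches between each column and the next one."""
    cols = df.iloc[:, first_col:]
    counts = {}
    for k in range(cols.shape[1] - 1):
        left, right = cols.iloc[:, k], cols.iloc[:, k + 1]
        # Keys are column positions in the original file (0-based).
        counts[(first_col + k, first_col + k + 1)] = int((left != right).sum())
    return counts


# dtype=str keeps "0" as text, so an all-zero column is not parsed as integers.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", dtype=str)
result = consecutive_mismatches(df)
print(result)
```

If only the two columns belonging to the same sample ID should be compared, the loop could step through the columns two at a time instead; that reading of the question is equally plausible.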

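User

Answer 2 · edited 2024-05-23 14:55:24

A pure-Python approach to this problem that uses only native libraries. Let us know how it compares with bash; 828 x 828 should be a walk in the park.

Element column counts:

For simplicity and illustration, I deliberately added a step that flips (transposes) the sequence. You could improve on it by changing the logic, or by using class objects, function decorators, and so on.

Python 2.7 code: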

from __future__ import print_function  # lets the print call below also run on Python 3

shiftcol = 2  # skip the first two columns, which hold labels rather than data
with open('phased.txt') as f:
    # split each line on tabs, drop the label columns, then drop the header row
    data = [x.strip().split('\t')[shiftcol:] for x in f.readlines()][1:]

# Step 1: flip (transpose) the data so that flip[i] holds column i
flip = []
for idx, rows in enumerate(data):
    for i in range(len(rows)):
        if len(flip) <= i:
            flip.append([])
        flip[i].append(rows[i])

# Step 2: for each non-'0' element of each column, tally matches and
# non-matches against every other column in a temporary dictionary
for idx, v in enumerate(flip):
    for e in v:
        tmp = {}
        for i, z in enumerate(flip):
            if i != idx and e != '0':
                # Dictionary to store results
                if i+1 not in tmp:  # note has_key will be deprecated
                    tmp[i+1] = {'match': 0, 'notma': 0}
                tmp[i+1]['match'] += z.count(e)
                tmp[i+1]['notma'] += len([x for x in z if x != e])

        # results compensate for column shift..
        for key, count in tmp.items():
            print(idx+shiftcol+1, key+shiftcol, ': ', count)

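
The flipping step in this answer can also be written with `zip`, as the answer itself hints that the step is open to improvement. The sketch below is one possible variant, not the answer author's method: it transposes with `zip(*rows)` and then counts position-wise differences between neighbouring columns, which is a different tally from the per-element match counts above. The in-memory `rows` list and the `mismatches` helper are illustrative assumptions.

```python
# Hypothetical in-memory rows standing in for the parsed lines of phased.txt
# (header already skipped, the two leading label columns already dropped).
rows = [
    ["0", "0", "A", "G", "0", "A"],
    ["0", "A", "C", "T", "0", "0"],
    ["0", "0", "G", "A", "T", "0"],
]

# zip(*rows) transposes the row lists into column tuples in one step.
columns = list(zip(*rows))


def mismatches(col_a, col_b):
    """Number of positions at which two equal-length columns differ."""
    return sum(a != b for a, b in zip(col_a, col_b))


# Compare each column with its right-hand neighbour.
consecutive = [mismatches(a, b) for a, b in zip(columns, columns[1:])]
print(consecutive)
```

This form runs unchanged on Python 2.7 and Python 3, since it relies only on built-ins.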
