Parsing a table from NCBI BLASTp

2 answers

User

Answer 1 · edited 2024-05-23 21:09:08

A quick Python script:

#!/usr/bin/env python3

import fileinput
from collections import defaultdict

# Map each bacterium to the set of proteins it was matched with
output = defaultdict(set)
proteins = set()

# Read "bacterium protein" pairs from the files named on the command line (or stdin)
for line in fileinput.input():
    if not line.strip():
        continue  # skip blank lines
    bacteria, protein = line.split()
    proteins.add(protein)
    output[bacteria].add(protein)

columns = sorted(proteins)

# Print header
print(' ' * 12, end=' ')
for header in columns:
    print('{:25}'.format(header), end=' ')
print()

# Print table: 1 if the protein was found for this bacterium, else 0
for key in sorted(output):
    print('{:12}'.format(key), end=' ')
    for header in columns:
        print('{:<25}'.format(1 if header in output[key] else 0), end=' ')
    print()

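If you would rather have the table as data than as printed text, the same idea can be wrapped in a function. This is only a sketch: the name `presence_matrix` and the sample rows are made up for illustration, and it assumes each input line holds exactly two whitespace-separated fields (bacterium, then protein).

```python
from collections import defaultdict


def presence_matrix(lines):
    """Return (proteins, rows): the sorted protein names and, for each
    bacterium, a list of 0/1 flags in the same order as the proteins."""
    hits = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        bacterium, protein = fields
        hits[bacterium].add(protein)
    proteins = sorted(set().union(*hits.values()))
    rows = {b: [int(p in hits[b]) for p in proteins] for b in sorted(hits)}
    return proteins, rows


# Hypothetical input rows, for illustration only
sample = [
    "bacteria_1 protein:plasmid:149679",
    "bacteria_1 protein:proph:183386",
    "bacteria_2 protein:proph:183386",
    "bacteria_3 protein:plasmid:147856",
    "bacteria_3 protein:proph:183386",
]

proteins, rows = presence_matrix(sample)

# Tab-separated output: header row first, then one row per bacterium
print("\t".join([""] + proteins))
for bacterium, flags in rows.items():
    print("\t".join([bacterium] + [str(flag) for flag in flags]))
```

Keeping the parsing separate from the printing makes it easy to check the 0/1 values directly, or to hand the rows to the `csv` module instead of printing them.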

User

Answer 2 · edited 2024-05-23 21:09:08

Here is one approach using GNU awk:

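awk '{
    header[$2]++      # every protein seen (column headers)
    bacteria[$1]++    # every bacterium seen (row labels)
    map[$1,$2]++      # bacterium/protein pairs that occur in the input
}
END {
    # asorti (GNU awk only) sorts the array indices into a new array
    x = asorti(header, header_s)
    for (i = 1; i <= x; i++) {
        printf "\t%s\t", header_s[i]
    }
    print ""
    y = asorti(bacteria, bacteria_s)
    for (j = 1; j <= y; j++) {
        printf "%s\t\t", bacteria_s[j]
        for (z = 1; z <= x; z++) {
            # 1 if this pair was seen, otherwise 0
            printf "%s\t\t\t\t", (map[bacteria_s[j],header_s[z]]) ? "1" : "0"
        }
        print ""
    }
}' file

Output:

        protein:plasmid:147856          protein:plasmid:149679          protein:proph:183386
bacteria_1              0                               1                               1
bacteria_2              0                               0                               1
bacteria_3              1                               0                               1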

