序列中的模式计数

1 投票
1 回答
526 浏览
提问于 2025-04-17 06:22

我有一个文本文件,里面有一些序列,格式如下:

>

P1

MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD IIEKNESGWWFVSTAEEQGWVPATCLEGQDGVQDEFSLQPEEEEKYTVIYPYTARDQDEM NLERGAVVEVVQKNLEGWWKIRYQGKEGWAPASYLKKNSGEPLPPKLGPSSPAHSGALDL DGVSRHQNAMGREKELLNNQRDGRFEGRLVPDGDVKQRSPKMRQRPPPRRDMTIPRGLNL

>

P2

MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGVIKQGWLHKANVNST ITVTMKVFKRRYFYLTQLPDGSYILNSYKDEKNSKESKGCIYLDACIDVVQCPKMRRHAF ELKMLDKYSHYLAAETEQEMEEWLIMLKKIIQINTDSLVQEKKDTVEAIQEEETSSQGKA ENIMASLERSMHPELMKYGRETEQLNKLSRGDGRQNLFSFDSEVQRLDFSGIEPDVKPFE EKCNKRFMVNCHDLTFNILGHIGDNAKGPPTNVEPFFINLALFDVKNNCKISADFHVDLN PPSVREMLWGTSTQLSNDGNAKGFSPESLIHGIAESQLCYIKQGIFSVTNPHPEIFLVVR

>

P3

GDDSEWLKLPVDQKCEHKLWKARLSGYEEALKIFQKIKDEKSPEWSKYLGLIKKFVTDS NAVVQLKGLEAALVYVENAHVAGKTTGEVVSGVVSKVFNQPKAKAKELGIEICLMYVEIE KGESVQEELLKGLDNKNPKIIVACIETLRKALSEFGSKIISLKPIIKVLPKLFESRDKAV RDEAKLFAIEIYRWNRDAVKHTLQNINSVQLKELEEEWVKLPTGAPKPSRFLRSQQELEA KLEQQQSAGGDAEGGGDDGDEVPQVDAYELLDAVEILSKLPKDFYDKIEAKKWQERKEAL EAVEVLVKNPKLEAGDYADLVKALKKVVGKDTNVMLVALAAKCLTGLAVGLRKKFGQYAG HVVPTILEKFKEKKPQVVQALQEAIDAIFLTTTLQNISEDVLAVMDNKNPTIKQQTSLFI ARSFRHCTSSTLPKSLLKPFCAALLKHINDSAPEVRDAAFEALGTALKVVGEKSVNPFLA

......

总共有100个序列。在这些序列中,我用Python脚本搜索了一个我感兴趣的模式,代码如下:

import re

infile=open("seq.fasta",'r')

out=open("results.csv",'w')
pattern=re.compile(r"(P[A-Z]{2}P)")
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name=line
    else:
        s = re.findall(pattern,line)
    print '%s:%s' %(name,s)
    out.write('%s:\t%s\n' %(name,s))

这个脚本运行得很好,找到了我想要的模式……现在我想统计每个序列中这个模式出现的次数,脚本的输出结果如下:

>

p1 : PGCP

>

p1 : PHCP, PKCP ......

但是我想要的输出格式是这样的:

p1 : 1

>

p1 : 2 ......

有没有人能告诉我怎么用Python实现这个?

1 个回答

1

findall 方法会返回一个匹配字符串的列表。所以你可以在代码中直接使用 len(s),而不是 s

out.write('%s:\t%s\n' %(name,len(s)))

撰写回答