I want to extract a subset of the protein sequences in a .fasta file (swissprot_canonical-isoforms.fasta), keeping only the proteins whose IDs are listed in a txt file (Interested proteins.txt).
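Below is part of the protein sequence data in swissprot_canonical-isoforms.fasta. On lines starting with ">", the protein ID appears between the two "|" characters. For example, "P04637" is a protein ID.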
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens GN=TP53 PE=1 SV=4
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
>sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53 OS=Homo sapiens GN=TP53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQDQTSFQKENC
>sp|P04637-3|P53_HUMAN Isoform 3 of Cellular tumor antigen p53 OS=Homo sapiens GN=TP53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQMLLDLRWCYFLINSS
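In these headers the ID is the second "|"-separated field. A minimal sketch of pulling it out, assuming every header follows the `>db|ID|ENTRY_NAME description` layout shown above; `header_id` and `base_accession` are hypothetical helper names. Isoform records such as `P04637-2` carry the suffix inside that field, so an exact match against `P04637` picks up only the canonical entry; `base_accession` shows one way to strip the suffix if isoforms should match too.

```python
def header_id(header: str) -> str:
    """Return the accession between the first two '|' of a FASTA header.

    Assumes the UniProt-style layout '>sp|P04637|P53_HUMAN ...'.
    """
    # Drop the leading '>' and surrounding whitespace, then take field 1.
    return header.lstrip(">").strip().split("|")[1]


def base_accession(accession: str) -> str:
    """Strip an isoform suffix such as '-2' (only if isoforms should match too)."""
    return accession.split("-")[0]


print(header_id(">sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens"))  # P04637
print(header_id(">sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53"))  # P04637-2
print(base_accession("P04637-2"))  # P04637
```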
Here are some of the protein IDs listed in Interested proteins.txt:
Q6ZWH5
Q8NG66
P51955
P51957
P04629
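Because the ID list is checked once per FASTA record, it helps to load it into a set for fast membership tests. A small sketch, assuming one ID per line; `parse_ids` and `load_ids` are hypothetical names, and stripping each line also removes stray spaces and Windows `\r` endings that would otherwise make IDs fail to match.

```python
def parse_ids(lines):
    """Collect protein IDs from an iterable of lines, one ID per line.

    Surrounding whitespace (including a trailing '\r') and blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def load_ids(path):
    """Read an ID list such as 'Interested proteins.txt' into a set."""
    with open(path, encoding="utf-8") as handle:
        return parse_ids(handle)


ids = parse_ids(["Q6ZWH5\n", "Q8NG66\n", "P51955\r\n", "\n", "  P51957 \n", "P04629"])
print(sorted(ids))  # ['P04629', 'P51955', 'P51957', 'Q6ZWH5', 'Q8NG66']
```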
The final output should look like this (only the sequence for Q6ZWH5 is shown as an example):
>sp|Q6ZWH5|NEK10_HUMAN Serine/threonine-protein kinase Nek10 OS=Homo sapiens GN=NEK10 PE=2 SV=3
MPDQDKKVKTTEKSTDKQQEITIRDYSDLKRLRCLLNVQSSKQQLPAINFDSAQNSMTKS
EPAIRAGGHRARGQWHESTEAVELENFSINYKNERNFSKHPQRKLFQEIFTALVKNRLIS
REWVNRAPSIHFLRVLICLRLLMRDPCYQEILHSLGGIENLAQYMEIVANEYLGYGEEQH
TVDKLVNMTYIFQKLAAVKDQREWVTTSGAHKTLVNLLGARDTNVLLGSLLALASLAESQ
ECREKISELNIVENLLMILHEYDLLSKRLTAELLRLLCAEPQVKEQVKLYEGIPVLLSLL
HSDHLKLLWSIVWILVQVCEDPETSVEIRIWGGIKQLLHILQGDRNFVSDHSSIGSLSSA
NAAGRIQQLHLSEDLSPREIQENTFSLQAACCAALTELVLNDTNAHQVVQENGVYTIAKL
ILPNKQKNAAKSNLLQCYAFRALRFLFSMERNRPLFKRLFPTDLFEIFIDIGHYVRDISA
YEELVSKLNLLVEDELKQIAENIESINQNKAPLKYIGNYAILDHLGSGAFGCVYKVRKHS
GQNLLAMKEVNLHNPAFGKDKKDRDSSVRNIVSELTIIKEQLYHPNIVRYYKTFLENDRL
YIVMELIEGAPLGEHFSSLKEKHHHFTEERLWKIFIQLCLALRYLHKEKRIVHRDLTPNN
IMLGDKDKVTVTDFGLAKQKQENSKLTSVVGTILYSCPEVLKSEPYGEKADVWAVGCILY
QMATLSPPFYSTNMLSLATKIVEAVYEPVPEGIYSEKVTDTISRCLTPDAEARPDIVEVS
SMISDVMMKYLDNLSTSQLSLEKKLERERRRTQRYFMEANRNTVTCHHELAVLSHETFEK
ASLSSSSSGAASLKSELSESADLPPEGFQASYGKDEDRACDEILSDDNFNLENAEKDTYS
EVDDELDISDNSSSSSSSPLKESTFNILKRSFSASGGERQSQTRDFTGGTGSRPRPALLP
LDLLLKVPPHMLRAHIKEIEAELVTGWQSHSLPAVILRNLKDHGPQMGTFLWQASAGIAV
SQRKVRQISDPIQQILIQLHKIIYITQLPPALHHNLKRRVIERFKKSLFSQQSNPCNLKS
EIKKLSQGSPEPIEPNFFTADYHLLHRSSGGNSLSPNDPTGLPTSIELEEGITYEQMQTV
IEEVLEESGYYNFTSNRYHSYPWGTKNHPTKR
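The expected output keeps UniProt's layout of 60 residues per sequence line. If a tool hands back the sequence as a single string (without the original line breaks), it can be re-wrapped before writing. A minimal sketch; `format_record` is a hypothetical helper, and the default `width=60` is an assumption matching the lines above.

```python
def format_record(header: str, sequence: str, width: int = 60) -> str:
    """Render one FASTA record with the sequence wrapped at `width` residues per line."""
    # Ensure the header line starts with '>'.
    lines = [header if header.startswith(">") else ">" + header]
    # Slice the sequence into consecutive chunks of at most `width` characters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_record("sp|Q6ZWH5|NEK10_HUMAN", "MPDQDKKVKT" * 13), end="")
```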
Is there a way to do this in Python? Any help would be greatly appreciated.
You can do this with pyfasta, a Python interface to FASTA-format files.
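For comparison, here is a dependency-free sketch using only the standard library (not pyfasta). It streams the FASTA file line by line, decides at each ">" header whether the ID between the first two "|" is in the wanted set, and copies header and sequence lines unchanged, so the original 60-column wrapping is preserved. It uses exact ID matching, so `P04637` would not pull in `P04637-2`. The script name `filter_fasta.py` and the output file name are hypothetical.

```python
import sys


def filter_fasta(fasta_lines, wanted):
    """Yield the lines of every record whose header ID is in `wanted`.

    Header and sequence lines are passed through unchanged.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID sits between the first two '|' characters, e.g. '>sp|P04637|...'.
            fields = line[1:].split("|")
            keep = len(fields) > 1 and fields[1].strip() in wanted
        if keep:
            yield line


def main(fasta_path, ids_path, out_path):
    # One ID per line; blank lines and surrounding whitespace are ignored.
    with open(ids_path, encoding="utf-8") as handle:
        wanted = {line.strip() for line in handle if line.strip()}
    with open(fasta_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(filter_fasta(src, wanted))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python filter_fasta.py FASTA IDS_TXT OUTPUT
    main(*sys.argv[1:])
```

Run it as `python filter_fasta.py swissprot_canonical-isoforms.fasta "Interested proteins.txt" interested_proteins.fasta`; the quotes are needed because the ID file name contains a space.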