How to use Python to extract FASTA sequences whose locus IDs match a list in a CSV file

Posted 2024-03-28 23:06:48


CSV file:

1.1,LOC_Os02g03440,Os02g0126700,osSmE-a,2,r,1399279,1401336,Sm 1,,
1.1,LOC_Os01g15310,Os01g0256900,osLSM4,1,r,8569841,8573555,Sm 1,,
1.1,LOC_Os07g07220,Os07g0166600,osSmB-a,7,f,3573405,3575954,Sm 1,,

FASTA file:

>LOC_Os05g07030.3 pacid=33157318 polypeptide=LOC_Os05g07030.3 locus=LOC_Os05g07030 ID=LOC_Os05g07030.3.MSUv7.0 annot-version=v7.0
ATGCGAGCTCTCGCGGCGGCGGCGGCAACGGCAACAGCGACTGCAGCGGCGGCGGCGGCGCCTTCCCCCGCGCGCTTCCCTCTCCGCCTCGTCGTCACCCCGCGCGCCTCGTTAGGTCATTGTAGAGCATCTTCCTCCGCAAGGTCTCCGAGGAGG
>LOC_Os05g04170.1 pacid=33157320 polypeptide=LOC_Os05g04170.1 locus=LOC_Os05g04170 ID=LOC_Os05g04170.1.MSUv7.0 annot-version=v7.0

The output file should look like this:

LOC_Os05g07030 ID=LOC_Os05g07030.1 ATGCGAGCTCTCGCGGCGGCGGCGGCAACGGCAACAGCGACTGCAGCGGCGGCGGCGGCGCCTTCCCCCGCGCGCTTCCCTCTCCGCCTCGTCGTCACCCCGCGCGCCTCGTTAGGTCATTGTAGAGCATCTTCCTCCGCAAGGTCTCCGAGGAGG

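For example, if the ID LOC_Os02g03440 in the CSV file matches a locus in the FASTA file, I want to extract the sequences of that locus and all of its isoforms into another file. I am very new to Python. I wrote the script below, but it does not produce the expected output. Please help me fix it.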

#!/usr/bin/python
import os
import re
path=os.getcwd()
list_dir=os.listdir(path+'//Osativa')
if not os.path.exists('results'):
    os.makedirs('results')
fo1=open('./results/cdna.txt','w')
f1=open(path+'//2016-10-19 Rice SF List.csv').readlines()
f2=open(path+'//Osativa//'+'//Osativa_323_v7.0.cds.fa').readlines()
locus_id={}
for line in f1:
    locus_id=line.split(',')[1]
for line in f2:
    if line.startswith('>'):
        locus=line.split()[4]
        isoform=line.split()[0]
        CDS_length=0
        if locus_id==locus:
            fo1.write(locus_id+'\t'+locus+'\t'+str(CDS_length)+'\n')
        else:
            pass
    else:
        pass
fo1.close()

1 answer
User
#1 · Posted 2024-03-28 23:06:48

You could do it like this, for example:

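import csv

# Collect the locus IDs from the second CSV column.
locus_set = set()
fieldnames = ['Version', 'locus', 'ID', 'ID2', 'n', 'i', 'v1', 'v2', 'v3', 'x']
with open('test/FASTA_list.csv', newline='') as f:
    for row in csv.DictReader(f, fieldnames=fieldnames, delimiter=","):
        locus_set.add(row['locus'])


with open('test/FASTA_results.txt', 'w') as fresult, \
     open('test/FASTA.fa') as f:

    # Assumes each record is exactly two lines: a header, then the sequence.
    while True:
        FASTA_1 = f.readline()
        FASTA_2 = f.readline()
        if not FASTA_1 or not FASTA_2:
            break

        FASTA = FASTA_1.split()
        locus = FASTA[3].split('=')[1]   # fourth header field is 'locus=...'
        ID = FASTA[0].lstrip('>')        # isoform ID without the FASTA '>' marker

        if locus in locus_set:
            # strip() drops the sequence's own newline to avoid blank lines
            fresult.write('%s\tID=%s\t%s\n' % (locus, ID, FASTA_2.strip()))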
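
The loop above reads the FASTA file two lines at a time, so it only works when every sequence sits on a single line. Real FASTA files often wrap sequences across several lines. The following is a minimal sketch for that case, not a definitive implementation. It assumes that header lines start with '>' and carry a locus=... field as in the sample, and that the locus ID is in the second CSV column. The function names read_loci, iter_fasta and extract_matching are hypothetical and chosen only for illustration.

```python
import csv


def read_loci(csv_handle, column=1):
    """Collect the locus IDs found in the given CSV column (0-based)."""
    return {row[column].strip()
            for row in csv.reader(csv_handle) if len(row) > column}


def iter_fasta(handle):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)


def extract_matching(csv_handle, fasta_handle, out_handle):
    """Write 'locus<TAB>ID=isoform<TAB>sequence' for each matching record.

    Returns the number of records written.
    """
    loci = read_loci(csv_handle)
    written = 0
    for header, seq in iter_fasta(fasta_handle):
        fields = header.split()
        if not fields:
            continue
        # Look the locus up by key rather than by its position in the header.
        attrs = dict(f.split('=', 1) for f in fields[1:] if '=' in f)
        locus = attrs.get('locus')
        if locus in loci:
            out_handle.write('%s\tID=%s\t%s\n' % (locus, fields[0], seq))
            written += 1
    return written
```

To run it on files, open the three files with open() and pass the handles to extract_matching. Because every isoform header of a locus carries the same locus=... value, all isoforms of a matching locus end up in the output.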
