Extracting sequences and headers from a FASTA file using known sequences

def get_nucl(filename):
    with open(filename,'r') as fd:
        nucl = []
        for line in fd:
            if line[0]!='>':
                nucl.append(line.strip())
    return nucl

def finding(filename,reffile):
    nucl = get_nucl(filename)
    with open(reffile,'r') as reffile2:
        for line in reffile2:
            for element in nucl:
                if line.strip() in element:
                    yield(element)

with open('sequencesmatched.txt','w') as output:
    results = finding('text.fa','textref.fa',)
    for res in results:
        print(res)
        output.write(res + '\n')

def finding(filename,seqfile):
    with open(filename,'r') as fastafile:
        with open(seqfile,'r') as sequf:
            alls=[]
            for line in fastafile:
                alls.append(line.strip())
            print(alls)
            sequfs = []
            for line2 in sequf:
                sequfs.append(line2.strip())
                if str(line.strip()) == str(line2.strip()):
                    num = alls.index(line.strip())
                    print(alls[num-1] + line)

print(finding('text.fa','sequencesmatched.txt'))

1 Answer

User

#1 · Posted 2024-05-16 20:26:04

If your file always has the same structure, you can do this more simply as follows:

def get_nucl(filename):
    with open(filename, 'r') as fd:
        headers = {}
        key = ''
        for line in fd:
            if line.startswith('>'):
                key = line.strip()[1:]  # drop the leading '>'
            elif line.strip():
                # assumes each header is followed by a single sequence line
                headers[key] = line.strip()

    return headers

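Here I assume that each record in your file starts with a ">headerN" line; if that is not the case, you will have to add some checks. You now have a dict with entries such as headers['header1'] = 'ETTTHAASCISATTVQEQ*TLFRLLP'.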
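To make the assumed file layout concrete, here is a small sketch that writes a two-record FASTA file to a temporary directory and parses it into a dict. The helper name `parse_fasta`, the file contents, and the second sequence are made up for illustration. Unlike the function above, this version also joins sequences that wrap over several lines, which is an addition and not part of the answer as posted.

```python
import os
import tempfile


def parse_fasta(path):
    """Return {header: sequence} for a FASTA file (hypothetical helper)."""
    records = {}
    key = None
    with open(path, 'r') as fd:
        for line in fd:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                key = line[1:]  # drop the leading '>'
                records[key] = ''
            elif key is not None:
                records[key] += line  # join sequences wrapped over several lines
    return records


# Made-up sample data; the sequence of header2 is wrapped over two lines.
sample = ">header1\nETTTHAASCISATTVQEQ*TLFRLLP\n>header2\nMKVLA\nAGIVG\n"

with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, 'text.fa')
    with open(fasta_path, 'w') as fd:
        fd.write(sample)
    headers = parse_fasta(fasta_path)

print(headers['header1'])
print(headers['header2'])
```

Lines that appear before the first header are ignored here; whether that is the right behavior depends on how your files are produced.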

So now, to find the matches, you only need to search this dict:

[code snippet not preserved in the source]

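So, once you have a dict that maps each header to its sequence, you can check any substring against the values in the dict, and because the header is already the key, you get the header of every match along with it.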
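The lookup code itself is not available here, so the following is only a sketch of one way such a check could work, assuming a dict of the shape returned by get_nucl and a list of known query sequences. The function name `find_matches` and the sample data are hypothetical.

```python
def find_matches(headers, queries):
    """Yield (header, sequence) pairs whose sequence contains any query.

    `headers` maps header -> sequence, in the shape built by get_nucl.
    `queries` is an iterable of known (sub)sequences.
    Both names are illustrative assumptions.
    """
    # An empty string is a substring of everything, so drop empty queries.
    queries = [q for q in queries if q]
    for header, sequence in headers.items():
        if any(q in sequence for q in queries):
            yield header, sequence


# Made-up data for illustration.
headers = {
    'header1': 'ETTTHAASCISATTVQEQ*TLFRLLP',
    'header2': 'MKVLAAGIVG',
}
matches = list(find_matches(headers, ['SCISATT', 'NOTPRESENT']))

for header, sequence in matches:
    print('>' + header)  # restore the FASTA header marker
    print(sequence)
```

Filtering out empty queries matters when the reference file contains blank lines, since otherwise every record would be reported as a match.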

I also just noticed that you call print(finding(....)). Your function already prints its results, so just call it directly.
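As a minimal illustration of why the outer print is unnecessary: a function without a return statement returns None, so wrapping the call in print adds a stray "None" line after the real output. The function `report` below is a hypothetical stand-in for the question's second finding function.

```python
def report(items):
    """Print each item; like the question's function, it returns nothing."""
    for item in items:
        print(item)


result = report(['>header1', 'ETTT'])  # the function prints the items itself
print(result)  # prints the implicit return value of report
```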
