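Posted 2024-06-12 17:09:10
User
How can I write Python code that reads a DNA sequence and returns a list of its repeated bases, describing three things for each: which base it is (A, G, T or C), its position in the strand, and how many times it repeats? For example: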
ACTTTTGTCTAAACCCCGTCCTATAACT
The output of the function would be: list_base=[('T',3,4),('A',11,3),('C',14,6)]
I did the following:
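import re
from collections import defaultdict

seq = "ACTTTTGTCTAAACCCCCCGTCCTATATATAACT"
bases = ['A', 'G', 'C', 'T']
indexes = defaultdict(list)  # base -> every 0-based position where it occurs
counts = dict()              # base -> total number of occurrences

for base in bases:
    comSeq = re.compile(base)
    matches = comSeq.findall(seq)
    count = len(matches)
    counts[base] = count
    start = 0
    for match in matches:
        # Find the next occurrence of this base, searching from `start`.
        index = seq.find(base, start)
        indexes[base].append(index)
        start = index + 1

print(dict(indexes))  # plain dict, so the printout matches the one shown below
print(counts)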
The indexes dict gives you every position (0-based) of each base in the strand:
{'A': [0, 10, 11, 12, 24, 26, 28, 30, 31], 'G': [6, 19], 'C': [1, 8, 13, 14, 15, 16, 17, 18, 21, 22, 32], 'T': [2, 3, 4, 5, 7, 9, 20, 23, 25, 27, 29, 33]}
The counts dict gives you the number of times each base occurs in the strand:
{'A': 9, 'G': 2, 'C': 11, 'T': 12}
This may not be the best or most efficient code, and I'm not sure exactly what you want, but I hope it helps.
Is this what you're looking for?
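DNA_seq = 'ACTTTTGTCTAAACCCCCCGTCCTATATATAACT'
# base -> [1-based start of its longest run, length of that run]
count_dic = {'A': [0, 0], 'G': [0, 0], 'C': [0, 0], 'T': [0, 0]}

for i in range(len(DNA_seq) - 1):
    j = i
    seq_count = 1
    # The bounds check keeps j + 1 in range when the sequence ends in a repeat.
    while j + 1 < len(DNA_seq) and DNA_seq[j] == DNA_seq[j + 1]:
        seq_count += 1
        j += 1
    # Only runs of two or more count as repeats; keep the longest one per base.
    if seq_count > 1 and seq_count > count_dic[DNA_seq[i]][1]:
        count_dic[DNA_seq[i]][1] = seq_count
        count_dic[DNA_seq[i]][0] = i + 1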
The content of count_dic is:
{'A': [11, 3], 'G': [0, 0], 'C': [14, 6], 'T': [3, 4]}
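For what it's worth, here is a minimal sketch that returns the result in the list-of-tuples format the question asks for, using itertools.groupby to split the strand into runs of identical bases. The names find_repeats and min_run are my own, not from the posts above. The threshold of 3 is an assumption inferred from the example output, which leaves out the two-base runs; positions are 1-based, as in the question. It uses the same sequence as the two answers above.

```python
from itertools import groupby


def find_repeats(seq, min_run=3):
    """Return (base, 1-based start, run length) for every run of at least min_run.

    min_run=3 is an assumption based on the example output in the question.
    """
    repeats = []
    position = 1  # 1-based, to match the positions used in the question
    for base, group in groupby(seq):
        run_length = len(list(group))
        if run_length >= min_run:
            repeats.append((base, position, run_length))
        position += run_length  # advance past this run
    return repeats


list_base = find_repeats("ACTTTTGTCTAAACCCCCCGTCCTATATATAACT")
print(list_base)  # [('T', 3, 4), ('A', 11, 3), ('C', 14, 6)]
```

Unlike count_dic, this reports every qualifying run rather than only the longest one per base, so calling it with min_run=2 would also list the shorter CC and AA runs.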