Python dict and for loop with FASTA file

User

Answer 1 · edited 2024-05-23 21:31:09

Only count the lines that don't start with >, and ignore the ones that do:

from collections import defaultdict

# One-letter codes of the 20 standard amino acids
AMINO_ACIDS = {"A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
               "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"}

with open("input.fasta") as ecoli:  # will close your file automatically
    counts = defaultdict(int)
    for line in ecoli:  # iterate over file object, no need to read all contents into memory
        if line.startswith(">"):  # skip lines that start with >
            continue
        for char in line:  # just iterate over the characters in the line
            if char in AMINO_ACIDS:
                counts[char] += 1

total = float(sum(counts.values()))
for key, val in counts.items():
    print("{}: {} ({:.1%})".format(key, val, val / total))

You can also use a collections.Counter, since once the header lines are skipped, the remaining lines only contain what you are interested in:

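The code for this Counter variant did not survive on this page. A minimal sketch of the idea, assuming the same header-skipping loop as above: Counter.update() counts every character of each stripped sequence line, so no manual dict bookkeeping or character check is needed. The helper name count_residues and the tiny in-memory record are hypothetical, used only so the example runs without a file.

```python
from collections import Counter
import io


def count_residues(lines):
    """Count sequence characters, skipping FASTA header lines that start with '>'."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue
        counts.update(line.strip())  # strip() drops the trailing newline before counting
    return counts


# Hypothetical demo record; with a real file, pass the open file object instead,
# e.g. with open("input.fasta") as ecoli: counts = count_residues(ecoli)
demo = io.StringIO(">example header\nMKAL\nAKLM\n")
counts = count_residues(demo)
total = sum(counts.values())
for key, val in counts.items():
    print("{}: {} ({:.1%})".format(key, val, val / total))
```

Because the file object is passed in as an iterable of lines, the same function works on an open file or on any list of strings.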

User

Answer 2 · edited 2024-05-23 21:31:09

Using a Counter makes this a little easier and saves you from managing the dict yourself (I like dicts, but in this case a Counter really makes sense).

from collections import Counter

filename = "input.fasta"        # path to your FASTA file
acids = ""                      # dunno if this is the right terminology
with open(filename, 'r') as ecoli_file:
    for line in ecoli_file:
        if line.startswith('>'):
            continue
        # from what I saw in the FASTA files, the character-check is
        # not necessary anymore...
        acids += line.strip()   # stripping newline and possible whitespaces
counter = Counter(acids)        # and all the magic is done.
total = float(sum(counter.values()))
for k, v in counter.items():
    print("{}: {} ({:.1%})".format(k, v, v / total))

Since Counter accepts any iterable, it should also be possible to do this with a generator:

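The generator version is also missing from the page. One way it could look, as a sketch rather than the answerer's original code: a generator expression yields each character of every non-header line and feeds it straight into Counter, so the sequence is never concatenated into one big string as in the acids variable above. The function name residue_counter and the demo record are hypothetical.

```python
from collections import Counter
import io


def residue_counter(lines):
    """Build a Counter from a generator over the characters of non-header lines."""
    return Counter(
        char
        for line in lines
        if not line.startswith(">")  # skip FASTA description lines
        for char in line.strip()     # strip() removes the newline and surrounding whitespace
    )


# Hypothetical demo; with a real file: with open(filename) as f: counter = residue_counter(f)
demo = io.StringIO(">header one\nACDA\n>header two\nGGA\n")
counter = residue_counter(demo)
total = sum(counter.values())
for k, v in counter.items():
    print("{}: {} ({:.1%})".format(k, v, v / total))
```

The generator is consumed lazily, one line at a time, which keeps memory use low for large proteome files.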

User

Answer 3 · edited 2024-05-23 21:31:09

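You're right: the way you're approaching this, you will count instances of the characters wherever they appear, even in the description lines.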

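But your code doesn't even run; have you tried it? You call line.split(), but line is never defined (along with many other errors). Also, you are already iterating over the string character by character.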
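To see the description-line problem concretely, here is a small comparison on a made-up record (the header text is invented for illustration). Counting amino-acid letters across the whole text also picks up the capital letters in the header, while skipping lines that start with > counts only the sequence.

```python
# One-letter codes of the 20 standard amino acids
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical record: the header contains capital letters that look like residues
fasta_text = ">Example PROTEIN from E. coli\nMKV\n"

# Naive: count every amino-acid letter anywhere in the text, headers included
naive = {acid: fasta_text.count(acid) for acid in AMINO_ACIDS}

# Header-aware: only count letters on sequence lines
sequence_only = {acid: 0 for acid in AMINO_ACIDS}
for line in fasta_text.split("\n"):
    if line.startswith(">"):
        continue
    for acid in AMINO_ACIDS:
        sequence_only[acid] += line.count(acid)

print("E (naive):", naive["E"], " E (sequence only):", sequence_only["E"])
print("total naive:", sum(naive.values()), " total sequence only:", sum(sequence_only.values()))
```

The naive count reports E three times even though the sequence MKV contains none, which is exactly the skew in the percentages that skipping headers avoids.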

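A simple approach is to read in the file, split it on newlines, skip the lines that start with ">", sum up the count of each character you care about, and keep a running total of all the characters analyzed.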

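#!/usr/bin/env python3
# One-letter codes of the 20 standard amino acids (.faa files hold protein sequences)
aminoAcids = ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
              "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"]

with open("/home/file_pathway.faa") as f:  # read the whole file, then close it
    ecoli = f.read()

counts = dict()
for acid in aminoAcids:
    counts[acid] = 0
total = 0                                  # running total of all characters analyzed

for line in ecoli.split('\n'):
    if not line.startswith(">"):           # skip description lines
        total += len(line)
        for acid in counts.keys():
            counts[acid] += line.count(acid)

for acid in aminoAcids:
    share = counts[acid] / total if total else 0.0
    print("{}: {} ({:.1%})".format(acid, counts[acid], share))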
