Python code to insert "N" into a genome sequence

chr1 822979 822980 CLL6.08_1_snv 88.2 +
chr1 1052781 1052782 CLL6.08_2_snv 388.9 +
chr1 1216196 1216197 CLL6.08_3_snv 625 +
chr1 5053847 5053848 CLL6.08_4_snv 722.2 +
chr1 5735093 5735094 CLL6.08_5_snv 138.9 +

Enter UCSC fasta file of human genome:chr1.fa
chr1
Length of the chromosome is: 249250622
No. of N in the chromosome are: 23970000
Here is my mutaiton file data
length : 249250622
File cannot be opened, wrong format you forgot something:
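For reference, the mutation lines above appear to be BED6-style records (chrom, start, end, name, score, strand); that reading is an assumption, since the question does not name the format. If it holds, the start column is 0-based and the end is exclusive, so each SNV covers exactly one base at index `start` of a 0-indexed sequence. A minimal parsing sketch under that assumption, where `parse_bed_lines` is a hypothetical helper name:

```python
from collections import defaultdict


def parse_bed_lines(lines):
    """Group 0-based positions by chromosome.

    Assumes BED-style whitespace-separated columns:
    chrom, start (0-based), end (exclusive), name, score, strand.
    """
    positions = defaultdict(list)
    for line in lines:
        cols = line.split()
        if not cols or cols[0].startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED comment/header lines
        chrom, start, end = cols[0], int(cols[1]), int(cols[2])
        positions[chrom].extend(range(start, end))  # every base in [start, end)
    return positions


sample = [
    "chr1 822979 822980 CLL6.08_1_snv 88.2 +",
    "chr1 1052781 1052782 CLL6.08_2_snv 388.9 +",
]
print(dict(parse_bed_lines(sample)))  # {'chr1': [822979, 1052781]}
```

Expanding `range(start, end)` rather than keeping only `start` means the same helper would also work for multi-base intervals, should the file ever contain them.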

2 answers

User

Answer 1 · edited 2024-05-16 00:57:57

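Your trouble is probably in locating and opening the files. Once you have the data loaded, the rest is fairly easy: read the FASTA file, strip the header line, convert the sequence into a list, replace every position listed in the mutation file with "N", and then write the FASTA back out. The steps are: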

from collections import defaultdict
chromosome = input("what chromosome are you editing? ")

# have all your file paths in order (placeholders: replace with your real paths)
mutations = "path/to/mutations/file"
fasta = "path/to/fasta/file"
newfile = "path/to/new/file"

# (1) get the mutations out of the mutations file into a list for each chrom
mutdict = defaultdict(list)
with open(mutations, "r") as f1:
    rows = [line.split() for line in f1 if line.strip()]  # split each non-blank line into columns
    muts = [(cols[0], int(cols[1])) for cols in rows]  # keep chromosome and start (0-based if the file is BED)

# (2) convert these to a dict
for (ch, pos) in muts:
    mutdict[ch].append(pos) 

# (3) get your fasta and convert it to a list
with open(fasta, "r") as f2:
    header = f2.readline()  # the first line is a header like ">chr1"
    bases  = f2.read().replace("\n", "")  # read all the bases and remove "\n"
bases = list(bases)  # turn the string into a list

# (4) now you loop through your mutations and change them to N in the fasta list
for mut in mutdict[chromosome]:
    bases[mut] = "N"

# (5) re-write the Fasta:
new_fasta = header  # the original header line, which still ends in "\n"
new_fasta += "\n".join("".join(bases[i:i + 50]) for i in range(0, len(bases), 50)) + "\n"  # 50 bases per line
with open(newfile, "w") as out:
    out.write(new_fasta)
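The same steps can be condensed into one function that is easy to test on a tiny input. This is a sketch under two assumptions: the FASTA holds a single record, and the positions are 0-based (as BED starts are). It also swaps the list of one-character strings for a `bytearray`: a Python list spends a pointer (8 bytes on 64-bit CPython) on every base, while a `bytearray` stores one byte per base, which matters for a 249 Mb chromosome such as chr1. The name `mask_fasta` is hypothetical.

```python
def mask_fasta(fasta_text, positions, width=50):
    """Return a single-record FASTA string with the given 0-based positions set to N.

    Assumes fasta_text holds exactly one record ("\n" line endings).
    """
    header, _, body = fasta_text.partition("\n")  # split off the ">chr1" header line
    seq = bytearray(body.replace("\n", ""), "ascii")  # one mutable byte per base
    for pos in positions:
        if 0 <= pos < len(seq):
            seq[pos] = ord("N")  # overwrite the base in place
        # positions outside the sequence are silently skipped in this sketch
    lines = [seq[i:i + width].decode("ascii") for i in range(0, len(seq), width)]
    return header + "\n" + "\n".join(lines) + "\n"


example = ">chr1\nACGTACGTAC\nGTAC\n"
print(mask_fasta(example, [0, 5, 12], width=10))
```

With the small example, bases 0, 5 and 12 become N and the sequence is re-wrapped at 10 columns. For the real files you would pass the text read from `fasta`, use `mutdict[chromosome]` as the positions, and write the returned string to `newfile`.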

User

Answer 2 · edited 2024-05-16 00:57:57

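To make your life a bit easier, you could use Biopython to read your FASTA and do the conversion.

Here is some documentation to get you started: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16

Here is some starter code: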

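from Bio import SeqIO

# "rU" mode no longer exists in Python 3 (removed in 3.11); plain "r" already handles newlines
with open("example.fasta", "r") as handle, open("output.fasta", "w") as output_handle:
    for record in SeqIO.parse(handle, "fasta"):
        print(record.seq)  # edit the sequence here, then save it with SeqIO.write(record, output_handle, "fasta")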
