Dealing with redundant information in BLAST output

Posted 2024-04-26 07:55:49


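I am currently working through a large number of BLASTn analyses, and I have output in tabular format (option -outfmt 6) with more than 170,000 lines. There is a sample at https://textuploader.com/11krp containing the pieces of information I need: query, subject, subject start, subject end and score (in that order).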
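For readers starting from the raw BLAST file rather than a trimmed CSV, here is a minimal sketch of pulling out those five fields. It assumes the default 12-column layout that -outfmt 6 produces when no custom field list is given, and it assumes that "score" means the bitscore column. The function name `read_hits` and the two example rows are made up for illustration and do not come from the linked sample.

```python
import csv
import io

# Column order of BLAST+ "-outfmt 6" when no custom field list is given.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def read_hits(handle):
    """Yield (query, subject, s_start, s_end, score) from tab-separated BLAST output."""
    reader = csv.DictReader(handle, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    for row in reader:
        # Assumption: the "score" in the question is the bitscore column.
        yield (row["qseqid"], row["sseqid"],
               int(row["sstart"]), int(row["send"]), float(row["bitscore"]))

# Hypothetical rows, invented for this example.
example = io.StringIO(
    "q1\tsubjA\t98.5\t750\t5\t1\t1\t750\t922444\t923190\t0.0\t1310\n"
    "q2\tsubjA\t97.9\t751\t8\t1\t1\t751\t922442\t923192\t0.0\t1295\n"
)
hits = list(read_hits(example))
```

With a real file, `read_hits(open("results.tsv"))` would stream the rows one at a time, which matters little at 170,000 lines but avoids holding the raw text in memory.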

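As the sample shows, different queries can match the same subject at the same position or at different positions.

In the next step I will use the start and end positions to extract these regions from the subject, but if I extract using this information as it stands I will recover a large number of redundant sequences.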
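The extraction step itself could look roughly like the sketch below. It assumes the subject sequence is already loaded as a plain string, that the coordinates are 1-based and inclusive as in BLAST tabular output, and that a start greater than the end marks a hit on the minus strand. The helper name `extract_region` and the toy sequence are hypothetical.

```python
def extract_region(sequence, s_start, s_end):
    """Return the subject region for 1-based, inclusive coordinates."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    if s_start <= s_end:
        # Plus-strand hit: convert to a 0-based, end-exclusive slice.
        return sequence[s_start - 1:s_end]
    # Assumed minus-strand hit: slice the same span, then reverse-complement it.
    return sequence[s_end - 1:s_start].translate(complement)[::-1]

subject_seq = "AACCGGTTAC"          # toy sequence, invented for this example
plus = extract_region(subject_seq, 2, 5)
minus = extract_region(subject_seq, 5, 2)
```

Because every hit is sliced independently, two hits that differ by a couple of nucleotides produce two almost identical sequences, which is the redundancy described here.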

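As I see it, there are four cases of redundant matches:

1 - Same region of the subject = same s_start and same s_end, different scores

For example, lines 29, 33, 37 and 43

2 - Almost the same region of the subject (type 1) = different s_start, same s_end, different scores

For example, line 26 (s_start = 928719) and lines 18, 30, 34 and 38 (s_start = 928718)

3 - Almost the same region of the subject (type 2) = same s_start, different s_end, different scores

For example, lines 18, 30, 34 and 38 (s_end = 929459) and line 44 (s_end = 929456)

4 - Same region with different lengths = s_start and s_end both differ but cover the same region of the subject, different scores

For example, line 17 (s_start = 922442, s_end = 923192) and lines 29, 33, 37 and 43 (s_start = 922444, s_end = 923190)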
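The four cases above can be written down as a small comparison function, which may help when checking which case a given pair of lines falls into. This is only a sketch: the name `classify_pair` is made up, and case 4 is interpreted here as "both coordinates differ but the intervals overlap", which is an assumption about what "cover the same region" means. The coordinates in the checks are the ones quoted in the examples above.

```python
def classify_pair(a, b):
    """Return 1-4 for the redundancy case of two (s_start, s_end) regions
    on the same subject, or 0 if the regions do not overlap at all."""
    # Sort each pair so that minus-strand hits (start > end) compare the same way.
    (a_start, a_end), (b_start, b_end) = sorted(a), sorted(b)
    if (a_start, a_end) == (b_start, b_end):
        return 1                    # same start and same end
    if a_end == b_end:
        return 2                    # different start, same end
    if a_start == b_start:
        return 3                    # same start, different end
    if a_start <= b_end and b_start <= a_end:
        return 4                    # both differ, but the intervals overlap
    return 0                        # distinct regions

case_1 = classify_pair((922444, 923190), (922444, 923190))
case_2 = classify_pair((928719, 929459), (928718, 929459))
case_3 = classify_pair((928718, 929459), (928718, 929456))
case_4 = classify_pair((922442, 923192), (922444, 923190))
```

Note that a plain overlap test for case 4 would also flag two regions that share only a few nucleotides, so a distance threshold on both ends is probably the safer criterion for deciding what to discard.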

So... I have some experience with Python and wrote the following script:

import csv

subject_dict = {}    # dictionary to store the original information
subject_dict_2 = {}  # dictionary to store the filtered information (not filled in yet)

# Open the file and build a dictionary with the subject information
with open('blast_test.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    # Read the file line by line
    for row in csv_reader:
        print(row)
        # Assign each column to a variable and build a new subject name
        query, subject_old, s_start, s_end, score = row[:5]
        subject_new = subject_old + '_' + s_start + '_' + s_end
        # Insert the subject into the dictionary (score is kept so it can be written out later)
        subject_dict[subject_new] = [subject_old, query, s_start, s_end, score]

# Testing the dictionary
for k, v in subject_dict.items():
    print(k, ':', v)

# Making comparisons (not implemented yet)
for k, v in subject_dict.items():
    pass  # if ...

'''
# Creating an output
with open('blast_test_filtred.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    for subject, (subject_old, query, s_start, s_end, score) in subject_dict.items():
        writer.writerow([subject, s_start, s_end, score, query])
'''

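My logic is:

1 - Create a dictionary with all the information, changing the subject name (just to make the output easier for me to read)

2 - Remove the redundant information using the criteria of the four cases above

3 - Write the new information to an output file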

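To remove the redundant information, my idea is to set a threshold of 10 nucleotides upstream and downstream of each region boundary (start and end), then compare the regions using the subject's original name (subject_old) and keep the region with the highest score, in such a way that all the distinct regions are still recovered.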
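One possible way to implement this idea is a greedy pass over the hits from highest to lowest score: a hit is kept only if no already-kept hit on the same subject has both its start and its end within the threshold. The sketch below assumes each hit is a tuple of (query, subject, s_start, s_end, score) with numeric coordinates and score. The function name `filter_redundant`, the query and subject names, and the scores are invented for illustration; only the coordinates come from the examples in the question.

```python
from collections import defaultdict

def filter_redundant(hits, tolerance=10):
    """Keep the best-scoring hit among hits on the same subject whose start
    and end both lie within `tolerance` nucleotides of each other."""
    kept = defaultdict(list)        # subject -> hits retained so far
    # Visit hits from best to worst score, so the first hit seen in a region wins.
    for hit in sorted(hits, key=lambda h: h[4], reverse=True):
        query, subject, s_start, s_end, score = hit
        low, high = sorted((s_start, s_end))
        is_redundant = any(
            abs(low - min(k[2], k[3])) <= tolerance
            and abs(high - max(k[2], k[3])) <= tolerance
            for k in kept[subject]
        )
        if not is_redundant:
            kept[subject].append(hit)
    return [hit for subject_hits in kept.values() for hit in subject_hits]

# Coordinates from the question; names and scores are hypothetical.
hits = [
    ("q1", "subjA", 922442, 923192, 1300.0),
    ("q2", "subjA", 922444, 923190, 1310.0),
    ("q3", "subjA", 928719, 929459, 1200.0),
    ("q4", "subjA", 928718, 929459, 1250.0),
    ("q5", "subjA", 928718, 929456, 1100.0),
    ("q6", "subjB", 928718, 929459, 900.0),
]
filtered = filter_redundant(hits)
```

Comparing only against hits on the same subject keeps identical coordinates on different subjects apart, as with q6 here. Each hit is compared with every hit already kept for its subject, so a subject with very many distinct regions may need the kept hits sorted by start position to stay fast.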

Could someone explain how to carry out the steps above?

Thanks


0 answers

There are no answers yet.
