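Extracting different elements from a file and including them in a list
I am filling a list that will be stored in a dictionary. To do that, I need to retrieve specific pieces of information from a file and put each one at a fixed position in the list. Any help with Python would be greatly appreciated.
Example of an assembly report file: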
Assembly name: Pav631_1.0
Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)
Infraspecific name: strain=BPIC 631
Taxid: 11547
BioSample: SAMN02471966
BioProject: PRJNA84293
Submitter: University of Toronto Centre for the Analysis of Genome Evolution and Function
Date: 2012-10-10
Assembly type: n/a
Release type: major
Assembly level: Scaffold
Genome representation: full
WGS project: AKBS01
Assembly method: CLC
Here is the code I tried:
import os

report_dict = {}
for root, dirs, reports in os.walk(assembly_report_dir):
    for report in reports:
        # accession = the first two '_'-separated fields of the file name
        accession = '_'.join(report.strip().split('/')[-1].replace('_assembly_report.txt', '').split('_')[0:2])
        path = os.path.join(root, report)  # path = the name of the report with the complete path to it
        with open(path, 'r') as inputfile:
            lines = inputfile.readlines()
        description = []
        for line in lines:
            if line.startswith('Organism name: '):
                organism = line.strip().split(': ')[-1].split(' (', 1)[0]
                species = ' '.join(organism.split(' ')[0:2])
                description.append(species)
            elif line.startswith('Infraspecific name: strain='):
                strain = line.strip().replace(' ', '').split('strain=')[-1]
                description.append(strain)
            elif line.startswith('Assembly name: '):
                assembly = line.strip().split(': ')[-1]
                description.append(assembly)
        report_dict[accession] = description
print(report_dict)
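The problem is that the last item I add to the list (assembly) ends up in the first position of the list instead of the last.
The result I get is:
description = ["assembly", "species", "strain"]
I would like a list like this: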
description = ["species", "strain", "assembly"]
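A very quick and dirty way to do it: append adds the items in the order their lines appear in the file, and "Assembly name" is the first line of the report. Because your list has a fixed length, you can assign each value to a fixed index instead, and that will work without problems.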
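As a minimal sketch of that fixed-length idea (not necessarily the code the answer originally showed), the list can be created with three placeholder slots and each value written to its own index. The function name `build_description` and the `sample` lines are hypothetical, introduced only for illustration; the parsing expressions are taken from the question.

```python
def build_description(lines):
    """Return [species, strain, assembly], whatever order the lines come in."""
    # Pre-size the list: index 0 = species, 1 = strain, 2 = assembly.
    description = [None, None, None]
    for line in lines:
        line = line.strip()
        if line.startswith('Organism name: '):
            organism = line.split(': ')[-1].split(' (', 1)[0]
            description[0] = ' '.join(organism.split(' ')[0:2])
        elif line.startswith('Infraspecific name: strain='):
            # As in the question, spaces are removed from the strain name.
            description[1] = line.replace(' ', '').split('strain=')[-1]
        elif line.startswith('Assembly name: '):
            description[2] = line.split(': ')[-1]
    return description


# Hypothetical sample: the first three lines of the report shown in the question.
sample = [
    'Assembly name: Pav631_1.0\n',
    'Organism name: Pseudomonas avellanae BPIC 631 (g-proteobacteria)\n',
    'Infraspecific name: strain=BPIC 631\n',
]
print(build_description(sample))
# ['Pseudomonas avellanae', 'BPIC631', 'Pav631_1.0']
```

A slot stays None if its line is missing from a report, which makes incomplete files easy to spot. In the loop from the question, the lines read from each file could be passed to this function and the result stored in report_dict[accession].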