How do I split a file containing filenames and information into separate files?

PLXNA3 ### <- filename1
Missense/nonsense : 13 mutations # <- header spaces
accession codon_change amino_acid_change # <- column names tsv
ID73 CAT-TAT His66Tyr # <- line tsv
ID63 GAC-AAC Asp127Asn # <- line tsv
ID31 GCC-GTC Ala307Val # <- line tsv
NEDD4L ### <- filename2
Splicing : 1 mutation # <- header spaces
accession splicing_mutation # <- column names tsv
ID51 IVS1 as G-A -16229 # <- line tsv
Gross deletions : 1 mutation # <- header spaces
accession DNA_level description HGVS_(nucleotide) HGVS_(protein) # <- column names tsv
ID853 gDNA 4.5 Mb incl. entire gene Not yet available Not yet available # <- line tsv
OPHN1 ### <- filename3
Small insertions : 3 mutations # <- header spaces
accession insertion HGVS_(nucleotide) # <- column names tsv
ID96 TTATGTT(^183)TATtCAAATCCAGG c.549dupT p.(Gln184Serfs*23) # <- line tsv
ID25 GTGCT(^310)AAGCAcaG_EI_GTCAGTTCT c.931_932dupCA # <- line tsv

PLXNA3 ### <- filename1
Missense/nonsense : 13 mutations # <- header spaces
accession codon_change amino_acid_change # <- column names tsv
ID73 CAT-TAT His66Tyr # <- line tsv
ID63 GAC-AAC Asp127Asn # <- line tsv
ID31 GCC-GTC Ala307Val # <- line tsv

NEDD4L ### <- filename2
Splicing : 1 mutation # <- header spaces
accession splicing_mutation # <- column names tsv
ID51 IVS1 as G-A -16229 # <- line tsv
Gross deletions : 1 mutation # <- header spaces
accession DNA_level description HGVS_(nucleotide) HGVS_(protein) # <- column names tsv
ID853 gDNA 4.5 Mb incl. entire gene Not yet available Not yet available # <- line tsv

OPHN1 ### <- filename3
Small insertions : 3 mutations # <- header spaces
accession insertion HGVS_(nucleotide) # <- column names tsv
ID96 TTATGTT(^183)TATtCAAATCCAGG c.549dupT p.(Gln184Serfs*23) # <- line tsv
ID25 GTGCT(^310)AAGCAcaG_EI_GTCAGTTCT c.931_932dupCA # <- line tsv

2 answers

User

Answer 1 · edited 2024-06-16 11:16:57

awk 'NF==1{filename=$0 ".txt"};{print > filename}' file.txt

An equivalent but terser alternative is

awk 'NF==1{f=$0".txt"}{print>f}' file.txt

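User

Answer 2 · edited 2024-06-16 11:16:57

Here is the solution I came up with. It first opens the file to be split, then reads the first line, which is the filename of the first output file. Skipping over the outer while loop for a moment: it opens a new file named after the filename just read (strip() is needed to remove the trailing newline character). It then reads lines and writes them to the new file until it reaches a line that contains no space or tab, which must be the next filename. This repeats until there are no more lines to read (that is the outer while loop I skipped earlier).

Hope that helps :)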

# Open the file to split; the with-block closes it even if an error occurs.
with open("file.txt", "r") as file:
    new_filename = file.readline()
    while new_filename:
        with open(new_filename.strip() + ".txt", "w") as new_file:
            # The filename line is also the first line of the new file.
            new_file.write(new_filename)
            line = file.readline()
            while " " in line or "\t" in line:
                # still the same new file
                new_file.write(line)
                line = file.readline()
        # No space or tab: this line is the next filename, or "" at end of input.
        new_filename = line
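
For anyone who wants the same logic as a reusable function, here is a minimal sketch. It borrows the test from the awk answer (a line with exactly one whitespace-separated field starts a new file) instead of looking for spaces or tabs, so blank lines are simply written to the current file. The function name split_by_filename_lines and the out_dir and suffix parameters are my own naming for illustration, not something from the question, and I am assuming the input is UTF-8 text.

```python
from pathlib import Path


def split_by_filename_lines(source, out_dir=".", suffix=".txt"):
    """Split `source` into one file per single-field line; return the paths written.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current = None  # handle of the output file currently being written
    try:
        with open(source, encoding="utf-8") as src:
            for line in src:
                fields = line.split()
                if len(fields) == 1:
                    # A lone field is treated as a filename line: switch files.
                    if current is not None:
                        current.close()
                    path = out_dir / (fields[0] + suffix)
                    current = open(path, "w", encoding="utf-8")
                    written.append(path)
                if current is not None:
                    # Like the awk version, the filename line itself is kept.
                    current.write(line)
    finally:
        if current is not None:
            current.close()
    return written
```

Calling split_by_filename_lines("file.txt") should produce PLXNA3.txt, NEDD4L.txt and OPHN1.txt for the sample above. Two caveats: lines that appear before the first filename are silently dropped, and if the same filename shows up twice the second section overwrites the first, because the file is reopened in "w" mode.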
