Batch renaming files/folders using an "index"

Question

Batch renaming of files and folders is a common question, but after some searching I could not find anyone asking something similar to mine.

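Background: we send biological samples to a service provider, who returns files with unique names together with a table in text format that contains, among other information, each file name and the sample it came from: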

head samples.txt
fq_file                       Sample_ID      Sample_name  Library_ID     FC_Number  Track_Lanes_Pos
L2369_Track-3885_R1.fastq.gz  S1746_B_7_t    B 7 t        L2369_B_7_t    163        6
L2349_Track-3865_R1.fastq.gz  S1726_A_3_t    A 3 t        L2349_A_3_t    163        5
L2354_Track-3870_R1.fastq.gz  S1731_A_GFP_c  A GFP c      L2354_A_GFP_c  163        5
L2377_Track-3893_R1.fastq.gz  S1754_B_7_c    B 7 c        L2377_B_7_c    163        7
L2362_Track-3878_R1.fastq.gz  S1739_B_GFP_t  B GFP t      L2362_B_GFP_t  163        6

L2369_Track-3885_R1_
   accepted_hits.bam      
   deletions.bed   
   junctions.bed         
   logs
   accepted_hits.bam.bai  
   insertions.bed  
   left_kept_reads.info
L2349_Track-3865_R1_
   accepted_hits.bam      
   deletions.bed   
   junctions.bed         
   logs
   accepted_hits.bam.bai  
   insertions.bed  
   left_kept_reads.info

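Goal: since these names are meaningless and hard to read, I would like to rename the files ending in .bam (keeping the suffix) and the folders after the corresponding sample name, with its parts reordered in a more convenient way. The end result should look like this: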

7_t_B
   7_t_B.bam
   deletions.bed   
   junctions.bed         
   logs
   7_t_B.bam.bai  
   insertions.bed  
   left_kept_reads.info
3_t_A
   3_t_A.bam      
   deletions.bed   
   junctions.bed         
   logs
   3_t_A.bam.bai
   insertions.bed  
   left_kept_reads.info
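
The mapping behind this target layout can be written down as two small helpers. This is only a sketch: the function names `folder_for` and `new_name` are made up for illustration, and it assumes that Sample_name always has exactly three whitespace-separated fields and that each folder is named after the fq_file entry with `.fastq.gz` replaced by a trailing underscore (as in the generated mv commands further down).

```python
def folder_for(fq_file):
    """Map 'X_R1.fastq.gz' to the folder name 'X_R1_' (assumed convention)."""
    suffix = ".fastq.gz"
    if not fq_file.endswith(suffix):
        raise ValueError("unexpected file name: %r" % fq_file)
    return fq_file[:-len(suffix)] + "_"


def new_name(sample_name):
    """Reorder a three-field sample name: 'B 7 t' becomes '7_t_B'."""
    first, second, third = sample_name.split()
    return "_".join([second, third, first])


if __name__ == "__main__":
    print(folder_for("L2369_Track-3885_R1.fastq.gz"), "->", new_name("B 7 t"))
```

Keeping the two rules in separate functions makes it easy to check them against a few rows of the table before touching any files.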

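I cobbled together a solution using bash and python (I am a newbie), but it feels over-complicated. My question is whether there is a simpler or more elegant way that I have not thought of. Solutions can be in python, bash or R, or in awk, since I am trying to learn it. As a relative beginner, I do tend to make things complicated.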

This is my solution:

A wrapper ties everything together and gives an idea of the workflow:

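#! /bin/bash

# select the columns of interest (fq_file, Sample_name) and write them to a file
# ('>' overwrites, so rerunning the wrapper does not duplicate lines)
tail -n +2 samples.txt | cut -d$'\t' -f1,3 > BAMfilames.txt

# call my little python script that creates the .sh files with the renaming commands
./renameBamFiles.py

# finally do the renaming of the files
# (run via bash because the generated scripts are not marked executable)
bash renameBam.sh

# and of the folders too
bash renameBamFolder.sh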

renameBamFiles.py:

#! /usr/bin/env python
import re

# Read in the sample data file and create a bash file that will rename the tophat output
# the renaming will be as follows:
# mv L2377_Track-3893_R1_ 7_c_B
#

# Set the input file name
# (The program must be run from within the directory 
#  that contains this data file)
InFileName = 'BAMfilames.txt'


### Rename BAM files

# Open the input file for reading
InFile = open(InFileName, 'r')


# Open the output file for writing
OutFileName= 'renameBam.sh'

OutFile=open(OutFileName,'w') # 'w' overwrites; with 'a' a rerun would append a second copy of the commands

OutFile.write("#! /bin/bash"+"\n")
OutFile.write(" "+"\n")


# Loop through each line in the file
for Line in InFile:
    ## Remove the line ending characters
    Line=Line.strip('\n')

    ## Separate the line into a list of its tab-delimited components
    ElementList=Line.split('\t')

    # separate the folder string from the experimental name
    fileroot=ElementList[1]
    fileroot=fileroot.split()

    # create variable names using regex
    folderName=re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', ElementList[0])
    folderName=folderName.strip('\n')
    fileName = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])

    command= "for file in %s/accepted_hits.*; do mv $file ${file/accepted_hits/%s}; done" % (folderName, fileName)

    print(command)
    OutFile.write(command+"\n")  


# After the loop is completed, close the files
InFile.close()
OutFile.close()


### Rename folders

# Open the input file for reading
InFile = open(InFileName, 'r')


# Open the output file for writing
OutFileName= 'renameBamFolder.sh'

OutFile=open(OutFileName,'w') 

OutFile.write("#! /bin/bash"+"\n")
OutFile.write(" "+"\n")


# Loop through each line in the file
for Line in InFile:
    ## Remove the line ending characters
    Line=Line.strip('\n')

    ## Separate the line into a list of its tab-delimited components
    ElementList=Line.split('\t')

    # separate the folder string from the experimental name
    fileroot=ElementList[1]
    fileroot=fileroot.split()

    # create variable names using regex
    folderName=re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', ElementList[0])
    folderName=folderName.strip('\n')
    fileName = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])

    command= "mv %s %s" % (folderName, fileName)

    print(command)

    OutFile.write(command+"\n")  


# After the loop is completed, close the files
InFile.close()
OutFile.close()

renameBam.sh, created by the python script above:

#! /bin/bash

for file in L2369_Track-3885_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/7_t_B}; done
for file in L2349_Track-3865_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/3_t_A}; done
for file in L2354_Track-3870_R1_/accepted_hits.*; do mv $file ${file/accepted_hits/GFP_c_A}; done
(..)

renameBamFolder.sh is very similar:

mv L2369_Track-3885_R1_ 7_t_B
mv L2349_Track-3865_R1_ 3_t_A
mv L2354_Track-3870_R1_ GFP_c_A
mv L2377_Track-3893_R1_ 7_c_B

Since I am learning, I think it would be very helpful to see different ways of solving and thinking about this problem.
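
For comparison, the same renaming can be done in a single Python pass, without generating intermediate shell scripts. The following is a rough sketch rather than a tested replacement for the scripts above: `plan_renames` and `apply_renames` are hypothetical names, and it assumes that samples.txt is tab-delimited with the fq_file and Sample_name headers shown earlier, and that folders follow the naming seen in the generated mv commands.

```python
import csv
import os


def plan_renames(table_path):
    """Return (old_folder, new_name) pairs read from the tab-delimited sample table."""
    plan = []
    with open(table_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumed convention: 'X_R1.fastq.gz' lives in the folder 'X_R1_'
            old_folder = row["fq_file"].replace(".fastq.gz", "_")
            # 'B 7 t' is reordered to '7_t_B'
            first, second, third = row["Sample_name"].split()
            plan.append((old_folder, "_".join([second, third, first])))
    return plan


def apply_renames(plan, root=".", dry_run=True):
    """Rename accepted_hits.* inside each folder, then the folder itself."""
    for old_folder, name in plan:
        old_dir = os.path.join(root, old_folder)
        if not os.path.isdir(old_dir):
            print("skipping missing folder: " + old_dir)
            continue
        for entry in sorted(os.listdir(old_dir)):
            if entry.startswith("accepted_hits."):
                source = os.path.join(old_dir, entry)
                target = os.path.join(old_dir, entry.replace("accepted_hits", name, 1))
                print("mv %s %s" % (source, target))
                if not dry_run:
                    os.rename(source, target)
        new_dir = os.path.join(root, name)
        print("mv %s %s" % (old_dir, new_dir))
        if not dry_run:
            os.rename(old_dir, new_dir)
```

Calling `apply_renames(plan_renames("samples.txt"))` only prints the planned moves; passing `dry_run=False` performs them. The dry run is there so the mapping can be inspected before anything is renamed.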



5 Answers
