Replace strings on a specific line using Python


Asked 2025-04-15 17:36

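I'm writing a Python script that replaces strings in every text file with the .seq extension in a folder. The strings to be replaced appear only on the second line of each file. The output goes into a new subfolder (called clean), using the original filename plus a .clean suffix. Each output file should be identical to the original except for the replaced strings. I need to replace each of these strings: 'K', 'Y', 'W', 'M', 'R', 'S' with 'N'.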

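This is the code I put together after searching online. It looks messy (I'm only in my second week of programming), and all it does is copy the files into the clean directory without replacing anything. I'd really appreciate some help.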

Thanks, everyone!

```
import os, shutil

os.mkdir('clean')

for file in os.listdir(os.getcwd()):
    if file.find('.seq') != -1:
        shutil.copy(file, 'clean')

os.chdir('clean')

for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        f = open(file, 'r')
        for line in f.read():
            if line.__contains__('>'): #indicator for the first line. the first line always starts with '>'. It's a FASTA file, if you've worked with dna/protein before.
                pass
            else:
                line.replace('M', 'N')
                line.replace('K', 'N')
                line.replace('Y', 'N')
                line.replace('W', 'N')
                line.replace('R', 'N')
                line.replace('S', 'N')
```


5 Answers

Here are some general suggestions:

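Don't use find to check the file extension (it would also match "file1.seqdata.xls", for example). At the very least use file.endswith('.seq'); better still, use os.path.splitext(file)[1].
Actually, you don't need to do any of that. Do this instead: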
```
import glob
seq_files = glob.glob("*.seq")
```
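For illustration, here is a quick comparison of the extension checks on a few made-up filenames. The filenames and the helper `extension_checks` are hypothetical, not part of the original answer.

```python
import os

def extension_checks(name):
    """Return how three different checks classify a filename as a .seq file."""
    return {
        "find": name.find(".seq") != -1,                  # true if ".seq" appears anywhere in the name
        "endswith": name.endswith(".seq"),                # true only if the name ends with ".seq"
        "splitext": os.path.splitext(name)[1] == ".seq",  # compares only the final extension
    }

for name in ["reads.seq", "file1.seqdata.xls", "archive.seq.bak", "notes_seq"]:
    print(name, extension_checks(name))
```

Only `find` is fooled by "file1.seqdata.xls" and "archive.seq.bak". Keep the dot in the suffix: "notes_seq".endswith("seq") would be True, while "notes_seq".endswith(".seq") is False.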

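Don't copy the files; a single loop handles everything and is much simpler:

```
import glob
import os

for filename in glob.glob("*.seq"):
    with open(filename) as in_file, open(os.path.join("clean", filename), "w") as out_file:
        # now read lines from in_file and write lines to out_file
        pass
```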
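One way to fill in the "read lines and write lines" step is sketched below, assuming the replacement should happen only on the second line, as the question asks. The function name `copy_masking_second_line` and the constant `BASES_TO_MASK` are hypothetical; `io.StringIO` stands in for real files so the example runs on its own.

```python
import io

BASES_TO_MASK = "KYWMRS"  # hypothetical constant; the letters come from the question

def copy_masking_second_line(in_file, out_file):
    """Copy in_file to out_file line by line, replacing K/Y/W/M/R/S with N on line 2 only."""
    for index, line in enumerate(in_file):
        if index == 1:  # enumerate counts from 0, so index 1 is the second line
            for base in BASES_TO_MASK:
                line = line.replace(base, "N")  # str.replace returns a new string
        out_file.write(line)

# In-memory stand-ins; real code would pass the objects returned by open()
src = io.StringIO(">seq1 sample\nACKTYGWA\nMRS\n")
dst = io.StringIO()
copy_masking_second_line(src, dst)
print(dst.getvalue())
```

Because the loop processes one line at a time, it never holds the whole file in memory, and the third line ("MRS") passes through untouched.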

Don't use line.__contains__('>'). What you mean is
```
if '>' in line:
```
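(which calls __contains__ internally). But what you really want to know is whether the line starts with `">"`, not whether that character appears somewhere in the line. So a better check is: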
```
if line.startswith(">"):
```
I'm not very familiar with your file type; if the ">" check is only there to identify the first line, there are better ways to do that.
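A short illustration of both points, using made-up data: `in` and `startswith` disagree when `>` sits in the middle of a line, and `next()` can pick off the first line by position instead of by content. Using `next()` for this is one possible "better way", assumed here rather than taken from the answer.

```python
import io

line = "ACGT>ACGT\n"            # made-up line with '>' in the middle
print(">" in line)               # True: '>' appears somewhere in the line
print(line.startswith(">"))      # False: the line does not begin with '>'

# Identify the header by position instead of by content
f = io.StringIO(">header\nACGT\nTTTT\n")
header = next(f)                 # consumes exactly the first line
rest = list(f)                   # iteration resumes at the second line
print(repr(header), rest)
```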

You don't need the if branch (it only does pass). It's cleaner to write:

```
if not something:
    do_things()
other_stuff()
```

rather than

```
if something:
    pass
else:
    do_things()
other_stuff()
```

Enjoy learning Python!

Answered 2025-04-15 by Python大师


You should change line.replace('M', 'N') to line = line.replace('M', 'N'). The replace method does not modify the original string; it returns a new string with the replacements applied.
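To see why the reassignment matters, here is a tiny demonstration with a made-up sequence fragment:

```python
line = "AKYW"                     # made-up sequence fragment
line.replace("K", "N")            # returns "ANYW", but the result is thrown away
print(line)                       # AKYW (unchanged: strings are immutable)

line = line.replace("K", "N")     # rebind the name to the new string
print(line)                       # ANYW

# Calls can be chained because each one returns a new string
line = line.replace("Y", "N").replace("W", "N")
print(line)                       # ANNN
```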

I think there's an even better way: use the re module.

```
import re

line = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
line = re.sub("K|Y|W|M|R|S", 'N', line)
print(line)
```
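Two variations, offered as alternatives rather than as part of the original answer: a regex character class `[KYWMRS]` matches the same single letters as the alternation, and `str.translate` with a table from `str.maketrans` does the substitution in one pass without a regex.

```python
import re

line = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# A character class matches any one of the listed letters, like K|Y|W|M|R|S here
with_re = re.sub(r"[KYWMRS]", "N", line)

# str.translate maps every character through a lookup table in a single pass
table = str.maketrans("KYWMRS", "NNNNNN")
with_translate = line.translate(table)

print(with_re)
print(with_re == with_translate)  # True
```

For single-character replacements like these, `translate` is usually the simplest option; the regex version is more flexible if the patterns later grow beyond single letters.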

Answered 2025-04-15 by Python大师


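A few notes:

- str.replace and re.sub don't modify the original string in place, so you need to assign the returned result back to your variable.
- glob.glob is better suited to finding files that match a particular pattern.
- Before creating a directory, it's a good idea to check whether it already exists (I'm just assuming that; it may not be the behavior you want).
- The with statement closes files safely. If you don't want to use it, you have to use try and finally to make sure the files get closed.
- In your example, you forgot to add the .clean suffix ;)
- You never actually write to a file. You can do it the way I do in my example, or use the fileinput module (which I only learned about today).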
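As a small illustration of the point about with versus try/finally (the scratch file path is hypothetical, created in a temporary directory):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.seq")  # hypothetical scratch file

# with closes the file automatically when the block exits, even on an exception
with open(path, "w") as f:
    f.write(">header\nACKT\n")
print(f.closed)  # True

# Roughly the same guarantee spelled out with try/finally
g = open(path)
try:
    content = g.read()
finally:
    g.close()
print(g.closed, repr(content))
```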

Here's my example:

```
import re
import os
import glob

source_dir = os.getcwd()
target_dir = "clean"
source_files = glob.glob(os.path.join(source_dir, "*.seq"))

# check if target directory exists... if not, create it.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

for source_file in source_files:
    target_file = os.path.join(target_dir, os.path.basename(source_file) + ".clean")
    with open(source_file, 'r') as sfile:
        with open(target_file, 'w') as tfile:
            lines = sfile.readlines()
            # do the replacement in the second line.
            # (remember that lists are zero-indexed)
            # guard against files with fewer than two lines
            if len(lines) > 1:
                lines[1] = re.sub("K|Y|W|M|R|S", 'N', lines[1])
            tfile.writelines(lines)

print("DONE")
```
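If you want to reuse or test this logic, it can be wrapped in a function. The sketch below is an assumption-laden variant, not the answer's own code: the name `clean_seq_files`, the return value, and the short-file guard are additions, and the usage runs in a throwaway temporary directory.

```python
import glob
import os
import re
import tempfile

def clean_seq_files(source_dir, target_dir="clean"):
    """Write <name>.seq.clean copies into target_dir, masking K/Y/W/M/R/S on line 2 only.

    Hypothetical helper; returns the list of paths it wrote.
    """
    os.makedirs(target_dir, exist_ok=True)  # no error if the directory already exists
    written = []
    for source_file in glob.glob(os.path.join(source_dir, "*.seq")):
        target_file = os.path.join(target_dir, os.path.basename(source_file) + ".clean")
        with open(source_file) as sfile:
            lines = sfile.readlines()
        if len(lines) > 1:  # leave files with fewer than two lines unchanged
            lines[1] = re.sub(r"[KYWMRS]", "N", lines[1])
        with open(target_file, "w") as tfile:
            tfile.writelines(lines)
        written.append(target_file)
    return written

# Usage in a throwaway directory
work = tempfile.mkdtemp()
with open(os.path.join(work, "a.seq"), "w") as f:
    f.write(">a\nAKYWC\nMRS\n")
out = clean_seq_files(work, os.path.join(work, "clean"))
with open(out[0]) as f:
    print(f.read())
```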

Hope this helps.

Answered 2025-04-15 by Python大师
