Replacing strings in specific lines using Python

import os, shutil

os.mkdir('clean')
for file in os.listdir(os.getcwd()):
    if file.find('.seq') != -1:
        shutil.copy(file, 'clean')
os.chdir('clean')
for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        f = open(file, 'r')
        for line in f.read():
            if line.__contains__('>'):  # indicator for the first line. the first line always starts with '>'. It's a FASTA file, if you've worked with dna/protein before.
                pass
            else:
                line.replace('M', 'N')
                line.replace('K', 'N')
                line.replace('Y', 'N')
                line.replace('W', 'N')
                line.replace('R', 'N')
                line.replace('S', 'N')

3 Answers

User

Answer #1 · edited 2024-06-07 03:17:25

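Some notes:

string.replace and re.sub do not work in place, so you have to assign the return value back to your variable.
glob.glob is better suited to finding files in a directory that match a given pattern.
Maybe you should check whether the directory already exists before creating it (I am only assuming this; it may not be the behavior you want).
The with statement takes care of closing the file safely. If you don't want to use it, you have to use try/finally.
In your example you forgot to add the *.clean suffix ;)
Your code never actually writes the files. You can do it the way I did in my example, or use the fileinput module (which I didn't know about until today).

Here is my example: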

import re
import os
import glob

source_dir=os.getcwd()
target_dir="clean"
source_files = glob.glob(os.path.join(source_dir, "*.seq"))

# check if target directory exists... if not, create it.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

for source_file in source_files:
   target_file = os.path.join(target_dir,os.path.basename(source_file)+".clean")
   with open(source_file,'r') as sfile:
      with open(target_file,'w') as tfile:
         lines = sfile.readlines()
         # do the replacement in the second line.
         # (remember that arrays are zero indexed)
         lines[1]=re.sub("K|Y|W|M|R|S",'N',lines[1])
         tfile.writelines(lines)

print("DONE")
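
The fileinput module mentioned in the notes can edit a file in place instead of writing a copy. The following is only a minimal sketch, assuming Python 3 and assuming that every line not starting with ">" is sequence data; the function name mask_in_place is made up for illustration.

```python
import fileinput
import re

# Ambiguity codes to mask, written as a character class.
AMBIGUOUS = re.compile(r"[KYWMRS]")


def mask_in_place(path):
    """Rewrite the file at `path`, replacing ambiguity codes with 'N'.

    Header lines (those starting with '>') are left untouched.
    """
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            if not line.startswith(">"):
                line = AMBIGUOUS.sub("N", line)
            # With inplace=True, standard output is redirected into the file,
            # so printing the line is what writes it back.
            print(line, end="")
```

Because the file is overwritten, it may be worth passing backup=".bak" to fileinput.input while experimenting.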

Hope this helps.

User

Answer #2 · edited 2024-06-07 03:17:25

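Here are some general tips:

Don't use find to check the file extension (it would also match "file1.seqdata.xls", for example). At least use file.endswith('.seq') or, better still, compare os.path.splitext(file)[1].
Actually, don't do it that way at all. This is what you want: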
```
import glob
seq_files = glob.glob("*.seq")
```
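
To make the difference between these checks concrete, here is a small sketch. The file names are invented for illustration.

```python
import os.path

names = ["a.seq", "file1.seqdata.xls", "notes.txt"]

# find() matches ".seq" anywhere in the name, so the .xls file slips through.
by_find = [n for n in names if n.find(".seq") != -1]

# splitext() looks only at the final extension.
by_splitext = [n for n in names if os.path.splitext(n)[1] == ".seq"]

print(by_find)
print(by_splitext)
```

Only the splitext version restricts the result to real .seq files, which is also what the glob pattern "*.seq" does.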

Don't copy the files; it is easier to use just one loop:

import os

for filename in seq_files:
    # the with statement closes both files even if an error occurs
    with open(filename) as in_file, open(os.path.join("clean", filename), "w") as out_file:
        # now read lines from in_file and write lines to out_file
        pass

Don't use line.__contains__('>'). What you mean is
```
if '>' in line:
```
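(which calls __contains__ internally). But what you really want to know is whether the line starts with ">", not whether a ">" appears somewhere in the line, at the start or elsewhere. So the best option is: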
```
if line.startswith(">"):
```
I'm not familiar with your file type; if the ">" check is really only there to identify the first line, there are better ways to do that.
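
One such way, sketched here under the assumption that the header is always the first line of the file, is to go by position instead of content. The function name mask_all_but_first is hypothetical.

```python
import re


def mask_all_but_first(lines):
    """Replace ambiguity codes with 'N' on every line except the first one."""
    masked = []
    for index, line in enumerate(lines):
        if index > 0:  # index 0 is assumed to be the FASTA header
            line = re.sub("K|Y|W|M|R|S", "N", line)
        masked.append(line)
    return masked
```

This would not be correct for a file holding several records, where later header lines also start with ">"; in that case the startswith check is the safer choice.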

You don't need the if branch (all it does is pass). It is cleaner to write

if not something:
    do_things()
other_stuff()

instead of

if something:
    pass
else:
    do_things()
other_stuff()
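
Putting these tips together, the whole task might look roughly like the sketch below. It assumes Python 3; the function name clean_seq_files and its parameters are made up for illustration and are not part of the original question.

```python
import glob
import os

AMBIGUITY_CODES = "MKYWRS"


def clean_seq_files(source_dir, target_dir="clean"):
    """Copy every *.seq file from source_dir into target_dir, masking codes.

    Header lines (starting with '>') are copied unchanged. Returns the list
    of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(source_dir, "*.seq"))):
        out_path = os.path.join(target_dir, os.path.basename(path))
        with open(path) as in_file, open(out_path, "w") as out_file:
            for line in in_file:
                if not line.startswith(">"):
                    for code in AMBIGUITY_CODES:
                        # str.replace returns a new string, so reassign it.
                        line = line.replace(code, "N")
                out_file.write(line)
        written.append(out_path)
    return written
```

Calling clean_seq_files(os.getcwd()) would process the current directory and write the results to a "clean" subdirectory.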

Have fun learning Python!

User
Answer #3 · edited 2024-06-07 03:17:25

You should replace line.replace('M', 'N') with line=line.replace('M', 'N'). replace returns a copy of the original string with the relevant substrings replaced.
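
A short demonstration of this point, using a made-up sequence string:

```python
line = "ACGMT"

line.replace("M", "N")         # the new string is created and then discarded
unchanged = line               # still the original text

line = line.replace("M", "N")  # reassigning keeps the result

print(unchanged)
print(line)
```

Strings in Python are immutable, which is why no string method can modify its receiver.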

A better way (IMO) is to use re.

import re

line="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
line=re.sub("K|Y|W|M|R|S",'N',line)
print(line)
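
For single-character substitutions like these, a character class or str.translate should give the same result as the alternation pattern. The sketch below compares the three; str.translate is an alternative swapped in here, not something the answer itself proposes.

```python
import re

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# The alternation used in the answer.
with_alternation = re.sub("K|Y|W|M|R|S", "N", alphabet)

# The same set of characters written as a character class.
with_char_class = re.sub("[KYWMRS]", "N", alphabet)

# A translation table mapping each code to 'N', without using re at all.
table = str.maketrans("KYWMRS", "NNNNNN")
with_translate = alphabet.translate(table)

print(with_alternation)
```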
