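How to iterate over files and replace text

4 votes

3 answers

6439 views

Asked 2025-04-16 11:22

I'm a Python beginner: how can I loop over the CSV files in a folder and replace strings in them, for example: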

ww into vv
.. into --

So I don't want to replace a whole line that contains "ww" with "vv"; I only want to replace that string within the line. I tried something like this:

#!/Python26/
# -*- coding: utf-8 -*-

import os, sys
for f in os.listdir(path):
    lines = f.readlines()

But how do I continue from here?

Tags: text-replacement, string-manipulation, file-traversal, csv

3 Answers

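See the other answers for how to replace the strings. I'd like to add some information about iterating over files, which is the first part of the question.

If you want to walk a folder and all of its subfolders, use os.walk(). Note that os.listdir() does not recurse, and the file names it returns do not include the folder name. Use os.path.join() to build the full path to each file.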
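As a rough illustration of the point above, here is a minimal Python 3 sketch that combines os.walk() and os.path.join(). The helper name iter_csv_files and the ".csv" filter are my own assumptions, not something from the question.

```python
import os


def iter_csv_files(root):
    """Yield the full path of every .csv file under root, subfolders included.

    Hypothetical helper: the name and the extension filter are assumptions.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                # os.walk() yields bare file names, so join them with dirpath
                yield os.path.join(dirpath, name)
```

You would then loop with something like `for csv_path in iter_csv_files(path): ...`, where path is the folder from the question.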

Answered 2025-04-16 by Python大师


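If you replace each string with a string of the same length, you can modify the file in place: only the parts that need replacing are overwritten, and the file does not have to be rewritten in full.

This is easy to do with a regular expression (regex). The fact that the files are CSV makes no difference to the method: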

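from os import listdir
from os.path import isfile, join
import re

# path: the folder that holds the CSV files (define it before running)

# Bytes pattern and bytes replacements, because the files are opened in binary mode
pat = re.compile(br'ww|\.\.')
dicrepl = {b'ww': b'vv', b'..': b'--'}

for filename in listdir(path):
    fullname = join(path, filename)
    if not isfile(fullname):
        continue  # listdir() also returns the names of sub-folders
    with open(fullname, 'rb+') as f:
        ch = f.read()   # the whole file is held in memory
        f.seek(0, 0)    # read() left the pointer at the end of the file
        pos = 0         # pointer position after the previous write
        for match in pat.finditer(ch):
            f.seek(match.start() - pos, 1)   # move forward to the match
            f.write(dicrepl[match.group()])  # overwrite with same-length text
            pos = match.end()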

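For this kind of operation the file must be opened in binary mode: that is what the 'b' in mode 'rb+' is for.

Opening a file in 'r+' mode lets you read and write at any position in the file (with 'a' mode you can only write at the end of the file).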
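To make the difference between the two modes concrete, here is a small Python 3 sketch that works on a throwaway temporary file. The function name demo_modes and the sample bytes are made up for illustration.

```python
import os
import tempfile


def demo_modes():
    """Return the file content after an 'rb+' write and after an 'ab' write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"0123456789")
        # 'rb+': the write lands where the pointer is
        with open(path, "rb+") as f:
            f.seek(2)
            f.write(b"XX")
        with open(path, "rb") as f:
            after_update = f.read()
        # 'ab': the write goes to the end of the file, even after seek(0)
        with open(path, "ab") as f:
            f.seek(0)
            f.write(b"YY")
        with open(path, "rb") as f:
            after_append = f.read()
        return after_update, after_append
    finally:
        os.remove(path)
```

The first value shows bytes 2 and 3 overwritten in the middle of the file; the second shows the append landing after the existing content.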

However, if the file is so large that the ch object would take up too much memory, the code needs some adjustment.
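One possible adjustment, sketched here for Python 3, is to process the file one line at a time so that only a single line is in memory. This is my own variation, not the code from this answer: the function name is hypothetical, and it assumes that every replacement has the same length as the text it replaces and that no match spans a line break (true for 'ww' and '..').

```python
import re

# Same-length replacements only; bytes because the file is opened in binary mode
PATTERN = re.compile(rb"ww|\.\.")
REPLACEMENTS = {b"ww": b"vv", b"..": b"--"}


def replace_in_place_by_line(path):
    """Overwrite matches in place, reading one line at a time (hypothetical helper)."""
    with open(path, "rb+") as f:
        while True:
            line_start = f.tell()
            line = f.readline()
            if not line:
                break  # end of file
            new_line = PATTERN.sub(lambda m: REPLACEMENTS[m.group()], line)
            if new_line != line:
                # Go back to the start of this line and overwrite it.
                # The lengths are equal, so after the write the pointer
                # sits at the start of the next line again.
                f.seek(line_start)
                f.write(new_line)
```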

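If the replacement strings differ in length from the originals, you almost always have to write a new file to record the changes. (The special case where every replacement is shorter than the string it replaces can still be handled without writing a new file, which may be useful for large files.)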
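For the different-length case, a common pattern is to write the modified text to a temporary file in the same folder and then swap it over the original. The following Python 3 sketch shows the idea; the function name rewrite_file, the UTF-8 default and the example mapping are assumptions on my part.

```python
import os
import tempfile


def rewrite_file(path, replacements, encoding="utf-8"):
    """Write a modified copy of path, then replace the original with it.

    Hypothetical helper. replacements maps old text to new text and is
    applied in order, so a later pair can also act on the output of an
    earlier one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" keeps the original line endings untouched
    with open(path, "r", encoding=encoding, newline="") as src, \
            tempfile.NamedTemporaryFile("w", encoding=encoding, newline="",
                                        dir=directory, delete=False) as dst:
        for line in src:
            for old, new in replacements.items():
                line = line.replace(old, new)
            dst.write(line)
    # Both files are closed here; swap the new file into place
    os.replace(dst.name, path)
```

A call could look like `rewrite_file(csv_path, {"ww": "v", "..": "---"})`. If an exception occurs midway, the temporary file is left behind, so real code would probably want some cleanup.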

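The point of using f.seek(match.start()-pos, 1) rather than f.seek(match.start(), 0) is that it moves the pointer from position pos directly to position match.start(), instead of positioning it from position 0 every time.

With f.seek(match.start(), 0), by contrast, the position is counted from the beginning of the file: the pointer is placed match.start() bytes after position 0. That is because seek(..., 0) positions relative to the start of the file, whereas seek(..., 1) moves relative to the current pointer position. Both calls leave the pointer at the same offset.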
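Here is a tiny Python 3 illustration of the two whence values, using an in-memory io.BytesIO buffer in place of a real file. The sample content is made up.

```python
import io

buf = io.BytesIO(b"aa ww bb .. cc")

# whence=0: absolute position, counted from the start of the buffer
buf.seek(3, 0)
buf.write(b"vv")             # overwrites "ww"
pos_after_first = buf.tell() # the pointer is now just past the first match

# whence=1: relative move from the current position.
# ".." starts at offset 9, so move forward by 9 - pos_after_first bytes.
buf.seek(9 - pos_after_first, 1)
buf.write(b"--")             # overwrites ".."

result = buf.getvalue()
```

Relative seeks with a non-zero offset like this are only allowed on binary streams, which is another reason the file is opened with 'rb+'.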

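Edit:

If you only want to replace isolated 'ww' strings, and not a 'ww' that is part of a longer run such as 'wwwwwww', the regex has to be written like this:

pat = re.compile(r'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')

This cannot be done with replace() alone without complicated string manipulation.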
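A quick Python 3 comparison on a made-up sample string shows what the lookarounds change compared with chained replace() calls. The sample text and variable names are only for illustration.

```python
import re

pat = re.compile(r"(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)")
dicrepl = {"ww": "vv", "..": "--"}

text = "ww,wwww,a..b,..."

# Only isolated 'ww' and '..' are replaced
with_regex = pat.sub(lambda m: dicrepl[m.group()], text)

# Plain replace() also rewrites the longer runs
with_replace = text.replace("ww", "vv").replace("..", "--")

print(with_regex)    # vv,wwww,a--b,...
print(with_replace)  # vv,vvvv,a--b,--.
```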

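Edit:

I had forgotten the f.seek(0,0) instruction after f.read(). It is necessary because reading moves the pointer to the end of the file, so it has to be moved back to the beginning.

I have corrected the code and it now works.

Here is a code example that traces what happens during processing: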

import re

# Bytes pattern and replacements, because the file is opened in binary mode
pat = re.compile(br'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
dicrepl = {b'ww': b'vv', b'..': b'ZZ'}

# path of the file to process
path = ...................................

with open(path, 'rb+') as f:
    print("file has just been opened, file's pointer is at position", f.tell())
    print('- reading of the file : ch = f.read()')
    ch = f.read()
    print("file has just been read"
          "\nfile's pointer is now at position", f.tell(), ', the end of the file')
    print("- file's pointer is moved back to the beginning of the file : f.seek(0,0)")
    f.seek(0, 0)
    print("file's pointer is now again at position", f.tell())
    pos = 0
    print('\n- process of replacement is now launched :')
    for match in pat.finditer(ch):
        print()
        print('is at position', f.tell())
        print('group', match.group(), 'detected on span', match.span())
        f.seek(match.start() - pos, 1)
        print('pointer having been moved on position', f.tell())
        f.write(dicrepl[match.group()])
        print('detected group has been replaced with', dicrepl[match.group()])
        print('now at position', f.tell())
        pos = match.end()

Answered 2025-04-16 by Python大师


import os
import csv

# path: the folder that holds the CSV files (define it before running)

for filename in os.listdir(path):
    # newline='' is what the csv module expects in Python 3
    with open(os.path.join(path, filename), 'r', newline='') as f:
        for row in csv.reader(f):
            cells = [ cell.replace('www', 'vvv').replace('..', '--')
                      for cell in row ]
            # now you have a list of cells within one row
            # with all strings modified.
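The loop above stops once the cells have been modified. One way to finish the job, sketched here for Python 3, is to collect the modified rows and write them back with csv.writer. The function name, the UTF-8 default and the choice to hold the whole file in memory are my assumptions, not part of this answer.

```python
import csv


def replace_in_csv_cells(path, replacements=(("www", "vvv"), ("..", "--")),
                         encoding="utf-8"):
    """Apply the replacements to every cell and rewrite the file.

    Hypothetical helper: reads all rows into memory, so it suits
    small and medium-sized files.
    """
    def fix(cell):
        for old, new in replacements:
            cell = cell.replace(old, new)
        return cell

    with open(path, "r", newline="", encoding=encoding) as f:
        rows = [[fix(cell) for cell in row] for row in csv.reader(f)]

    # Rewrite the same file with the modified rows
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)
```

Be aware that csv.writer uses its own defaults for quoting and line endings ("\r\n"), so the rewritten file may differ from the original in formatting even where no cell changed.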

Edit: are you trying to learn or practice Python, or do you just want to get a task done? If it is the latter, you can use the sed program:

sed -i 's/www/vvv/g' yourPath/*csv
sed -i 's/\.\./--/g' yourPath/*csv

Answered 2025-04-16 by Python大师
