Inserting a string into the middle of a file object

0 votes | 2 answers | 921 views

Asked 2025-04-15 15:30

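I'm working on a problem and have run into some trouble.

I have a bunch of (possibly large) text files. I need to apply a series of filters and transformations to them and then export the processed results somewhere else.

Roughly, my code looks like this: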

def apply_filter_transformer(basepath = None, newpath = None, fts= None):
    #because all the raw studies in basepath should not be modified, so I first cp all to newpath
    for i in listdir(basepath):
        file(path.join(newpath, i), "wb").writelines(file(path.join(basepath, i)).readlines())
    for i in listdir(newpath):
        fileobj = open(path.join(newpath, i), "r+")
        for fcn in fts:
            fileobj = fcn(fileobj)
        if fileobj is not None:
            fileobj.writelines(fileobj.readlines())
        try:
            fileobj.close()
        except:
            print i, "at", fcn
            pass
def main():
    apply_filter_transformer(path.join(pardir, pardir, "studies"),
                         path.abspath(path.join(pardir, pardir, "filtered_studies")),
                         [
                        #transformer_addMemo,
                          filter_executable,
                          transformer_identity,
                          filter_identity,
                          ])

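In apply_filter_transformer, fts is a list of functions, each of which takes a Python file object and returns a Python file object. My problem is that when I try to insert a string into a text object, I get an error I can't make sense of, and I have been stuck on it all morning.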

def transformer_addMemo(fileobj):
    STYLUSMEMO =r"""hellow world"""
    study = fileobj.read()
    location = re.search(r"</BasicOptions>", study)
    print fileobj.name
    print fileobj.mode
    fileobj.seek(0)
    fileobj.write(study[:location.end()] + STYLUSMEMO + study[location.end():])
    return fileobj

This gives me the following error:

Traceback (most recent call last):
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 292, in <module>
  main()
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 288, in main
 filter_identity,
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 276, in     apply_filter_transformer
   fileobj.writelines(fileobj.readlines())
   IOError: [Errno 0] Error

I would be very grateful if someone could tell me more about this error.


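2 Answers

There is a handy Python module for modifying or reading a set of files: fileinput.

I'm not sure what is causing this error. However, you are reading the whole file into memory, which is not a good idea in your case because the files may be large. With fileinput you can rewrite files easily. For example: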

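import fileinput
import sys

# list_of_files, keyword and my_text are placeholders you supply yourself.
# With inplace=True, anything written to sys.stdout goes into the current file.
for line in fileinput.input(list_of_files, inplace=True):
    sys.stdout.write(line)
    if keyword in line:
        sys.stdout.write(my_text)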
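
To make the fileinput approach concrete, here is a minimal self-contained sketch in Python 3. The helper name insert_after_keyword and the throwaway demo file are assumptions made for illustration; they are not part of the original code. It inserts text after each line that contains a keyword, one line at a time.

```python
import fileinput
import os
import sys
import tempfile


def insert_after_keyword(paths, keyword, new_text):
    """Rewrite each file in place, adding new_text after every line containing keyword.

    Hypothetical helper for illustration only.
    """
    # inplace=True redirects sys.stdout into the file currently being read,
    # so the file is processed line by line instead of being loaded whole.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            sys.stdout.write(line)
            if keyword in line:
                sys.stdout.write(new_text)


# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("<BasicOptions>\n</BasicOptions>\n<Other/>\n")

insert_after_keyword([demo_path], "</BasicOptions>", "hellow world\n")

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
```

Note that this inserts after the whole matching line rather than directly after the tag, so it fits best when the tag sits on a line of its own.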

Answered 2025-04-15 by Python大师


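From the code you posted, it is really hard to tell what causes the error. The problem may lie in the protocol you use for your transformer functions.

Let me simplify the code: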

fileobj = open(path, mode)
fileobj = fcn(fileobj)
fileobj.writelines(fileobj.readlines())

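How can I be sure that the file returned by fcn is opened the same way as my original file? Does it really return an open file? Does it return a file at all? In fact, I can't guarantee any of this.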
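
If you do want to keep the file-object protocol, one option is to check what each transformer hands back before using it. The following is only a sketch in Python 3: describe_problems is a hypothetical helper name, and the checks (not None, file-like, still open, same mode) simply mirror the questions above.

```python
import os
import tempfile


def describe_problems(original, returned):
    """Return a list of reasons why `returned` cannot replace `original`.

    Hypothetical helper; an empty list means no problem was detected.
    """
    if returned is None:
        return ["transformer returned None"]
    if not hasattr(returned, "read") or not hasattr(returned, "write"):
        return ["transformer returned a non-file object: %r" % type(returned)]
    problems = []
    if returned.closed:
        problems.append("returned file is closed")
    returned_mode = getattr(returned, "mode", None)
    if returned_mode != original.mode:
        problems.append("mode changed from %r to %r" % (original.mode, returned_mode))
    return problems


# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
f = open(demo_path, "r+")
same_file = describe_problems(f, f)           # transformer returned the file unchanged
nothing = describe_problems(f, None)          # transformer forgot to return
text = describe_problems(f, "some text")      # transformer returned a string
f.close()
closed_file = describe_problems(f, f)         # transformer closed the file
os.remove(demo_path)
```

Calling such a check after every fcn(fileobj) would at least tell you which transformer broke the contract, instead of failing later at readlines().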

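It looks like you don't need file objects in your processing at all. Since you are reading the whole file into memory anyway, why not have your transformer functions work directly on strings? Your code could then look like this: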

with open(filename, "r") as f:
    s = f.read()
for transform_function in transforms:
    s = transform_function(s)
with open(filename, "w") as f:
    f.write(s)

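One advantage of this approach is that the part of the program that handles files is completely separated from the part that transforms data, so a problem in one part cannot affect the other.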
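
As an illustration, the memo transformer from the question could be rewritten to work on strings roughly as follows. This is a sketch in Python 3 under a few assumptions: add_memo and apply_transforms are hypothetical names, the memo text is the placeholder from the question, and the result is written to a separate destination path so the raw file is never modified.

```python
import os
import re
import tempfile

MEMO = "hellow world"  # placeholder text taken from the question


def add_memo(text):
    """Insert MEMO right after the first </BasicOptions> tag, if present."""
    match = re.search(r"</BasicOptions>", text)
    if match is None:
        return text  # no anchor found; leave the text unchanged
    return text[:match.end()] + MEMO + text[match.end():]


def apply_transforms(src_path, dst_path, transforms):
    """Read src_path, apply each str -> str transform in order, write dst_path."""
    with open(src_path, "r") as f:
        text = f.read()
    for transform in transforms:
        text = transform(text)
    with open(dst_path, "w") as f:
        f.write(text)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "study.xml")
    dst = os.path.join(workdir, "study_filtered.xml")
    with open(src, "w") as f:
        f.write("<BasicOptions></BasicOptions><Rest/>")
    apply_transforms(src, dst, [add_memo])
    with open(dst) as f:
        transformed = f.read()
    with open(src) as f:
        untouched = f.read()
```

Because each transform only sees and returns a string, a missing tag or a buggy transform cannot leave a file handle in an unexpected state, and no copy step is needed to protect the originals.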

Answered 2025-04-15 by Python大师
