Combining in-place filtering and an encoding setting in the fileinput module


Asked 2025-04-18 16:32

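I am trying to use the in-place filtering feature of the fileinput module to rewrite an input file directly.

I need to set the encoding to latin-1 (for both reading and writing), so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but I ran into this error: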

ValueError: FileInput cannot use an opening hook in inplace mode

On closer inspection, I found that the fileinput documentation states this explicitly: you cannot use inplace and openhook together.
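
For reference, here is a minimal sketch that reproduces the error. The helper name try_inplace_with_hook and the file name are made up for illustration; the check fires when the FileInput object is constructed, so the file does not even need to exist.

```python
import fileinput


def try_inplace_with_hook(path):
    """Return the error message raised when inplace and openhook are combined."""
    try:
        # Combining inplace=True with any openhook is rejected up front.
        fileinput.input(
            path,
            inplace=True,
            openhook=fileinput.hook_encoded('latin-1'),
        )
    except ValueError as exc:
        return str(exc)
    return None


print(try_inplace_with_hook('some_file.txt'))
```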

So how can I work around this?

Tags: error handling, documentation, encoding, input files, fileinput, in-place filtering, openhook

5 Answers

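I'm not very happy with the existing solutions that use rename or remove, because they oversimplify some of the file handling that the inplace flag takes care of, such as handling the file mode and preserving chmod attributes.

In my case, because I control the environment my code runs in, I decided the only reasonable solution was to switch my locale to one that uses UTF-8:

export LC_ALL=en_US.UTF-8

The effect is: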

sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)

sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done

sh-4.2#

A possible side effect is that this also changes how other files are read and written, but I'm not worried about that here.
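
The reason the locale matters: when no encoding is passed, text files are opened with the locale's preferred encoding. A small sketch for checking what that is in a given environment (the function name default_text_encoding is made up for illustration, and the printed value depends entirely on your locale settings):

```python
import locale


def default_text_encoding():
    """Return the encoding the locale reports as the default for text files."""
    # False means: report the current setting without calling setlocale().
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```

Running this before and after exporting LC_ALL should show whether the change was picked up.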

Answered 2025-04-18 by Python大师

This is very similar to the other answers, just written as a function so that it can easily be called multiple times:

import codecs
import os

def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                # Hand each line and the output handle to the caller.
                yield line, new
    # Runs only after the caller has consumed every line.
    os.rename(new_path, orig_path)

And here is what it looks like in use:

for line, new in inplace(path):
    line = do_processing(line)  # Use your imagination here.
    new.write(line)

This code works in both Python 2 and Python 3, as long as you specify the right encoding (in my case I actually needed utf-8 everywhere, but your needs may differ).

Answered 2025-04-18 by Python大师

If you don't mind installing a third-party library with pip, the in_place library supports encodings.

import in_place

with in_place.InPlace(filename, encoding="utf-8") as fp:
    for line in fp:
        fp.write(line)

Answered 2025-04-18 by Python大师

As far as I know, there is no way around this with the fileinput module itself. However, you can accomplish the same task by combining the codecs module with os.rename() and os.remove():

import os
import codecs

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
     codecs.open(tmp_name, 'w', encoding='latin-1') as fo:

    for line in fi:
        new_line = do_processing(line) # do your line processing here
        fo.write(new_line)

os.remove(input_name) # remove original
os.rename(tmp_name, input_name) # rename temp to original name

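If you want to change the encoding of the output file, you can pick a new encoding for it; if not, keep latin-1 when opening the output file.

I know this is not the in-place modification you were looking for, but it gets the job done and is very flexible.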
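
One concern raised in this thread is that a plain remove-and-rename loses the original file's permission bits. The following is a sketch of one way to address that, assuming Python 3; the function name rewrite_in_place and the transform callback are made up for illustration. It writes to a temporary file in the same directory, copies the permission bits with shutil.copymode, and then swaps the file in with os.replace.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path, transform, encoding='latin-1'):
    """Rewrite path line by line through a temp file, keeping its mode bits."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file next to the original so the final swap
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        # newline='' leaves the original line endings untouched.
        with os.fdopen(fd, 'w', encoding=encoding, newline='') as dst, \
             open(path, 'r', encoding=encoding, newline='') as src:
            for line in src:
                dst.write(transform(line))
        shutil.copymode(path, tmp_path)  # carry over the chmod bits
        os.replace(tmp_path, path)       # overwrite the original in one step
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        os.unlink(tmp_path)
        raise
```

It could be called as rewrite_in_place('some_file.txt', do_processing). Ownership and extended attributes are not copied in this sketch, so it is still only an approximation of what a true in-place edit would keep.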

Answered 2025-04-18 by Python大师

Starting with Python 3.10, fileinput.input() accepts an encoding parameter.

Answered 2025-04-18 by Python大师
