How do I do sed-like text replacement in Python?
I want to enable all the apt repositories in this file:
cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance
## modifications made here will not survive a re-bundle.
## if you wish to make changes you can:
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
## or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d
#
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
## Major bug fix updates produced after the final release of the
## distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner
deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse
With sed this is simple: sed -i 's/^# deb/deb/' /etc/apt/sources.list
So what is a more elegant way to do the same thing in Python?
14 Answers
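Writing your own sed replacement in pure Python sounds like a fine idea, but it is full of pitfalls. Who would have thought?
Still, it is feasible. It is also necessary. We have all been there: "I need to munge some plain-text files, but all I have is Python, two plastic shoelaces, and a can of moldy cherries. Help."
In this answer we offer a best-of-breed solution that combines the strengths of the earlier answers and drops the parts that were not so good. As plundra points out, David Miller's otherwise fine answer does not write the file atomically, which invites race conditions (for example, other threads or processes trying to read the file at the same time). That's bad. plundra's otherwise excellent answer fixes that problem but introduces several more, including numerous fatal encoding errors, a serious security vulnerability (it does not preserve the permissions and other metadata of the original file), and premature optimization that replaces regular expressions with low-level character indexing. That's also bad.
Let's aim for something better together!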
import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')
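One caveat about the atomicity goal: NamedTemporaryFile() creates its file in the default temporary directory unless told otherwise, and shutil.move() falls back to copy-then-delete when source and destination are on different filesystems. The following is a minimal sketch of a variant, not part of the answer above, that assumes a POSIX system and write access to the target's directory. The name sed_inplace_atomic is made up for illustration. It creates the temporary file next to the target and swaps it in with os.replace(), which is an atomic rename when both paths are on the same filesystem.

```python
import os
import re
import shutil
import tempfile


def sed_inplace_atomic(filename, pattern, repl):
    """Hypothetical variant of sed_inplace() that swaps the file in atomically."""
    pattern_compiled = re.compile(pattern)

    # Assumption: the directory holding `filename` is writable. Creating the
    # temporary file there keeps both paths on the same filesystem.
    target_dir = os.path.dirname(os.path.abspath(filename))
    tmp_name = None
    try:
        with tempfile.NamedTemporaryFile(
                mode='w', dir=target_dir, delete=False) as tmp_file:
            tmp_name = tmp_file.name
            with open(filename) as src_file:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

        # Copy permissions and timestamps, then rename over the original.
        shutil.copystat(filename, tmp_name)
        os.replace(tmp_name, filename)
    except BaseException:
        # Do not leave a stray temporary file behind if anything failed.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

It is called the same way as sed_inplace(). Try it on a scratch copy of the file before pointing it at /etc/apt/sources.list.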
You can do it like this:
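import re

# Read the whole file into memory first.
with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
# Reopen for writing (this truncates the file) and write the edited lines.
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))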
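The with statement ensures the file is closed properly once you are done with it, and reopening the file in "w" mode empties it before you write to it. re.sub(pattern, replace, string) is the counterpart of s/pattern/replace/ in sed/perl.
Edit: fixed a syntax error in the example.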
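One detail worth knowing about the sed comparison: without the g flag, sed's s/// replaces only the first match on each line, while re.sub() replaces every match unless count is given. For the anchored pattern ^# deb this makes no difference, since it can match at most once per line. A small sketch with made-up sample strings:

```python
import re

sample = "a-b-c\n"

# Like sed 's/-/+/g': every match on the line is replaced.
replace_all = re.sub(r'-', '+', sample)

# Like sed 's/-/+/': only the first match on the line is replaced.
replace_first = re.sub(r'-', '+', sample, count=1)

# The anchored pattern from the question matches at most once per line,
# so both forms give the same result here.
line = "# deb http://example.invalid/ubuntu maverick multiverse\n"
uncommented = re.sub(r'^# deb', 'deb', line)

print(repr(replace_all))    # 'a+b+c\n'
print(repr(replace_first))  # 'a+b-c\n'
print(repr(uncommented))
```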
massedit.py (http://github.com/elmotec/massedit) does the scaffolding for you, leaving just the regex to write. It is still in beta, and we would welcome feedback.
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list
This shows the before/after differences in diff format.
To write the changes to the original file, add the -w option:
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list
Alternatively, you can now use the API:
>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)