What is the best way to make a Python script copy itself?
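I use Python for scientific applications. I run simulations with various parameters, and my script writes the data to an appropriate folder for later use. However, I sometimes edit the script; to be able to reproduce my results when needed, I would like to keep the version of the script that generated the data in the data folder as well. In short, I want my Python script to copy itself into the data folder. What is the best way to do this?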
Thanks!
3 Answers
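If you are on Linux, you can try the following. Replace scriptname with your script's filename and data_folder (a placeholder) with your data directory; copying to ./ would only try to copy the file onto itself.
import os
os.system("cp ./scriptname ./data_folder/")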
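I stumbled upon this question because I wanted to do the same thing. Although I agree with the comments that using git or another version control tool to manage revisions is the cleanest solution, sometimes you just want a quick and simple way to get the job done. So, in case anyone is still interested: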
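You can use __file__ to get the filename (including the path) of the running script and, as mentioned earlier, a high-level file operation library such as shutil to copy it somewhere. In one line: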
shutil.copy(__file__, 'experiment_folder_path/copied_script_name.py')
Here it is again with the corresponding imports and a few extras:
import shutil
import os # optional: for extracting basename / creating new filepath
import time # optional: for appending time string to copied script
# generate filename with timestring
copied_script_name = time.strftime("%Y-%m-%d_%H%M") + '_' + os.path.basename(__file__)
# copy script
shutil.copy(__file__, os.path.join('my_experiment_folder_path', copied_script_name))
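If you want to reuse this across several simulation scripts, the same idea can be wrapped in a small helper. This is only a sketch: the function name snapshot_script and its two parameters are made up for illustration, and it assumes the data folder may not exist yet, so it creates it first. It uses shutil.copy2 rather than shutil.copy so that the file's modification time is preserved as well.

```python
import os
import shutil
import time


def snapshot_script(script_path, data_dir):
    """Copy script_path into data_dir under a timestamped name.

    Hypothetical helper: returns the path of the copy that was written.
    """
    # Create the data folder if this is the first run that writes to it.
    os.makedirs(data_dir, exist_ok=True)
    # Same timestamp format as in the snippet above.
    stamp = time.strftime("%Y-%m-%d_%H%M")
    target = os.path.join(data_dir, stamp + "_" + os.path.basename(script_path))
    # copy2 copies the contents and also tries to preserve file metadata.
    shutil.copy2(script_path, target)
    return target
```

From inside a simulation script you would call it as snapshot_script(os.path.abspath(__file__), 'my_experiment_folder_path'). Note that with a minute-resolution timestamp, two runs started within the same minute overwrite each other's copy.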
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwclean.py. This work is published from the
# Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/
"""Remove the Date and Revision keyword contents from the standard input."""
import sys
import re
## This is the main program ##
if __name__ == '__main__':
    dre = re.compile(''.join([r'\$', r'Date.*\$']))
    drep = ''.join(['$', 'Date', '$'])
    rre = re.compile(''.join([r'\$', r'Revision.*\$']))
    rrep = ''.join(['$', 'Revision', '$'])
    for line in sys.stdin:
        line = dre.sub(drep, line)
        # Write the line as-is (no extra newline); works on Python 2 and 3.
        sys.stdout.write(rre.sub(rrep, line))
To copy the script, you can use shutil.copy().
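However, you should consider putting your scripts under version control. That preserves the history of your changes.
For example, I use git to manage my scripts. In Python files I usually add a version string like this: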
__version__ = '$Revision: a42ef58 $'[11:-2]
Every time the file is changed, this version string is updated with git's short hash tag. (This is done by a script called update-modified-keywords.py, which is run from git's post-commit hook.)
If you have a version string like this, you can include it in your output, so you always know which version produced that output.
Edit:
Here are the contents of the update-modified-keywords script:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
# $Revision: 3d4f750 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to update-modified-keywords.py. This work is
# published from the Netherlands.
# See http://creativecommons.org/publicdomain/zero/1.0/
"""Remove and check out those files that contain keywords and have
changed since the last commit in the current working directory."""
from __future__ import print_function, division
import base64
import os
import mmap
import sys
import subprocess


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.
    Arguments:
    args -- string or list of strings of commands. A single string may
            not contain spaces.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('No spaces in single command allowed.')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            subprocess.check_call(args, stdout=bb, stderr=bb)
    except (OSError, subprocess.CalledProcessError):
        # OSError is raised when the program cannot be executed at all.
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def modifiedfiles():
    """Find files that have been modified in the last commit.
    :returns: A list of filenames.
    """
    fnl = []
    try:
        args = ['git', 'diff-tree', 'HEAD~1', 'HEAD', '--name-only', '-r',
                '--diff-filter=ACMRT']
        with open(os.devnull, 'w') as bb:
            # universal_newlines=True yields text (not bytes) on Python 3 too.
            fnl = subprocess.check_output(args, stderr=bb,
                                          universal_newlines=True).splitlines()
            # Deal with unmodified repositories
            if len(fnl) == 1 and fnl[0] == 'clean':
                return []
    except subprocess.CalledProcessError as e:
        if e.returncode == 128:  # new repository
            args = ['git', 'ls-files']
            with open(os.devnull, 'w') as bb:
                fnl = subprocess.check_output(args, stderr=bb,
                                              universal_newlines=True).splitlines()
    # Only return regular files.
    fnl = [i for i in fnl if os.path.isfile(i)]
    return fnl


def keywordfiles(fns):
    """Filter those files that have keywords in them
    :fns: A list of filenames
    :returns: A list of filenames for files that contain keywords.
    """
    # These lines are encoded otherwise they would be mangled if this file
    # is checked in my git repo!
    datekw = base64.b64decode('JERhdGU=')
    revkw = base64.b64decode('JFJldmlzaW9u')
    rv = []
    for fn in fns:
        with open(fn, 'rb') as f:
            try:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                if mm.find(datekw) > -1 or mm.find(revkw) > -1:
                    rv.append(fn)
                mm.close()
            except ValueError:
                # mmap refuses to map empty files; they hold no keywords.
                pass
    return rv


def main(args):
    """Main program.
    :args: command line arguments
    """
    # Check if git is available.
    checkfor(['git', '--version'])
    # Check if .git exists
    if not os.access('.git', os.F_OK):
        print('No .git directory found!')
        sys.exit(1)
    print('{}: Updating modified files.'.format(args[0]))
    # Get modified files
    files = modifiedfiles()
    if not files:
        print('{}: Nothing to do.'.format(args[0]))
        sys.exit(0)
    files.sort()
    # Find files that have keywords in them
    kwfn = keywordfiles(files)
    if not kwfn:
        # Without paths, 'git checkout -f' would reset the whole working tree.
        print('{}: No keyword files modified.'.format(args[0]))
        sys.exit(0)
    for fn in kwfn:
        os.remove(fn)
    args = ['git', 'checkout', '-f'] + kwfn
    subprocess.call(args)


if __name__ == '__main__':
    main(sys.argv)
If you do not want keyword expansion to clutter your git history, you can use smudge and clean filters. I have the following set in my ~/.gitconfig file:
[filter "kw"]
    clean = kwclean
    smudge = kwset
Both kwclean and kwset are Python scripts.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwset.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/
"""Fill the Date and Revision keywords from the latest git commit and tag and
substitute them in the standard input."""
import os
import sys
import subprocess
import re


def gitdate():
    """Get the date from the latest commit in ISO8601 format.
    """
    args = ['git', 'log', '-1', '--date=iso']
    # universal_newlines=True yields text (not bytes) on Python 3 as well.
    out = subprocess.check_output(args, universal_newlines=True)
    dline = [ln for ln in out.splitlines() if ln.startswith('Date')]
    try:
        dat = dline[0][5:].strip()
        return ''.join(['$', 'Date: ', dat, ' $'])
    except IndexError:
        raise ValueError('Date not found in git output')


def gitrev():
    """Get the latest tag and use it as the revision number. This presumes the
    habit of using numerical tags. Use the short hash if no tag available.
    """
    args = ['git', 'describe', '--tags', '--always']
    try:
        with open(os.devnull, 'w') as bb:
            # [:-1] strips the trailing newline from git's output.
            r = subprocess.check_output(args, stderr=bb,
                                        universal_newlines=True)[:-1]
    except subprocess.CalledProcessError:
        return ''.join(['$', 'Revision', '$'])
    return ''.join(['$', 'Revision: ', r, ' $'])


def main():
    """Main program.
    """
    dre = re.compile(''.join([r'\$', r'Date:?\$']))
    rre = re.compile(''.join([r'\$', r'Revision:?\$']))
    currp = os.getcwd()
    if not os.path.exists(os.path.join(currp, '.git')):
        sys.stderr.write('This directory is not controlled by git!\n')
        sys.exit(1)
    date = gitdate()
    rev = gitrev()
    for line in sys.stdin:
        line = dre.sub(date, line)
        # Write the line as-is (no extra newline); works on Python 2 and 3.
        sys.stdout.write(rre.sub(rev, line))


if __name__ == '__main__':
    main()
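Both scripts are installed (without an extension at the end of the filename, as is customary for executables) in a directory that is in my $PATH, and both have their executable bit set.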
In the .gitattributes file of my repository I choose which files get keyword expansion. For Python files, for example:
*.py filter=kw
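To make the round trip easier to see, here is a simplified sketch of what the two filters do to a file's text, written as plain functions instead of stdin/stdout scripts. The names clean and smudge and their parameters are chosen for this illustration only; the real scripts above read the date and revision from git, whereas here they are passed in as arguments.

```python
import re

# Match a keyword whether it is bare ($Date$) or expanded ($Date: ... $).
_DATE_ANY = re.compile(r'\$Date[^$]*\$')
_REV_ANY = re.compile(r'\$Revision[^$]*\$')
# Match only the bare form, which is what the smudge step fills in.
_DATE_BARE = re.compile(r'\$Date:?\$')
_REV_BARE = re.compile(r'\$Revision:?\$')


def clean(text):
    """Collapse expanded keywords to the bare form (what gets committed)."""
    text = _DATE_ANY.sub('$' + 'Date$', text)
    return _REV_ANY.sub('$' + 'Revision$', text)


def smudge(text, date, rev):
    """Expand bare keywords with the given date and revision (on checkout)."""
    # Lambdas avoid any backslash interpretation in the replacement text.
    text = _DATE_BARE.sub(lambda m: '$' + 'Date: ' + date + ' $', text)
    return _REV_BARE.sub(lambda m: '$' + 'Revision: ' + rev + ' $', text)
```

Because clean undoes what smudge adds, the repository only ever stores the bare keywords, and the expanded values appear only in the working tree.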