Is it possible to "hack" Python's print function?

3 Answers

User

Answer 1 · edited 2024-06-06 00:33:06

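Monkey-patching `print`

print is a builtin function, so a call to it uses the print function defined in the builtins module (or __builtin__ in Python 2). So whenever you want to modify or change the behavior of a builtin function, you can simply reassign the name in that module.

This process is called monkey-patching.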

# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print  

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print

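After that, every print call goes through custom_print, even if the print is in an external module.

However, you don't really want to print additional text; you want to change the text that is printed. One way to do that is to replace it in the string that would be printed: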

_print = print  

def custom_print(*args, **options):
    # Get the desired separator or the default whitespace
    sep = options.pop('sep', ' ')
    # Create the final string (str() so non-string arguments also work)
    printed_string = sep.join(str(arg) for arg in args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print

And indeed, if you run:

>>> def print_something():
...     print('This cat was scared.')
>>> print_something()
This dog was scared.

Or if you write that to a file:

test_file.py

def print_something():
    print('This cat was scared.')

print_something()

and import it:

>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.

So it really works as intended.

However, if you only want to monkey-patch print temporarily, you can wrap this in a context manager:

import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired separator or the default whitespace
            sep = options.pop('sep', ' ')
            # Create the final string (str() so non-string arguments also work)
            printed_string = sep.join(str(arg) for arg in args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)

        builtins.print = custom_print

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print

So when you run that, what is printed depends on the context:

>>> with ChangePrint() as x:
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.
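
A more compact variant of the same idea, as a minimal sketch: contextlib.contextmanager can build the context manager from a generator, and try/finally restores the real print even if the body raises. The names replaced_print, old and new are made up for this example.

```python
import builtins
import contextlib

@contextlib.contextmanager
def replaced_print(old, new):
    """Temporarily swap builtins.print for a version that rewrites text.

    `replaced_print`, `old` and `new` are hypothetical names for this sketch.
    """
    original_print = builtins.print

    def custom_print(*args, **options):
        sep = options.pop('sep', None)
        if sep is None:          # print() treats sep=None as a single space
            sep = ' '
        text = sep.join(str(arg) for arg in args)
        original_print(text.replace(old, new), **options)

    builtins.print = custom_print
    try:
        yield
    finally:
        # Restore the real print even if the body raised.
        builtins.print = original_print
```

Used as `with replaced_print('cat', 'dog'): ...`, it behaves like ChangePrint for the duration of the block.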

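So that's how you can "hack" print by monkey-patching.

Modifying the target instead of `print`

If you look at the signature of print, you'll notice a file argument, which is sys.stdout by default. Note that this is a dynamic default argument (print really looks up sys.stdout every time it is called), unlike normal default arguments in Python. So if you change sys.stdout, print will actually print to the different target. Even more conveniently, Python also provides the contextlib.redirect_stdout function (from Python 3.4 on, but it's easy to create an equivalent function for earlier Python versions).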
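
The dynamic lookup can be checked with a small sketch: swap sys.stdout for an in-memory buffer, call print without a file argument, and the text ends up in the buffer. capture_print is a hypothetical helper name, not something from the standard library.

```python
import io
import sys

def capture_print(*args, **kwargs):
    """Hypothetical helper: run print() while sys.stdout is swapped out."""
    saved = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer
    try:
        # No file= is passed, so print looks up sys.stdout at call time.
        print(*args, **kwargs)
    finally:
        sys.stdout = saved
    return buffer.getvalue()

captured = capture_print('This cat was scared.')
```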

The downside is that it won't work for print calls that don't print to sys.stdout, and that creating your own stdout isn't really straightforward.

import io
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

However, this also works:

>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
...     test_file.print_something()
... 
This dog was scared.
>>> test_file.print_something()
This cat was scared.
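
A stream that only implements write breaks as soon as something else is asked of it; for example, print(..., flush=True) calls flush() on the stream. As a rough sketch (the class name ReplacingStream and its parameters are assumptions for illustration), the wrapper can delegate everything it does not define to the wrapped stream:

```python
import contextlib
import io

class ReplacingStream:
    """File-like wrapper that rewrites text before passing it on.

    The class name and its parameters are hypothetical; write() is the part
    print() needs, everything else is delegated to the wrapped stream.
    """
    def __init__(self, wrapped, old, new):
        self._wrapped = wrapped
        self._old = old
        self._new = new

    def write(self, string):
        return self._wrapped.write(string.replace(self._old, self._new))

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself,
        # e.g. flush(), encoding or isatty().
        return getattr(self._wrapped, name)

target = io.StringIO()
with contextlib.redirect_stdout(ReplacingStream(target, 'cat', 'dog')):
    print('This cat was scared.', flush=True)
result = target.getvalue()
```

The replacement is applied per write() call, so text that reaches the stream in separate pieces (for instance separate print arguments) is not matched across the pieces.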

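Summary

Some of these points have already been mentioned by abarnet, but I wanted to explore these options in more detail, especially how to modify print across modules (using builtins/__builtin__) and how to make the change only temporary (using a context manager).

User

Answer 2 · edited 2024-06-06 00:33:06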

First, there's actually a much simpler way. All we want to do is change what print prints, right?

_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)

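Or, similarly, you can monkey-patch sys.stdout instead of print.

Also, there's nothing wrong with that idea. Well, of course there's plenty wrong with it, but less than with what follows.

But if you do want to modify the function object's code constants, we can do that.

If you really want to play around with code objects for real, you should use a library like bytecode (when it's finished) or byteplay (until then, or for older Python versions) instead of doing it manually. Even for something this trivial, the CodeType initializer is a pain; if you actually need to do things like fixing up lnotab, only a lunatic would do that by hand.

Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably in all versions back to at least 2.2 with a few minor changes (not the code-hacking parts, but things like generator expressions), but it won't work with any version of IronPython.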

import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for print_function (which you can read by typing
    # dis.dis(print_function) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()
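
The constructor call above matches CPython 3.7. From CPython 3.8 the positional signature of types.CodeType changed (co_posonlyargcount was added), and code objects gained a replace() method that copies every field except the ones passed in. A minimal sketch for 3.8+, where patch_string_constants is a hypothetical helper name:

```python
def print_function():
    print("This cat was scared.")

def patch_string_constants(func, old, new):
    """Hypothetical helper: rebuild func.__code__ with rewritten str constants."""
    code = func.__code__
    consts = tuple(c.replace(old, new) if isinstance(c, str) else c
                   for c in code.co_consts)
    # CodeType.replace() (Python 3.8+) copies every field except those given.
    func.__code__ = code.replace(co_consts=consts)

patch_string_constants(print_function, "cat", "dog")
```

This is still CPython-specific code hacking; it only removes the long constructor call.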

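What could go wrong with hacking up code objects? Mostly just segfaults, RuntimeErrors that eat up the whole stack, more normal RuntimeErrors that can be handled, or garbage values that will probably just raise a TypeError or AttributeError when you try to use them. For examples, try creating a code object with just a RETURN_VALUE and nothing on the stack (bytecode b'S\0' for 3.6+, b'S' before), or with an empty tuple for co_consts when there's a LOAD_CONST 0 in the bytecode, or with varnames decremented by 1 so that the highest LOAD_FAST actually loads a freevar/cellvar cell. For some real fun, if you get the lnotab wrong enough, your code will only segfault when run in the debugger.

Using bytecode or byteplay won't protect you from all of those problems, but they do have some basic sanity checks, and nice helpers that let you do things like insert a chunk of code and let the library worry about updating all the offsets and labels so you can't get them wrong. (Plus, they keep you from having to type in that ridiculous 6-line constructor and from debugging the silly typos that come with it.)

Now on to #2.

I mentioned that code objects are immutable. And of course the consts are a tuple, so we can't change that directly. And the thing in the consts tuple is a string, which we also can't change directly. That's why I had to build a new string to build a new tuple to build a new code object.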
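
Each of those three layers of immutability can be seen directly. The sketch below just records which exception each attempt raises on CPython; the errors list is only there for illustration.

```python
def print_function():
    print("This cat was scared.")

co = print_function.__code__
errors = []

try:
    co.co_consts = ()              # code object attributes are read-only
except AttributeError as exc:
    errors.append(type(exc).__name__)

try:
    co.co_consts[0] = "dog"        # tuples do not support item assignment
except TypeError as exc:
    errors.append(type(exc).__name__)

text = "cat"
try:
    text[0] = "d"                  # neither do strings
except TypeError as exc:
    errors.append(type(exc).__name__)
```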

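But what if you could change a string directly?

Well, deep enough under the covers, everything is just a pointer to some C data, right? If you're using CPython, there's a C API to access the objects, and you can use ctypes to access that API from within Python itself, which is such a terrible idea that they put a pythonapi right there in the stdlib's ctypes module. :) The most important trick you need to know is that id(x) is the actual pointer to x in memory (as an int).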
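
The id(x) trick can be tried without modifying anything. This read-only sketch assumes CPython (where id returns the object's address) and turns the integer back into a reference with ctypes:

```python
import ctypes

message = "This cat was scared."
address = id(message)              # CPython only: the object's memory address

# Turn the integer back into a reference to the same object.
recovered = ctypes.cast(address, ctypes.py_object).value

same_object = recovered is message
```

Here the address is only valid because message is still alive; casting the id of an object that has been freed is undefined behaviour.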

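Unfortunately, the C API for strings won't let us safely get at the internal storage of an already-frozen string. So, safety aside, let's find that storage ourselves.

If you're using CPython 3.4 to 3.7 (it's different for older versions, and who knows for the future), a string literal from a module that is made of pure ASCII is stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in, probably segfault) if you put a non-ASCII character in the string, or with certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.

To make things slightly easier, I'm using the superhackyinternals project from my GitHub. (It's intentionally not pip-installable, because you really shouldn't be using this except to experiment with your local build of the interpreter and the like.)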

import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'

    print_function()

main()

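If you want to play with this stuff, int is a whole lot simpler under the covers than str. And it's a lot easier to guess what you can break by changing the value of 2 to 1, right? Actually, forget imagining, let's just do it (using the types from superhackyinternals again):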

>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
...     i *= 2
...     print(i)
10
10
10

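… pretend that code box has an infinite-length scrollbar.

I tried the same thing in IPython, and the first time I tried to evaluate 2 at the prompt, it went into some kind of uninterruptible infinite loop. Presumably it's using the number 2 for something in its REPL loop, while the stock interpreter isn't?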
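
A likely reason the damage shows up everywhere at once is that CPython keeps one shared object per small integer, so every 2 in the process is the same object. That sharing can be observed safely, without touching any memory; this is a CPython implementation detail, not a language guarantee:

```python
a = int("2")
b = int("1") + int("1")

# On CPython both names refer to the single cached object for 2.
shared = a is b
```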

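User

Answer 3 · edited 2024-06-06 00:33:06

One simple way to capture all output of the print function and then process it is to change the output stream to something else, e.g. a file.

I'll use PHP naming conventions (ob_start, ob_get_contents, ...)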

from functools import partial
output_buffer = None
print_orig = print
def ob_start(fname="print.txt"):
    global print
    global output_buffer
    # Open the file first so the partial binds the real file object
    output_buffer = open(fname, 'w')
    print = partial(print_orig, file=output_buffer)
def ob_end():
    global print
    global output_buffer
    # Close (and flush) the file, then restore the original print
    output_buffer.close()
    print = print_orig
def ob_get_contents(fname="print.txt"):
    with open(fname, 'r') as f:
        return f.read()

Usage:

print ("Hi John")
ob_start()
print ("Hi John")
ob_end()
print (ob_get_contents().replace("Hi", "Bye"))

This will print:

Hi John
Bye John
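
If the file on disk is not needed, the same capture-then-process idea can be sketched in memory with io.StringIO and contextlib.redirect_stdout (Python 3.4+). capture_output and greet are hypothetical names used only for this example.

```python
import contextlib
import io

def capture_output(func, *args, **kwargs):
    """Hypothetical helper: call func and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()

def greet():
    print("Hi John")

text = capture_output(greet).replace("Hi", "Bye")
```

Unlike rebinding the module-level name print, this also captures prints made inside other modules, as long as they write to sys.stdout.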
