Increasing the depth of cProfile in Python to report more functions?

9 votes

3 answers

3531 views

Asked 2025-04-16 08:48

I'm trying to profile a function that calls other functions. I invoke the profiler like this:

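import cProfile as profile
import pstats  # needed for pstats.Stats below

from mymodule import foo

# Placeholder path for the stats dump; the original snippet left it undefined.
output_file = 'profile_output.prof'

def start():
   # ...
   foo()

profile.run('start()', output_file)
p = pstats.Stats(output_file)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)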

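However, the profiler always reports that the "foo()" function in my module takes all of the time, without breaking it down into the sub-functions that "foo()" calls, which is what I actually want to see. How can I make the profiler report the performance of those functions?

Thanks.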

Tags: performance optimization, function calls, profiling, code analysis, cprofiler

3 Answers

-1

You may have run into a similar problem, so I'll describe my own case here. My profiling code looked like this:

def foobar():
    import cProfile
    pr = cProfile.Profile()
    pr.enable()
    for event in reader.events():
        baz()
        # and other things

    pr.disable()
    pr.dump_stats('result.prof')

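The final profiling dump contained only the events() call. It took me quite a while to realize that I was profiling an empty loop. The client code did call foobar() more than once, but every call wrote to the same 'result.prof' file, so the meaningful results were overwritten by the last call, whose loop was empty.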
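One way to avoid the overwrite described above is to keep a single profiler object alive across calls and report once at the end. The following is only a sketch in Python 3 syntax: `PROFILER`, `process_events`, and `handle` are hypothetical names standing in for the answer's `foobar()`, `reader.events()`, and `baz()`.

```python
import cProfile
import io
import pstats


def handle(event):
    """Hypothetical per-event work, standing in for baz()."""
    return sum(i * i for i in range(event))


# One shared profiler: data from every call accumulates here, so a later
# call with nothing to do cannot wipe out earlier measurements.
PROFILER = cProfile.Profile()


def process_events(events):
    """Hypothetical equivalent of foobar(): profile one batch of events."""
    PROFILER.enable()
    try:
        for event in events:
            handle(event)
    finally:
        PROFILER.disable()


# Several calls; the last one has an empty loop.
process_events([1000, 2000])
process_events([3000])
process_events([])

# Report once, after all calls have run.
buffer = io.StringIO()
pstats.Stats(PROFILER, stream=buffer).sort_stats('cumulative').print_stats()
report = buffer.getvalue()
print(report)
```

If separate dump files are preferred, giving each call its own file name instead of a fixed 'result.prof' would avoid the same problem.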

Answered 2025-04-16 by Python大师


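First, I want to say that I could not reproduce the asker's problem. The profiler in Python 2.7 definitely shows the functions and methods that get called. (The Python 3.6 docs look identical, but I have not tested this on Python 3.) My guess is that, with the output limited to the top 10 entries sorted by cumulative time, the top slots were taken by high-level functions that are called only a few times, and the functions called by foo() fell off the bottom of the list.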
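If truncation is the cause, a string restriction passed to print_stats() keeps only rows whose "filename:lineno(function)" text matches a regular expression, so nested functions cannot fall off the end of a short list. A minimal sketch in Python 3 syntax, with small stand-in functions that are not the asker's real code:

```python
import cProfile
import io
import pstats


def exp200(n):
    result = 1
    for _ in range(200):
        result *= n
    return result


def bar(n):
    return sum(exp200(i) for i in range(1, n))


def foo(n=5):
    for i in range(1, n):
        bar(i * 50)


profiler = cProfile.Profile()
profiler.enable()
foo()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative')
# A string argument is treated as a regular expression: only rows for
# exp200 and bar are printed, however far down the sorted list they sit.
stats.print_stats(r'\((exp200|bar)\)')
filtered_report = buffer.getvalue()
print(filtered_report)
```

Restrictions can also be combined, for example a pattern followed by an integer row limit, and they are applied in the order given.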

I decided to test with some big numbers. Here is my test code:

# file: mymodule.py
import math

def foo(n = 5):
    for i in xrange(1,n):
        baz(i)
        bar(i ** i)

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "len e: ", len("{}".format(e))

def exp200(n):
    result = 1
    for i in xrange(200):
        result *= n
    return result

def baz(n):
    print "{}".format(n)

And the file that imports it (very similar to the asker's):

# file: test.py

from mymodule import foo

def start():
   # ...
   foo(8)

OUTPUT_FILE = 'test.profile_info'

import pstats
import cProfile as profile

profile.run('start()', OUTPUT_FILE)
p = pstats.Stats(OUTPUT_FILE)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)
print "time (top 10): "
p.sort_stats('time').print_stats(10)

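Note the last line. I added a view sorted by time, which is the total time spent in the function itself, excluding time spent in calls to sub-functions. I find this view more useful, because it tends to favor the functions that are actually doing the work and may need optimizing.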
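The difference between the two sort orders can be seen on a tiny example. This is a sketch in Python 3 syntax with made-up functions (`busy_work` and `wrapper` are hypothetical): `wrapper` does almost nothing itself, so it ranks high by cumulative time but low by internal time.

```python
import cProfile
import io
import pstats


def busy_work():
    """Does the real computation, so it has a large internal time."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def wrapper():
    """Only delegates: large cumulative time, tiny internal time."""
    return [busy_work() for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
wrapper()
profiler.disable()


def render(sort_key):
    """Return the stats table as text, sorted by the given key."""
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats(sort_key).print_stats()
    return buffer.getvalue()


by_cumulative = render('cumulative')
by_time = render('time')
print(by_cumulative)
print(by_time)
```

In the 'time' view, busy_work should be listed above wrapper, which mirrors how exp200 rises to the top of the listings in this answer.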

Here is the part of the results the asker was referring to (sorted by cumulative):

cumulative (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Notice that the first 3 functions in this listing are each called only once. Now let's look at the view sorted by time:

time (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now the first entry makes sense. Clearly, raising a number to the 200th power by repeated multiplication is a "naive" strategy. Let's replace it:

def exp200(n):
    return n ** 200

And the results:

time (top 10):
Thu Mar 24 21:32:18 2016    test.profile_info

         2620840 function calls in 30.646 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873612   15.722    0.000   15.722    0.000 {method 'format' of 'str' objects}
        7    9.760    1.394   30.646    4.378 /Users/jhazen/mymodule.py:10(bar)
   873605    5.056    0.000    5.056    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.108    0.000    0.108    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   30.646   30.646 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   30.646   30.646 test.py:5(start)
        1    0.000    0.000   30.646   30.646 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

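That's a nice improvement in time. Now str.format() is our worst offender. I had added the line in bar() that prints the length of the number because my first attempt (computing the number and doing nothing with it) got optimized away, and my attempt to avoid that (printing the number itself, which gets very large very quickly) looked like it might block on I/O, so I compromised and printed the length of the number. Hey, that's essentially the base-10 logarithm. Let's try that: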

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "log e: ", math.log10(e)

And the results:

time (top 10):
Thu Mar 24 21:40:16 2016    test.profile_info

         1747235 function calls in 11.279 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        7    6.082    0.869   11.279    1.611 /Users/jhazen/mymodule.py:10(bar)
   873605    4.996    0.000    4.996    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.201    0.000    0.201    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   11.279   11.279 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000   11.279   11.279 test.py:5(start)
        1    0.000    0.000   11.279   11.279 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Hmm, there is still a fair amount of time spent in bar(), even without str.format(). Let's get rid of that print:

def bar(n):
    z = 0
    for i in xrange(1,n):
        e  = exp200(i)
        z += math.log10(e)
    return z

And the results:

time (top 10):
Thu Mar 24 21:45:24 2016    test.profile_info

         1747235 function calls in 5.031 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605    4.487    0.000    4.487    0.000 /Users/jhazen/mymodule.py:17(exp200)
        7    0.440    0.063    5.031    0.719 /Users/jhazen/mymodule.py:10(bar)
   873605    0.104    0.000    0.104    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:20(baz)
        1    0.000    0.000    5.031    5.031 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    5.031    5.031 test.py:5(start)
        1    0.000    0.000    5.031    5.031 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now it looks like the function actually doing the work is the busiest one, so I think we're done optimizing.

Hope that helps!

Answered 2025-04-16 by Python大师


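You need p.print_callees() to get a hierarchical breakdown of the method calls. The output is fairly self-explanatory: in the left column you find the function you are interested in, e.g. foo(), and in the right column you see all of the sub-functions it called, along with their total and cumulative times as measured within that caller. The breakdowns for those sub-calls are listed in the same way, and so on.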
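A minimal sketch of that call, in Python 3 syntax and with small stand-in functions rather than the asker's real module, might look like this. The string argument restricts the left column to entries matching the regular expression.

```python
import cProfile
import io
import pstats


def exp200(n):
    return n ** 200


def bar(n):
    return sum(exp200(i) for i in range(1, n))


def foo(n=5):
    for i in range(1, n):
        bar(i * 10)


profiler = cProfile.Profile()
profiler.enable()
foo()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative')
# Show only foo in the left column; its direct callees (here, bar)
# appear on the right after the "->" marker.
stats.print_callees(r'\(foo\)')
callees_report = buffer.getvalue()
print(callees_report)
```

Calling print_callees() with no arguments prints the same breakdown for every profiled function, and print_callers() gives the reverse view.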

Answered 2025-04-16 by Python大师

