Python：优化树生成器或获取新思路

1 投票

3 回答

617 浏览

数据工程师

提问于 2025-04-15 18:24

我写了一个程序，可以生成随机的数学表达式，然后用一些遗传算法的技巧来选择最优的表达式。

程序中的这一部分负责生成随机表达式，并把它们存储在一种树形结构里。

因为这个过程在程序运行时可能会被调用数十亿次，所以我觉得应该对它进行时间上的优化。

我刚开始学习编程，都是自己在摸索，所以虽然我在网上查了很多资料，

但我还是希望能得到一些建议，因为我感觉自己在孤军奋战。

程序的瓶颈似乎出现在 Node.init() 这个函数上，它占用了总时间的22%，还有 random.choice() 函数，占用了总时间的14%。

import random

def printTreeIndented(data, level=0):
    '''utility to view the tree
    '''
    if data == None:
         return
    printTreeIndented(data.right, level+1)
    print '  '*level + '  '+ str(data.cargo)#+ '  '+ str(data.seq)+ '  '+ str(data.branch)
    printTreeIndented(data.left, level+1)

#These are the global constants used in the Tree.build_nodes() method.
Depth = 5
Ratio = .6 #probability of terminating the current branch.
Atoms = ['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','x','x','x','x']
#dict of operators. the structure is: operator: number of arguements
Operators = {'+': 2, '-': 2, '*': 2, '/': 2, '**': 2}


class KeySeq:
    '''Iterator to produce sequential 
    integers for keys in Tree.thedict
    '''
    def __init__(self, data = 0):
        self.data = data
    def __iter__(self):
        return self
    def next(self):
        self.data = self.data + 1
        return self.data
KS = KeySeq()

class Node(object):
    '''
    '''
    def __init__(self, cargo, left=None, right=None):
        object.__init__(self)
        self.isRoot = False
        self.cargo = cargo
        self.left  = left
        self.right = right
        self.parent = None
        self.branch = None
        self.seq = 0


class Tree(object):
    def __init__(self):
        self.thedict = {}     #provides access to the nodes for further mutation and
        # crossbreeding.
        #When the Tree is instantiated, it comes filled with data.
        self.data = self.build_nodes()

# Uncomment the following lines to see the data and a crude graphic of the tree.
#        print 'data: '
#        for v in  self.thedict.itervalues():
#            print v.cargo,
#        print
#        print
#        printTreeIndented(self.data)

    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
     bbranch = None):
        '''
        '''
        r = float()
        r = random.random()

        #If r > Ratio, it forces a terminal node regardless of
        #the value of depth.
        #If entry = 1, then it's the root node and we don't want
        # a tree with just a value in the root node.

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            '''
            Add a terminal node.
            '''
            this_atom = (random.choice(Atoms))
            this_atom = str(this_atom)
            this_node = Node(this_atom)
            this_node.parent = pparent
            this_node.branch = bbranch
            this_node.seq = KS.next()
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            '''
            Add a node that has branches.
            '''
            this_operator = (random.choice(Operators.keys()))

            this_node = Node(this_operator)
            if entry:
                this_node.isRoot = True
            this_node.parent = pparent
            this_node.branch = bbranch
            this_node.seq = KS.next()
            self.thedict[this_node.seq] = this_node
            #branch as many times as 'number of arguements'
            # it's only set up for 2 arguements now.
            for i in range(Operators[this_operator]):
                depth =(depth - 1)
                if i == 0:
                    this_node.left = (self.build_nodes(entry = 0, depth =(depth),
                     pparent = this_node, bbranch = 'left'))
                else:
                    this_node.right = (self.build_nodes(entry = 0, depth =(depth),
                     pparent = this_node, bbranch = 'right'))
            return this_node


def Main():
    for i in range(100000):
        t = Tree()
    return t

if __name__ == '__main__':
    rresult = Main()

性能优化树形结构时间复杂度数学表达式编程学习随机生成遗传算法函数瓶颈

3 个回答

在你的代码中，你可以省去很多大括号，这就是Python的一大优点。例如，在条件语句周围加大括号时，

if (depth <= 0) or ((r > Ratio) and (not (entry))):

你只需要写成

if depth <= 0 or (r > Ratio and not entry):

我觉得还有一些多余的调用，比如说

this_atom = str(this_atom)

(this_atom 本身已经是一个字符串了，创建字符串的开销总是比较大的，所以可以省掉这一行)

或者调用 object 构造函数

object.__init__(self)

这也是不必要的。

至于 Node.__init__ 方法被称为“瓶颈”：我想你大部分时间都得在这里花费，因为在像这样构建树的时候，除了创建新的节点，你几乎没有其他事情可做。

回答于 2025-04-15 由 Python大师

分享举报

你可以用itertools.count来替代KeySeq生成器，它们的功能是一样的，不过itertools.count是用C语言实现的，所以会更快。

我觉得没有办法加快Node构造函数的速度。对于random.choice这个调用，你可以通过把代码直接嵌入到你的程序里来优化——也就是从随机模块的源代码中复制粘贴过来。这样做可以省去一次函数调用，因为在Python中，函数调用的开销是比较大的。

你可以通过使用psyco来加速运行，这是一种即时编译优化工具。不过它只适用于32位的Intel版本Python。另一种选择是使用cython，它可以把类似Python的代码转换成C语言，然后编译成Python的C模块。我说是“类似Python”是因为有些东西是不能转换的，你还可以添加C语言的数据类型注释，这样生成的代码会更高效。

回答于 2025-04-15 由 Python大师

分享举报

下面，我总结了一些比较明显的优化方法，基本上没有对算法进行太多改动。所有的时间测试都是在 Linux x86-64 系统上使用 Python 2.6.4 进行的。

初始时间：8.3秒

简单的优化

jellybean 已经指出了一些问题。仅仅修复这些问题就能稍微提高运行速度。将重复调用 Operators.keys() 的部分改为使用同一个列表，也能节省一些时间。

优化后时间：6.6秒

使用 itertools.count

Dave Kirby 指出，简单使用 itertools.count 也能节省一些时间：

from itertools import count
KS = count()

优化后时间：6.2秒

改进构造函数

因为你在构造函数中并没有设置所有 Node 的属性，所以可以把属性声明移动到类体中：

class Node(object):
    isRoot = False
    left  = None
    right = None
    parent = None
    branch = None
    seq = 0

    def __init__(self, cargo):
        self.cargo = cargo

这并不会改变你对类的理解，因为类体中使用的所有值都是不可变的（False、None、0）。如果你需要其他值，可以先看看关于类属性的这个回答。

优化后时间：5.2秒

使用 namedtuple

在你的代码中，你不再改变表达式树，所以可以使用一个不可变的对象。Node 也没有任何行为，所以使用 namedtuple 是个不错的选择。不过，这有一个影响，因为 parent 成员现在必须去掉。考虑到你可能会引入多个参数的运算符，你还是需要用一个子节点的列表来替代左右，这样又是可变的，允许在所有子节点之前创建父节点。

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])
# ...
    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
         bbranch = None):
        r = random.random()

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            this_node = Node(
                random.choice(Atoms), None, None, bbranch, KS.next(), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            this_node = Node(
              this_operator,
              self.build_nodes(entry = 0, depth = depth - 1,
                               pparent = None, bbranch = 'left'),
              self.build_nodes(entry = 0, depth = depth - 2,
                               pparent = None, bbranch = 'right'),
              bbranch, 
              KS.next(), 
              bool(entry))

            self.thedict[this_node.seq] = this_node    
            return this_node

我保留了操作数循环的原始行为，每次迭代都会减少深度。我不确定这是否是想要的行为，但改变它会增加运行时间，因此无法进行比较。

最终时间：4.1秒

接下来该怎么做

如果你想支持超过两个运算符和/或支持父属性，可以使用类似以下代码的方式：

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])

    def build_nodes (self,  depth = Depth, entry = 1,  pparent = None,
         bbranch = None):
        r = random.random()

        if (depth <= 0) or ((r > Ratio) and (not (entry))):
            this_node = Node(
                random.choice(Atoms), None, pparent, bbranch, KS.next(), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            this_node = Node(
              this_operator, [], pparent, bbranch,
              KS.next(), bool(entry))
            this_node.args.extend(
              self.build_nodes(entry = 0, depth = depth - (i + 1),
                               pparent = this_node, bbranch = i)
              for i in range(Operators[this_operator]))

            self.thedict[this_node.seq] = this_node    
            return this_node

这段代码也会根据运算符的位置减少深度。

回答于 2025-04-15 由 Python大师

分享举报

Python：优化树生成器或获取新思路

3 个回答

简单的优化

使用 itertools.count

改进构造函数

使用 namedtuple

接下来该怎么做

撰写回答