Infinite loop when appending rows to a list in a Python 3 class

0 votes
2 answers
829 views
Asked 2025-04-15 22:41

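I have a script with two classes. (I have obviously cut out a lot that I don't think is related to the error I'm running into.) The end goal is to build a decision tree, as I mentioned in this question.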

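Unfortunately, I'm hitting an infinite loop and am having trouble working out why. I have identified the line that goes wrong, but I would have thought the list I'm iterating over and the list I'm appending to were different objects. Does a list's .append have some side effect I don't know about? Or am I making some other obvious mistake?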

class Dataset:
    individuals = [] #Becomes a list of dictionaries, in which each dictionary is a row from the CSV with the headers as keys
    def field_set(self): #Returns a list of the fields in individuals[] that can be used to split the data (i.e. have more than one value amongst the individuals
    def classified(self, predicted_value): #Returns True if all the individuals have the same value for predicted_value
    def fields_exhausted(self, predicted_value): #Returns True if all the individuals are identical except for predicted_value
    def lowest_entropy_value(self, predicted_value): #Returns the field that will reduce entropy (http://en.wikipedia.org/wiki/Entropy_%28information_theory%29) the most
    def __init__(self, individuals=[]):

and

class Node:
    ds = Dataset() #The data that is associated with this Node
    links = [] #List of Nodes, the offspring Nodes of this node
    level = 0 #Tree depth of this Node
    split_value = '' #Field used to split out this Node from the parent node
    node_value = '' #Value used to split out this Node from the parent Node

    def split_dataset(self, split_value): #Splits the dataset into a series of smaller datasets, each of which has a unique value for split_value.  Then creates subnodes to store these datasets.
        fields = [] #List of options for split_value amongst the individuals
        datasets = {} #Dictionary of Datasets, each one with a value from fields[] as its key
        for field in self.ds.field_set()[split_value]: #Populates the keys of fields[]
            fields.append(field)
            datasets[field] = Dataset()
        for i in self.ds.individuals: #Adds individuals to the datasets.dataset that matches their result for split_value
            datasets[i[split_value]].individuals.append(i) #<---Causes an infinite loop on the second hit
        for field in fields: #Creates subnodes from each of the datasets.Dataset options
            self.add_subnode(datasets[field],split_value,field)

    def add_subnode(self, dataset, split_value='', node_value=''):
    def __init__(self, level, dataset=Dataset()):

My current initialization code is:

if __name__ == '__main__':
    filename = (sys.argv[1]) #Takes in a CSV file
    predicted_value = "# class" #Identifies the field from the CSV file that should be predicted
    base_dataset = parse_csv(filename) #Turns the CSV file into a list of lists
    parsed_dataset = individual_list(base_dataset) #Turns the list of lists into a list of dictionaries
    root = Node(0, Dataset(parsed_dataset)) #Creates a root node, passing it the full dataset
    root.split_dataset(root.ds.lowest_entropy_value(predicted_value)) #Performs the first split, creating multiple subnodes
    n = root.links[0] 
    n.split_dataset(n.ds.lowest_entropy_value(predicted_value)) #Attempts to split the first subnode.

2 Answers

4

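I suspect you are appending to the same list you are iterating over, so the list keeps growing during the iteration and the iterator never reaches the end. Try iterating over a copy of the list instead: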

for i in list(self.ds.individuals):
    datasets[i[split_value]].individuals.append(i) 
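To make the suspicion concrete, here is a minimal sketch of what happens when the list being iterated and the list being appended to are the same object. The function names are made up for illustration, and `islice` is used only to cap a loop that would otherwise never finish.

```python
from itertools import islice


def append_while_iterating(limit):
    """Append to the list being iterated; stop after `limit` steps."""
    items = [1, 2, 3]
    steps = 0
    # Without the islice cap this loop never ends: every append adds
    # one more element for the list iterator to visit.
    for item in islice(items, limit):
        items.append(item)
        steps += 1
    return steps, len(items)


def append_while_iterating_copy():
    """Iterate over a snapshot, so appends cannot extend the loop."""
    items = [1, 2, 3]
    target = items  # same object under another name (aliasing)
    steps = 0
    for item in list(items):  # list(...) makes a shallow copy
        target.append(item)
        steps += 1
    return steps, items


print(append_while_iterating(50))     # (50, 53)
print(append_while_iterating_copy())  # (3, [1, 2, 3, 1, 2, 3])
```

Iterating over a copy makes the loop terminate, but the rows still end up in the shared list, so it treats the symptom and leaves the aliasing in place.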

4

class Dataset:
    individuals = []

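This is a bit suspicious. Unless you want a single static list shared by every Dataset instance, you shouldn't do this. If you already set self.individuals = something in __init__, there is no need to set individuals here as well.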
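A small sketch of the difference, using two hypothetical classes (`SharedDataset` and `OwnDataset` are illustrative names, not from the question):

```python
class SharedDataset:
    individuals = []  # class attribute: one list shared by all instances


class OwnDataset:
    def __init__(self):
        self.individuals = []  # instance attribute: a fresh list each time


a, b = SharedDataset(), SharedDataset()
a.individuals.append({"x": 1})
shared_same = a.individuals is b.individuals

c, d = OwnDataset(), OwnDataset()
c.individuals.append({"x": 1})
own_same = c.individuals is d.individuals

print(shared_same, b.individuals)  # True [{'x': 1}]
print(own_same, d.individuals)     # False []
```

Note that `a.individuals.append(...)` mutates the class-level list. Only an assignment such as `a.individuals = [...]` would create a separate instance attribute.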

    def __init__(self, individuals=[]):

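Still suspicious. Are you assigning the individuals argument to self.individuals? If so, you are giving every Dataset created with the default argument the same individuals list, the one created once when the function was defined. Append an item to one Dataset's list and every other Dataset created without an explicit individuals argument gets that item too.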
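The question does not show the body of `__init__`, so the sketch below assumes it does `self.individuals = individuals`. `LeakyDataset` and `SafeDataset` are illustrative names. The second class shows the usual `None` sentinel idiom as one possible fix.

```python
class LeakyDataset:
    def __init__(self, individuals=[]):  # the [] is created once, at def time
        self.individuals = individuals   # assumed body, not shown in the question


class SafeDataset:
    def __init__(self, individuals=None):
        # Build a new list on every call. Copying the argument also keeps
        # the caller's list separate (a design choice, not a requirement).
        self.individuals = [] if individuals is None else list(individuals)


first, second = LeakyDataset(), LeakyDataset()
first.individuals.append("row")
print(second.individuals)  # ['row'], leaked through the shared default

third, fourth = SafeDataset(), SafeDataset()
third.individuals.append("row")
print(fourth.individuals)  # []
```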

Similarly:

class Node:
    def __init__(self, level, dataset=Dataset()):

Every Node created without an explicit dataset argument receives the exact same default Dataset instance.
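One way to see this is to inspect the function's stored defaults. The sketch below uses stripped-down stand-ins for the question's classes, with the `__init__` bodies assumed:

```python
class Dataset:
    def __init__(self):
        self.individuals = []


class Node:
    def __init__(self, level, dataset=Dataset()):  # Dataset() runs once, here
        self.level = level
        self.ds = dataset  # assumed body, not shown in the question


first, second = Node(0), Node(1)
default_object = Node.__init__.__defaults__[0]

print(first.ds is second.ds)       # True
print(first.ds is default_object)  # True
```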

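This is the well-known mutable default argument problem, and the destructive iteration it can cause is very likely what produces your infinite loop.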
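This would also explain why the loop only appears on the second split. If every `Dataset()` created without arguments shares one list, the first split fills that list and hands it to each subnode. The second split then iterates over that list while appending to it. Below is a rough sketch of the two classes with `None` defaults. The method bodies are guesses, since the question elides them, and `values_of` is a hypothetical stand-in for `field_set()[split_value]`.

```python
class Dataset:
    def __init__(self, individuals=None):
        self.individuals = [] if individuals is None else list(individuals)

    def values_of(self, field):
        """Hypothetical helper: the distinct values seen for `field`."""
        return sorted({row[field] for row in self.individuals})


class Node:
    def __init__(self, level, dataset=None):
        self.level = level
        self.ds = Dataset() if dataset is None else dataset
        self.links = []  # per-instance list of child nodes
        self.split_value = ''
        self.node_value = ''

    def add_subnode(self, dataset, split_value='', node_value=''):
        child = Node(self.level + 1, dataset)
        child.split_value = split_value
        child.node_value = node_value
        self.links.append(child)

    def split_dataset(self, split_value):
        # Each Dataset() now owns its list, so appending below cannot
        # grow self.ds.individuals while it is being iterated.
        datasets = {v: Dataset() for v in self.ds.values_of(split_value)}
        for row in self.ds.individuals:
            datasets[row[split_value]].individuals.append(row)
        for value, dataset in datasets.items():
            self.add_subnode(dataset, split_value, value)


rows = [
    {"color": "red", "size": "S", "# class": "a"},
    {"color": "red", "size": "L", "# class": "b"},
    {"color": "blue", "size": "S", "# class": "a"},
]
root = Node(0, Dataset(rows))
root.split_dataset("color")
first = root.links[0]
first.split_dataset("size")  # the second split terminates

print([n.node_value for n in root.links])  # ['blue', 'red']
print(len(first.links))                    # 1
```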
