Parsing a Python decision tree to reduce its memory footprint

Posted 2024-04-25 04:36:33


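I generated a decision tree with a random forest library in Python. The output is a file whose lines describe the decision nodes and the leaves of the tree. However, this file is too large for my project. I think it could be made much smaller by parsing it. Do you know of an "automatic" (function-based) way to parse it? And how can I then use the parsed result to classify future samples?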

At this link you can find the original .dot file (and an .rtf file), along with the decision tree drawn as a PDF.

https://drive.google.com/open?id=0BzKmYuvGbMT7c1pJZGY1TlZKQlU

I have added a sample of the file:

digraph Tree {
node [shape=box] ;
0 [label="X[1443] <= 0.956\ngini = 0.6009\nsamples = 11373\nvalue = [59, 10034, 5043, 1447, 989, 297, 127, 4]"] ;
1 [label="X[688] <= 0.4438\ngini = 0.5472\nsamples = 9977\nvalue = [59, 9752, 3941, 1042, 596, 219, 124, 4]"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="X[414] <= 0.6147\ngini = 0.366\nsamples = 5569\nvalue = [59, 6798, 1312, 212, 213, 48, 61, 1]"] ;
1 -> 2 ;
3 [label="X[681] <= 0.2921\ngini = 0.2992\nsamples = 5193\nvalue = [59, 6713, 1111, 131, 114, 2, 1, 0]"] ;
2 -> 3 ;
4 [label="X[106] <= 0.5942\ngini = 0.1943\nsamples = 3310\nvalue = [59, 4622, 429, 34, 27, 0, 1, 0]"] ;
3 -> 4 ;
5 [label="X[1676] <= 0.2005\ngini = 0.1624\nsamples = 3112\nvalue = [59, 4447, 344, 12, 12, 0, 0, 0]"] ;
4 -> 5 ;
6 [label="X[2074] <= 0.5058\ngini = 0.4034\nsamples = 411\nvalue = [0, 479, 155, 8, 10, 0, 0, 0]"] ;
5 -> 6 ;
7 [label="X[2674] <= 0.7095\ngini = 0.2498\nsamples = 206\nvalue = [0, 277, 45, 2, 0, 0, 0, 0]"] ;
6 -> 7 ;
8 [label="X[3294] <= 0.1852\ngini = 0.3138\nsamples = 143\nvalue = [0, 189, 43, 2, 0, 0, 0, 0]"] ;
7 -> 8 ;
9 [label="X[2548] <= 0.2647\ngini = 0.305\nsamples = 87\nvalue = [0, 111, 23, 2, 0, 0, 0, 0]"] ;
8 -> 9 ;
10 [label="gini = 0.524\nsamples = 16\nvalue = [0, 15, 11, 1, 0, 0, 0, 0]"] ;
9 -> 10 ;
11 [label="gini = 0.2121\nsamples = 71\nvalue = [0, 96, 12, 1, 0, 0, 0, 0]"] ;
9 -> 11 ;
12 [label="X[1459] <= 0.1388\ngini = 0.3249\nsamples = 56\nvalue = [0, 78, 20, 0, 0, 0, 0, 0]"] ;
8 -> 12 ;
 .....
943 -> 949 ;
950 [label="gini = 0.0\nsamples = 2\nvalue = [0, 0, 0, 0, 3, 0, 0, 0]"] ;
942 -> 950 ;
}
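
One possible approach, sketched here under assumptions rather than offered as a definitive answer: keep only what classification needs from each node (feature index, threshold, the two child ids, and for leaves the majority class) and drop the gini, samples and full value text. The function names `parse_dot` and `predict` are hypothetical. The sketch assumes the file follows the layout shown in the sample above, that the first edge listed for a parent is its "True" (`<=`) branch, and that the predicted class of a leaf is the index of the largest entry in `value`.

```python
import re

# Patterns for the node and edge lines of the .dot layout shown above.
NODE_RE = re.compile(r'^(\d+) \[label="(.*)"\] ;$')
EDGE_RE = re.compile(r'^(\d+) -> (\d+)')
SPLIT_RE = re.compile(r'X\[(\d+)\] <= ([-+0-9.eE]+)')
VALUE_RE = re.compile(r'value = \[([^\]]*)\]')


def parse_dot(text):
    """Map each node id to (feature, threshold, left, right) for a split,
    or to a class index (int) for a leaf."""
    labels, children = {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = NODE_RE.match(line)
        if m:
            labels[int(m.group(1))] = m.group(2)
            continue
        m = EDGE_RE.match(line)
        if m:
            # Assumption: edges appear in file order, "True" child first.
            children.setdefault(int(m.group(1)), []).append(int(m.group(2)))

    tree = {}
    for node_id, label in labels.items():
        split = SPLIT_RE.search(label)
        kids = children.get(node_id, [])
        if split and len(kids) == 2:
            tree[node_id] = (int(split.group(1)), float(split.group(2)),
                             kids[0], kids[1])
        else:
            # Leaf (or a split whose children are missing from the file):
            # keep only the index of the most frequent class.
            counts = [float(v) for v in VALUE_RE.search(label).group(1).split(',')]
            tree[node_id] = counts.index(max(counts))
    return tree


def predict(tree, x, root=0):
    """Walk from the root to a leaf and return its class index."""
    node = tree[root]
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = tree[left if x[feature] <= threshold else right]
    return node
```

With a complete file, usage could look like `tree = parse_dot(open("tree.dot").read())` followed by `predict(tree, sample)`, where `tree.dot` is a placeholder filename and `sample` is a sequence indexed by feature number. The returned value is a position in the `value` list, so it still has to be mapped back to the original class labels. The resulting dictionary can be stored with `pickle` or `json` in place of the .dot file.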
