Multi-output multi-class machine learning in Python

Posted 2024-05-15 16:36:30


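I have been researching this and struggling to find the best way to solve a problem. I have a training dataset and a test dataset. The test dataset is missing two feature columns that the training dataset has (channel and sector, each consisting of 4 classes).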

I have built a decision tree on the data, but I can only train it on either channel or sector, whereas I need to train on both.

Can anyone suggest how to implement multi-class, multi-output machine learning in Python?

import os
import subprocess

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

def getPath(thisFile):
    # Load a CSV file into a DataFrame; return None if the file does not exist.
    if not os.path.exists(thisFile):
        return None
    return pd.read_csv(thisFile, header=0)

def visualize_tree(tree, feature_names):

    with open("dt.dot", 'w') as f:
        export_graphviz(tree, out_file=f,
                        feature_names=feature_names)

    command = ["dot", "-Tpng", "dt.dot", "-o", "dt.png"]
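    try:
        subprocess.check_call(command)
    except (subprocess.CalledProcessError, OSError):
        # dot is not installed or exited with an error
        raise SystemExit("Could not run dot, i.e. graphviz, to "
                         "produce visualization")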

data = np.loadtxt("newTrain2.csv", delimiter=',')
# Copy the feature columns (product, quantity, revenue) so that binning
# does not overwrite the raw values in `data`.
X = data[:, 1:4].copy()
# Bin quantity into classes 1-4. The original chain of `if` statements
# overlapped between 25 and 30 and left a value of exactly 250 unbinned;
# the edges below keep the effective "< 30" cutoff for class 1.
X[:, 1] = np.digitize(data[:, 2], [30, 75, 250]) + 1
# Bin revenue into classes 1-4 (exactly 10000 now falls into class 4).
X[:, 2] = np.digitize(data[:, 3], [1000, 4000, 10000]) + 1



targets = data[:,4]

thisTree = DecisionTreeClassifier(min_samples_split=30, random_state=99)
thisTree.fit(X, targets)
visualize_tree(thisTree, ["product", "quantity", "revenue"])
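
One possible direction for the question above, as a minimal sketch rather than a definitive answer: scikit-learn's DecisionTreeClassifier accepts a 2D target array of shape (n_samples, n_outputs), so a single tree can be fitted on channel and sector together. The data below is synthetic, and the column layout (three binned features, two label columns) is an assumption, since the real layout of newTrain2.csv is not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data (assumption: three binned
# feature columns and two label columns, channel and sector, each 1-4).
rng = np.random.default_rng(0)
n_samples = 200
X_train = rng.integers(1, 5, size=(n_samples, 3))  # product, quantity, revenue
Y_train = rng.integers(1, 5, size=(n_samples, 2))  # col 0: channel, col 1: sector

# Passing a 2D target array fits one tree that predicts both labels at once.
clf = DecisionTreeClassifier(min_samples_split=30, random_state=99)
clf.fit(X_train, Y_train)

# The test set only needs the feature columns; both labels are predicted.
X_test = rng.integers(1, 5, size=(10, 3))
Y_pred = clf.predict(X_test)  # shape (10, 2)
channel_pred, sector_pred = Y_pred[:, 0], Y_pred[:, 1]
print(Y_pred.shape)
```

If the two labels are better modelled independently, sklearn.multioutput.MultiOutputClassifier wraps any classifier and fits one copy per target column instead.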
