根据Pandas数据框进行预测

2024-03-29 11:44:18 发布

您现在位置:Python中文网/ 问答频道 /正文

总体而言,我对数据科学/熊猫非常陌生。我主要遵循this并可以使用不同的分类器使其工作。在

import pandas as pd
import src.helper as helper
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier

# Headings
headings = ['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor', 'gill-attachment', 'gill-spacing',
            'gill-size', 'gill-color', 'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
            'stalk-surface-below-ring', 'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type',
            'veil-color', 'ring-number', 'ring-type', 'spore-print-color', 'population', 'habitat']

# Load the data
shrooms = pd.read_csv('data/shrooms_no_header.csv', names=headings, converters={"header": float})

# Replace the ? in 'stalk-root' with 0
shrooms.loc[shrooms['stalk-root'] == '?', 'stalk-root'] = np.nan
shrooms.fillna(0, inplace=True)

# Remove columns with only one unique value
for col in shrooms.columns.values:
    if len(shrooms[col].unique()) <= 1:
        print("Removing column {}, which only contains the value: {}".format(col, shrooms[col].unique()[0]))
        shrooms.drop(col, axis=1, inplace=True)

# Col to predict later
col_predict = 'class'

# Binary Encoding
all_cols = list(shrooms.columns.values)
all_cols.remove(col_predict)
helper.encode(shrooms, [col_predict])

# Expand Shrooms DataFrame to Binary Values
helper.expand(shrooms, all_cols)

# Remove the class we want to predict
x_all = list(shrooms.columns.values)
x_all.remove(col_predict)

# Set Train/Test ratio
ratio = 0.7

# Split the DF
df_train, df_test, X_train, Y_train, X_test, Y_test = helper.split_df(shrooms, col_predict, x_all, ratio)

# Try different classifier
# TODO: Batch Use to compare
classifier = GradientBoostingClassifier(n_estimators=1000)

# TODO: Optimize Hyperparamter (where applicable)

# Time the training
timer_start = time.process_time()
classifier.fit(X_train, Y_train)
timer_stop = time.process_time()
time_diff = timer_stop - timer_start

# Get the score
score_train = classifier.score(X_train, Y_train)
score_test = classifier.score(X_test, Y_test)

print('Train Score {}, Test Score {}, Time {}'.format(score_train, score_test, time_diff))

# TODO: Test a manual DataFrame

“助手”是我不太理解的功能,但它们起作用:

^{pr2}$

我想做一个“手动”测试,比如输入x属性并根据它得到一个预测。在

例如,我硬编码了一个数据帧,如下所示:

manual = pd.DataFrame({
    "cap-shape": ["x"],
    "cap-surface": ["s"],
    "cap-color": ["n"],
    "bruises": ["f"],
    "odor": ["n"],
    "gill-attachment": ["a"],
    "gill-spacing": ["c"],
    "gill-size": ["b"],
    "gill-color": ["y"],
    "stalk-shape": ["e"],
    "stalk-root": ["?"],
    "stalk-surface-above-ring": ["s"],
    "stalk-surface-below-ring": ["s"],
    "stalk-color-above-ring": ["o"],
    "stalk-color-below-ring": ["o"],
    "veil-type": ["p"],
    "veil-color": ["o"],
    "ring-number": ["o"],
    "ring-type": ["p"],
    "spore-print-color": ["o"],
    "population": ["c"],
    "habitat": ["l"]
})

如何应用相同的编码?我的代码是helper.encode(manual, [col_predict]),但是ofc手册没有col_predict?在

请记住,我是一个完全的初学者,我在网上搜索了很多,但我无法找到一个合适的源代码/教程,让我测试一套。在

完整的代码可以找到here。在


Tags: thetestimporthelpertimetraincolpredict
1条回答
网友
1楼 · 发布于 2024-03-29 11:44:18

试试这个:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv('agaricus-lepiota.data.txt', header=None) #read data
data.rename(columns={0: 'y'}, inplace = True) #rename predict column (edible or not)

le = LabelEncoder() # encoder to do label encoder

data = data.apply(lambda x: le.fit_transform(x)) #apply LE to all columns

X = data.drop('y', 1) # X without predict column
y = data['y'] #predict column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = GradientBoostingClassifier()#you can pass arguments

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test) #it is predict for objects in test

print(accuracy_score(y_test, y_pred)) #check accuracy

我想你可以在sklearn网站上了解更多。 这个例子是你想要的吗?在

要检查手动数据:

^{pr2}$

相关问题 更多 >