Jubatus Toolkit
jubakit: Jubatus Toolkit
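jubakit is a Python module that provides easy access to Jubatus features. jubakit can be used together with scikit-learn, so you can use powerful features such as cross validation and model evaluation. See the Jubakit Documentation for details.
Currently jubakit supports the Classifier, Regression, Anomaly, Recommender, NearestNeighbor, Clustering, Burst, Bandit and Weight engines.
Install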
pip install jubakit
Requirements
- Python 2.7, 3.3, 3.4 or 3.5.
- Jubatus needs to be installed.
- Although not mandatory, installing scikit-learn is required to use some features such as K-fold cross validation.
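The Python-side requirements above can be checked before running any example. The following is a minimal sketch using only the standard library (it needs Python 3.4 or later for `importlib.util.find_spec`). The helper name `check_environment` is hypothetical and not part of jubakit, and the sketch assumes the packages are importable as `jubatus` (the Jubatus Python client), `jubakit` and `sklearn`. It only detects Python modules, so it cannot tell whether the Jubatus server itself is installed.

```python
import importlib.util
import sys


def check_environment(optional=("sklearn",), required=("jubatus", "jubakit")):
    """Report the Python version and which modules can be found.

    `check_environment` is a hypothetical helper, not part of jubakit.
    """
    def available(name):
        # find_spec returns None when a top-level module cannot be located.
        return importlib.util.find_spec(name) is not None

    return {
        "python": "{0}.{1}".format(*sys.version_info[:2]),
        "required": {name: available(name) for name in required},
        "optional": {name: available(name) for name in optional},
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python version:", report["python"])
    for group in ("required", "optional"):
        for name, found in sorted(report[group].items()):
            status = "found" if found else "missing"
            print("{0} module {1}: {2}".format(group, name, status))
```

A missing `sklearn` entry only means that the scikit-learn dependent features, such as K-fold cross validation, will be unavailable.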
Quick Start
The following example shows how to perform training and classification using a CSV dataset.
```python
from jubakit.classifier import Classifier, Schema, Dataset, Config
from jubakit.loader.csv import CSVLoader

# Load a CSV file.
loader = CSVLoader('iris.csv')

# Define types for each column in the CSV file.
schema = Schema({
    'Species': Schema.LABEL,
}, Schema.NUMBER)

# Get the shuffled dataset.
dataset = Dataset(loader, schema).shuffle()

# Run the classifier service (`jubaclassifier` process).
classifier = Classifier.run(Config())

# Train the classifier.
for _ in classifier.train(dataset):
    pass

# Classify using the trained classifier.
for (idx, label, result) in classifier.classify(dataset):
    print("true label: {0}, estimated label: {1}".format(label, result[0][0]))
```
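To make the schema declaration in the quick start more concrete, here is a minimal standard-library sketch of the idea behind it: the `Species` column is treated as the label, and every other column is treated as a numeric feature. It does not use jubakit and is not a description of how jubakit is implemented. The function name `split_rows`, the sample column names and the sample values are all made up for illustration.

```python
import csv
import io

# Hypothetical sample in the shape the quick start assumes: a header row,
# numeric columns, and one `Species` column holding the label.
SAMPLE_CSV = """Sepal.Length,Sepal.Width,Species
5.1,3.5,setosa
7.0,3.2,versicolor
"""


def split_rows(csv_text, label_column="Species"):
    """Yield (label, features) pairs; features map column name to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # The label column is removed from the row and returned separately.
        label = row.pop(label_column)
        # Every remaining column is converted to a number, mirroring the
        # role that `Schema.NUMBER` plays in the quick start.
        features = {name: float(value) for name, value in row.items()}
        yield label, features


if __name__ == "__main__":
    for label, features in split_rows(SAMPLE_CSV):
        print(label, features)
```

A non-numeric value in a feature column makes `float()` raise `ValueError` in this sketch, which is a reminder that columns holding strings need a different type in the schema.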
Examples by Topics
See the example directory for working examples.
Example | Topics | Requires scikit-learn |
---|---|---|
classifier_csv.py | Handling CSV file and numeric features | |
classifier_shogun.py | Handling CSV file and string features | |
classifier_digits.py | Handling toy dataset (digits) | ✓ |
classifier_libsvm.py | Handling LIBSVM file | ✓ |
classifier_kfold.py | K-fold cross validation and metrics | ✓ |
classifier_parameter.py | Finding best hyper parameter | ✓ |
classifier_hyperopt_tuning.py | Finding best hyper parameter using hyperopt | ✓ |
classifier_bulk.py | Bulk Train-Test Classifier | |
classifier_twitter.py | Handling Twitter Streams | |
classifier_model_extract.py | Extract contents of Classifier model file | |
classifier_sklearn_wrapper.py | Classification using scikit-learn wrapper | ✓ |
classifier_sklearn_grid_search.py | Grid Search example using scikit-learn wrapper | ✓ |
classifier_tensorboard.py | Visualize a training process using TensorBoard | ✓ |
regression_boston.py | Regression with toy dataset (boston) | ✓ |
regression_csv.py | Regression with CSV file | |
regression_sklearn_wrapper.py | Regression using scikit-learn wrapper | ✓ |
anomaly_auc.py | Anomaly detection and metrics | |
recommender_npb.py | Recommend similar items | |
nearest_neighbor_aaai.py | Search neighbor items | |
clustering_2d.py | Clustering 2-dimensional dataset | |
burst_dummy_stream.py | Burst detection with stream data | |
bandit_slot.py | Multi-armed bandit with slot machine example | |
weight_shogun.py | Tracing fv_converter behavior using Weight | |
weight_model_extract.py | Extract contents of Weight model file | |
License
MIT License