napkinxc Python package - PyPI

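napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification.

napkinxc Python project description

napkinXC

napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification. It allows training a classifier for very large datasets in just a few lines of code with minimal resources.

Right now, napkinXC implements the following features, available both in Python and in C++: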

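Probabilistic Label Trees (PLT) and Online Probabilistic Label Trees (OPLT)
Hierarchical softmax (HSM)
Binary Relevance (BR)
One-vs-Rest (OVR)
Fast online prediction of the top-k labels or of labels above a given threshold
Hierarchical k-means clustering for tree building, along with other tree-building methods
Support for predefined hierarchies
LIBLINEAR, SGD, and AdaGrad solvers for base classifiers
Efficient ensembles of tree-based models
Helpers to download and load data from the XML Repository
Helpers to measure performance.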
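
To make the idea behind probabilistic label trees and the two prediction modes listed above more concrete, here is a toy sketch. It is not napkinXC's implementation: it assumes a tiny hand-built tree in which every node stores a made-up conditional probability, estimates a label's probability as the product of those values along the path from the leaf to the root, and then selects labels either by top-k or by threshold. The names `TREE`, `LEAVES`, `path_probability`, `label_probabilities`, `top_k`, and `above_threshold`, as well as all numbers, are hypothetical.

```python
# Toy probabilistic label tree: node -> (parent, P(node relevant | parent relevant)).
# The structure and the probabilities are invented for illustration only.
TREE = {
    "root": (None, 1.0),
    "left": ("root", 0.9),
    "right": ("root", 0.4),
    "label_a": ("left", 0.8),
    "label_b": ("left", 0.3),
    "label_c": ("right", 0.5),
    "label_d": ("right", 0.1),
}
LEAVES = ["label_a", "label_b", "label_c", "label_d"]


def path_probability(node):
    """Multiply the conditional probabilities from the node up to the root."""
    prob = 1.0
    while node is not None:
        parent, conditional = TREE[node]
        prob *= conditional
        node = parent
    return prob


def label_probabilities():
    """Estimated probability of every leaf label."""
    return {leaf: path_probability(leaf) for leaf in LEAVES}


def top_k(probs, k):
    """Return the k labels with the highest estimated probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def above_threshold(probs, threshold):
    """Return all labels whose estimated probability exceeds the threshold."""
    return sorted(label for label, p in probs.items() if p > threshold)


probs = label_probabilities()
print(top_k(probs, 2))
print(above_threshold(probs, 0.25))
```

This sketch enumerates every leaf for brevity. A practical implementation would avoid that, for instance by searching the tree best-first and pruning subtrees whose path probability is already too low.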

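Please note that this library is still under development and also serves as a base for experiments. Some experimental features may not be documented.

napkinXC is distributed under the MIT license. All contributions to the project are welcome!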

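Roadmap

Coming soon:

Possibility to use any type of binary classifier from Python.
Efficient prediction with different thresholds.
Improved dataset loading in Python.
More datasets from the XML Repository.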

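Python Quick Start and Documentation

The documentation for napkinXC is available at https://napkinxc.readthedocs.io and is generated from this repository.

The Python (3.5+) version of napkinXC can be easily installed from the PyPI repository on Linux and macOS. It requires a modern C++17 compiler, CMake, and Git to be installed: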

pip install napkinxc

Alternatively, the latest master version can be installed directly from the GitHub repository.

Minimal usage example:

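from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.measures import precision_at_k

# Load the train and test parts of the EURLex-4K dataset
X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")

# Create a PLT model that uses "eurlex-model" as its output directory and train it
plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)

# Predict the top-1 label for each test example and report precision at 1
Y_pred = plt.predict(X_test, top_k=1)
print(precision_at_k(Y_test, Y_pred, k=1))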
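
For readers unfamiliar with the metric printed by the example, the sketch below shows one common definition of precision at k: the fraction of the first k predicted labels that are relevant, averaged over examples. It is a plain-Python illustration and not the code behind `napkinxc.measures.precision_at_k`; the function name `precision_at_k_sketch` and the toy data are assumptions made for this example.

```python
def precision_at_k_sketch(y_true, y_pred, k):
    """Average, over examples, of the share of the first k predictions that are relevant.

    y_true: list of collections holding the relevant labels of each example.
    y_pred: list of ranked label lists (best first), one per example.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of examples")
    total = 0.0
    for relevant, ranked in zip(y_true, y_pred):
        relevant = set(relevant)
        hits = sum(1 for label in ranked[:k] if label in relevant)
        total += hits / k  # fewer than k predictions simply yields fewer hits
    return total / len(y_true)


# Toy data: three examples with their relevant labels and ranked predictions.
y_true = [[1, 5], [2], [3, 4, 7]]
y_pred = [[5, 9], [8, 2], [3, 4]]
print(precision_at_k_sketch(y_true, y_pred, k=1))
print(precision_at_k_sketch(y_true, y_pred, k=2))
```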

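More examples can be found in the python/examples directory.

Executable

napkinXC can also be built and used as an executable to train and evaluate models and to make predictions, using data in LIBSVM format.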
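
As a rough illustration of what LIBSVM-formatted multi-label data looks like, the following sketch parses a few hand-written lines. It assumes the common layout `label1,label2 index:value index:value ...` and does not handle any header line that some dataset files carry; it is not the parser used by napkinXC. The function name `parse_libsvm_line` and the sample lines are hypothetical.

```python
def parse_libsvm_line(line):
    """Parse one line of the form 'l1,l2 idx:val idx:val ...' into (labels, features)."""
    tokens = line.split()
    labels = []
    # The first token holds the labels unless the example has none,
    # in which case the line starts directly with an index:value pair.
    if tokens and ":" not in tokens[0]:
        labels = [int(label) for label in tokens[0].split(",")]
        tokens = tokens[1:]
    features = {}
    for token in tokens:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


sample = [
    "3,7 1:0.5 4:1.25",
    "2 2:1.0",
    "5:0.75 9:2.0",  # an example without any labels
]
for line in sample:
    print(parse_libsvm_line(line))
```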

To build the executable, use:

cmake .
make

Command line options:

Usage: nxc <command> <args>

Commands:
    train                   Train model on given input data
    test                    Test model on given input data
    predict                 Predict for given data
    ofo                     Use online f-measure optimization
    version                 Print napkinXC version
    help                    Print help

Args:
    General:
    -i, --input             Input dataset, required
    -o, --output            Output (model) dir, required
    -m, --model             Model type (default = plt)
                            Models: ovr, br, hsm, plt, oplt, svbopFull, svbopHf, brMips, svbopMips
    --ensemble              Number of models in ensemble (default = 1)
    -t, --threads           Number of threads to use (default = 0)
                            Note: -1 to use #cpus - 1, 0 to use #cpus
    --hash                  Size of features space (default = 0)
                            Note: 0 to disable hashing
    --featuresThreshold     Prune features below given threshold (default = 0.0)
    --seed                  Seed (default = system time)
    --verbose               Verbose level (default = 2)

    Base classifiers:
    --optimizer             Optimizer used for training binary classifiers (default = liblinear)
                            Optimizers: liblinear, sgd, adagrad, fobos
    --bias                  Value of the bias features (default = 1)
    --inbalanceLabelsWeighting     Increase the weight of minority labels in base classifiers (default = 1)
    --weightsThreshold      Threshold value for pruning models weights (default = 0.1)

    LIBLINEAR:              (more about LIBLINEAR: https://github.com/cjlin1/liblinear)
    -s, --liblinearSolver   LIBLINEAR solver (default for log loss = L2R_LR_DUAL, for l2 loss = L2R_L2LOSS_SVC_DUAL)
                            Supported solvers: L2R_LR_DUAL, L2R_LR, L1R_LR,
                                               L2R_L2LOSS_SVC_DUAL, L2R_L2LOSS_SVC, L2R_L1LOSS_SVC_DUAL, L1R_L2LOSS_SVC
    -c, --liblinearC        LIBLINEAR cost co-efficient, inverse of regularization strength, must be a positive float,
                            smaller values specify stronger regularization (default = 10.0)
    --eps, --liblinearEps   LIBLINEAR tolerance of termination criterion (default = 0.1)

    SGD/AdaGrad:
    -l, --lr, --eta         Step size (learning rate) for online optimizers (default = 1.0)
    --epochs                Number of training epochs for online optimizers (default = 1)
    --adagradEps            Defines starting step size for AdaGrad (default = 0.001)

    Tree:
    -a, --arity             Arity of tree nodes (default = 2)
    --maxLeaves             Maximum degree of pre-leaf nodes. (default = 100)
    --tree                  File with tree structure
    --treeType              Type of a tree to build if file with structure is not provided
                            tree types: hierarchicalKmeans, huffman, completeKaryInOrder, completeKaryRandom,
                                        balancedInOrder, balancedRandom, onlineComplete

    K-Means tree:
    --kmeansEps             Tolerance of termination criterion of the k-means clustering
                            used in hierarchical k-means tree building procedure (default = 0.001)
    --kmeansBalanced        Use balanced K-Means clustering (default = 1)

    Prediction:
    --topK                  Predict top-k labels (default = 5)
    --threshold             Predict labels with probability above the threshold (default = 0)
    --thresholds            Path to a file with threshold for each label
    --setUtility            Type of set-utility function for prediction using svbopFull, svbopHf, svbopMips models.
                            Set-utility functions: uP, uF1, uAlfa, uAlfaBeta, uDeltaGamma
                            See: https://arxiv.org/abs/1906.08129

    Set-Utility:
    --alpha
    --beta
    --delta
    --gamma

    Test:
    --measures              Evaluate test using set of measures (default = "p@1,r@1,c@1,p@3,r@3,c@3,p@5,r@5,c@5")
                            Measures: acc (accuracy), p (precision), r (recall), c (coverage), hl (hamming loss)
                                      p@k (precision at k), r@k (recall at k), c@k (coverage at k), s (prediction size)

For details, see the documentation.
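
When the executable is driven from a script, it can help to assemble its arguments programmatically. The sketch below builds argument lists for the `train` and `test` commands using only long options that appear in the usage text above. It assumes the binary ends up as `./nxc` in the build directory, and the file and directory names (`train.txt`, `test.txt`, `model-dir`) are hypothetical. The helper `build_nxc_command` is not part of napkinXC.

```python
import shlex


def build_nxc_command(command, input_path, output_dir, **options):
    """Assemble an argument list for the nxc executable.

    Keyword arguments become long options, e.g. topK=5 -> --topK 5.
    Only option names listed in the usage text should be passed.
    """
    argv = ["./nxc", command, "-i", input_path, "-o", output_dir]
    for name, value in options.items():
        argv += ["--" + name, str(value)]
    return argv


# Hypothetical file and directory names, used only for illustration.
train_cmd = build_nxc_command("train", "train.txt", "model-dir",
                              model="plt", optimizer="liblinear", threads=0)
test_cmd = build_nxc_command("test", "test.txt", "model-dir",
                             topK=5, measures="p@1,p@3,p@5")

for cmd in (train_cmd, test_cmd):
    # Print a shell-safe rendering of the command.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with values such as the comma-separated measures.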

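References and acknowledgments

This library implements methods from the following papers:

Another implementation of the PLT model is available in the extremeText library, which implements the approach described in the NeurIPS paper.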

napkinxc 0.4.2
