ntcir-math-density: a Python package on PyPI

The NTCIR Math Density Estimator package uses datasets and judgements in the NTCIR-11 Math-2 and NTCIR-12 MathIR XHTML5 format to compute density and probability estimates.

Detailed description of the ntcir-math-density Python project

NTCIR Math Density Estimator: estimates the relevance of documents using data from the NTCIR Math tasks

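NTCIR Math Density Estimator is a Python 3 command-line utility that uses datasets and judgements in the NTCIR-11 Math-2 and NTCIR-12 MathIR XHTML5 format to compute density and probability estimates. Most importantly, the package estimates P(relevant | position), where position is the position of a paragraph in a document.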
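The quantity P(relevant | position) can be related through Bayes' theorem to the two densities that the tool reports fitting, p(position) and p(position | relevant). The following is a minimal sketch of that relationship, assuming positions normalized to the interval [0, 1] and simple histogram densities. The function names, the bin count, and the use of histograms are illustrative assumptions and not the package's actual estimator.

```python
def histogram_density(positions, num_bins=10):
    """Estimate a piecewise-constant density on [0, 1] from sample positions."""
    counts = [0] * num_bins
    for position in positions:
        # A position of exactly 1.0 is clamped into the last bin.
        index = min(int(position * num_bins), num_bins - 1)
        counts[index] += 1
    total = len(positions)
    bin_width = 1.0 / num_bins
    # Density = fraction of samples in the bin divided by the bin width.
    return [count / (total * bin_width) for count in counts]


def relevance_given_position(all_positions, relevant_positions, num_bins=10):
    """Apply Bayes' theorem in every bin:

    P(relevant | position) = p(position | relevant) * P(relevant) / p(position)
    """
    # Assumption: relevant_positions is a subset of all_positions.
    prior_relevant = len(relevant_positions) / len(all_positions)
    p_position = histogram_density(all_positions, num_bins)
    p_position_given_relevant = histogram_density(relevant_positions, num_bins)
    return [
        conditional * prior_relevant / marginal if marginal > 0 else 0.0
        for conditional, marginal in zip(p_position_given_relevant, p_position)
    ]


# Toy data: two paragraphs in every tenth of a document, three of them relevant.
all_positions = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95] * 2
relevant_positions = [0.05, 0.05, 0.15]
estimates = relevance_given_position(all_positions, relevant_positions)
```

In this toy data both paragraphs in the first tenth are relevant and one of the two in the second tenth is, so the per-bin estimates start at roughly 1.0 and 0.5 and are zero elsewhere.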

Usage

Installation

The package can be installed by executing the following command:

$ pip install ntcir-math-density

Displaying the usage

Usage information for the package can be displayed by executing the following command:

$ ntcir-math-density --help
usage: ntcir-math-density [-h] [--datasets DATASETS [DATASETS ...]]
                          [--judgements JUDGEMENTS [JUDGEMENTS ...]]
                          [--plots PLOTS [PLOTS ...]] [--positions POSITIONS]
                          [--estimates ESTIMATES] [--num-workers NUM_WORKERS]

Use datasets, and judgements in NTCIR-11 Math-2, and NTCIR-12 MathIR XHTML5
format to compute density, and probability estimates.

optional arguments:
-h, --help            show this help message and exit
--datasets DATASETS [DATASETS ...]
                        Paths to the directories containing the datasets. Each
                        path must be prefixed with a unique single-letter
                        label followed by an equals sign (e.g. "A=/some/path").
--judgements JUDGEMENTS [JUDGEMENTS ...]
                        Paths to the files containing relevance judgements.
                        Each path must be prefixed with a single-letter label
                        corresponding to the judged dataset followed by a
                        semicolon (e.g. "A:/some/path/judgement.dat").
--plots PLOTS [PLOTS ...]
                        The path to the files, where the probability
                        estimates will plotted. When no datasets are
                        specified, the estimates file will be loaded.
--positions POSITIONS
                        The path to the file, where the estimated positions of
                        all paragraph identifiers from all datasets will be
                        stored. Defaults to positions.pkl.gz.
--estimates ESTIMATES
                        The path to the file, where the density, and
                        probability estimates will be stored. When no
                        datasets are specified, this file will be loaded to
                        provide the estimates for plotting. Defaults to
                        estimates.pkl.gz.
--num-workers NUM_WORKERS
                        The number of processes that will be used for
                        processing the datasets, and for computing the
                        density, and probability estimates. Defaults to 1.
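The labelled-path convention described in the help text can be illustrated with a short sketch. The helpers below (`parse_labelled_path`, `parse_datasets`, `parse_judgements`) are hypothetical and not part of the package. They assume that an equals sign separates a dataset label from its path and, following the help text's own example `"A:/some/path/judgement.dat"`, that a colon separates a judgement label from its path.

```python
def parse_labelled_path(argument, separator):
    """Split an argument such as "A=/some/path" into ("A", "/some/path")."""
    label, found, path = argument.partition(separator)
    if not found or len(label) != 1 or not path:
        raise ValueError(
            f"expected <single-letter label>{separator}<path>, got {argument!r}")
    return label, path


def parse_datasets(arguments):
    """Map unique single-letter labels to dataset directories."""
    datasets = {}
    for argument in arguments:
        label, path = parse_labelled_path(argument, "=")
        if label in datasets:
            raise ValueError(f"duplicate dataset label {label!r}")
        datasets[label] = path
    return datasets


def parse_judgements(arguments, datasets):
    """Group judgement files under the label of the dataset they judge."""
    judgements = {label: [] for label in datasets}
    for argument in arguments:
        label, path = parse_labelled_path(argument, ":")
        if label not in judgements:
            raise ValueError(f"judgement label {label!r} matches no dataset")
        judgements[label].append(path)
    return judgements


datasets = parse_datasets(["A=ntcir-10-converted", "B=ntcir-11-12"])
judgements = parse_judgements(
    ["A:NTCIR_10_Math-qrels_fs-converted.dat", "B:NTCIR11_Math-qrels.dat"],
    datasets)
```

Several judgement files may share one label, which is why the sketch collects them in a list per dataset.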

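Extracting the estimates

The following command extracts the density and probability estimates and plots them, using 64 worker processes: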

$ ntcir-math-density --num-workers 64 \
>     --datasets A=ntcir-10-converted B=ntcir-11-12 \
>     --judgements A:NTCIR_10_Math-qrels_fs-converted.dat A:NTCIR_10_Math-qrels_ft-converted.dat \
>                  B:NTCIR11_Math-qrels.dat B:NTCIR12_Math-qrels_agg.dat \
>                  B:NTCIR12_Math_simto-qrels_agg.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz \
>     --plots plot.pdf plot.svg
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_fs-converted.dat
100%|█████████████████████████████████████████████████████| 2129/2129 [00:00<00:00, 334959.05it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_ft-converted.dat
100%|█████████████████████████████████████████████████████| 1425/1425 [00:00<00:00, 353201.94it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR11_Math-qrels.dat
100%|█████████████████████████████████████████████████████| 2500/2500 [00:00<00:00, 343345.12it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 4251/4251 [00:00<00:00, 342252.50it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math_simto-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 654/654 [00:00<00:00, 314428.57it/s]
Retrieving all paragraph identifiers, and positions from ntcir-10-converted
get_all_identifiers(ntcir-10-converted): 5405167it [04:30, 19946.57it/s]
get_all_positions(ntcir-10-converted): 100%|█████████| 5405167/5405167 [08:44<00:00, 10306.72it/s]
Retrieving all paragraph identifiers, and positions from ntcir-11-12
get_all_identifiers(ntcir-11-12): 8301578it [08:08, 16985.19it/s]
get_all_positions(ntcir-11-12): 100%|█████████████████| 8301578/8301578 [44:30<00:00, 3108.88it/s]
1043 / 3146 / 5405167 relevant / judged / total identifiers in dataset ntcir-10-converted
1742 / 7059 / 8301578 relevant / judged / total identifiers in dataset ntcir-11-12
Pickling positions.pkl.gz
Fitting density, and probability estimators
Fitting prior p(position) density estimator
Fitting conditional p(position | relevant) density estimator
Computing density, and probability estimates
p(position): 100%|████████████████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
p(position | relevant): 100%|█████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
Pickling estimates.pkl.gz
Plotting plot.svg
Plotting plot.pdf

The following command extracts the density and probability estimates using 64 worker processes, without plotting them:

$ ntcir-math-density --num-workers 64 \
>     --datasets A=ntcir-10-converted B=ntcir-11-12 \
>     --judgements A:NTCIR_10_Math-qrels_fs-converted.dat A:NTCIR_10_Math-qrels_ft-converted.dat \
>                  B:NTCIR11_Math-qrels.dat B:NTCIR12_Math-qrels_agg.dat \
>                  B:NTCIR12_Math_simto-qrels_agg.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_fs-converted.dat
100%|█████████████████████████████████████████████████████| 2129/2129 [00:00<00:00, 334959.05it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR_10_Math-qrels_ft-converted.dat
100%|█████████████████████████████████████████████████████| 1425/1425 [00:00<00:00, 353201.94it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR11_Math-qrels.dat
100%|█████████████████████████████████████████████████████| 2500/2500 [00:00<00:00, 343345.12it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 4251/4251 [00:00<00:00, 342252.50it/s]
Retrieving judged paragraph identifiers, and scores from NTCIR12_Math_simto-qrels_agg.dat
100%|█████████████████████████████████████████████████████| 654/654 [00:00<00:00, 314428.57it/s]
Retrieving all paragraph identifiers, and positions from ntcir-10-converted
get_all_identifiers(ntcir-10-converted): 5405167it [04:30, 19946.57it/s]
get_all_positions(ntcir-10-converted): 100%|█████████| 5405167/5405167 [08:44<00:00, 10306.72it/s]
Retrieving all paragraph identifiers, and positions from ntcir-11-12
get_all_identifiers(ntcir-11-12): 8301578it [08:08, 16985.19it/s]
get_all_positions(ntcir-11-12): 100%|█████████████████| 8301578/8301578 [44:30<00:00, 3108.88it/s]
1043 / 3146 / 5405167 relevant / judged / total identifiers in dataset ntcir-10-converted
1742 / 7059 / 8301578 relevant / judged / total identifiers in dataset ntcir-11-12
Pickling positions.pkl.gz
Fitting density, and probability estimators
Fitting prior p(position) density estimator
Fitting conditional p(position | relevant) density estimator
Computing density, and probability estimates
p(position): 100%|████████████████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
p(position | relevant): 100%|█████████████████████████████████████| 64/64 [01:19<00:00,  1.24s/it]
Pickling estimates.pkl.gz
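The summary lines in the output above report the number of relevant, judged, and total paragraph identifiers per dataset. As a small sketch of how such a line could be read programmatically, the hypothetical helper below parses one summary line and derives two ratios from it. The regular expression is written against the log lines shown in this document only, so it is an assumption that other versions of the tool print the same format.

```python
import re

SUMMARY_PATTERN = re.compile(
    r"(\d+) / (\d+) / (\d+) relevant / judged / total identifiers "
    r"in dataset (\S+)")


def parse_summary_line(line):
    """Extract the counts and the dataset name from one summary line."""
    match = SUMMARY_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    relevant, judged, total = (int(group) for group in match.groups()[:3])
    return {
        "dataset": match.group(4),
        "relevant": relevant,
        "judged": judged,
        "total": total,
        # Share of judged paragraphs that were judged relevant.
        "relevant_among_judged": relevant / judged,
        # Share of all paragraphs that received a judgement.
        "judged_among_total": judged / total,
    }


summary = parse_summary_line(
    "1043 / 3146 / 5405167 relevant / judged / total identifiers "
    "in dataset ntcir-10-converted")
```

Only a small share of all paragraphs carries a judgement in either dataset, which is visible in the `judged_among_total` ratio.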

The following command plots previously stored estimates using 64 worker processes:

$ ntcir-math-density --num-workers 64 \
>     --estimates estimates.pkl.gz --plots plot.pdf plot.svg
Unpickling estimates.pkl.gz
Plotting plot.svg
Plotting plot.pdf
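The stored files can in principle be inspected from Python as well. The sketch below assumes that a `.pkl.gz` file is a gzip-compressed pickle, as the file extension and the "Pickling" and "Unpickling" log messages suggest. The structure of the stored object is not described here, so the helper names are hypothetical and the example only round-trips a stand-in dictionary. Pickle files should only be loaded from trusted sources.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped_pickle(obj, path):
    """Write an object to a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as file:
        pickle.dump(obj, file)


def load_gzipped_pickle(path):
    """Read a gzip-compressed pickle file and return the stored object."""
    with gzip.open(path, "rb") as file:
        return pickle.load(file)


# Round trip with a stand-in object; the real file contents may differ.
with tempfile.TemporaryDirectory() as directory:
    example_path = os.path.join(directory, "example.pkl.gz")
    save_gzipped_pickle({"note": "stand-in for stored estimates"}, example_path)
    loaded = load_gzipped_pickle(example_path)
```

Calling `load_gzipped_pickle("estimates.pkl.gz")` and inspecting `type(...)` of the result would be a first step toward exploring the stored estimates, under the assumption stated above.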


ntcir-math-density 0.2.1
