ntcir-mias-search Python package - PyPI

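The MIaS Search package implements the math information retrieval system that won the NTCIR-11 Math-2 main task (Růžička et al., 2014).

Detailed description of the ntcir-mias-search Python project

NTCIR MIaS Search: our search engine for the NTCIR Math tasks

NTCIR MIaS Search is a Python 3 command-line utility that runs on top of WebMIaS and implements the math information retrieval system that won the NTCIR-11 Math-2 main task (see the task paper and the system description paper).

Experimentally, NTCIR MIaS Search also reranks the results according to relevance probability estimates from the NTCIR Math Density Estimator package.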

Usage

Installation

The package can be installed by executing the following command:

$ pip install ntcir-mias-search

Displaying the usage

Usage information for the package can be displayed by executing the following command:

$ ntcir-mias-search --help
usage: ntcir-mias-search [-h] --dataset DATASET --topics TOPICS --positions
                         POSITIONS --estimates ESTIMATES --webmias-url
                         WEBMIAS_URL
                         [--webmias-index-number WEBMIAS_INDEX_NUMBER]
                         [--num-workers-querying NUM_WORKERS_QUERYING]
                         [--num-workers-merging NUM_WORKERS_MERGING]
                         --output-directory OUTPUT_DIRECTORY

Use topics in the NTCIR-10 Math, NTCIR-11 Math-2, and NTCIR-12 MathIR format
to query the WebMIaS interface of the MIaS Math Information Retrieval system
and to retrieve result document lists.

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     A path to a directory containing a dataset in the
                        NTCIR-11 Math-2, and NTCIR-12 MathIR XHTML5 format.
                        The directory does not need to exist, since the path
                        is only required for extracting data from the file
                        with estimated positions of paragraph identifiers.
  --topics TOPICS       A path to a file containing topics in the NTCIR-10
                        Math, NTCIR-11 Math-2, and NTCIR-12 MathIR format.
  --positions POSITIONS 
                        The path to the file, where the estimated positions of
                        all paragraph identifiers from our dataset were stored
                        by the NTCIR Math Density Estimator package.
  --estimates ESTIMATES 
                        The path to the file, where the density, and
                        probability estimates for our dataset were stored by
                        the NTCIR Math Density Estimator package.
  --webmias-url WEBMIAS_URL
                        The URL at which a WebMIaS Java Servlet has been
                        deployed.
  --webmias-index-number WEBMIAS_INDEX_NUMBER
                        The numeric identifier of the WebMIaS index that
                        corresponds to the dataset. Defaults to 0.
  --num-workers-querying NUM_WORKERS_QUERYING
                        The number of processes that will send queries to
                        WebMIaS. Defaults to 1. Note that querying, reranking,
                        and merging takes place simmultaneously.
  --num-workers-merging NUM_WORKERS_MERGING
                        The number of processes that will rerank results.
                        Defaults to 3. Note that querying, reranking, and
                        merging takes place simmultaneously.
  --output-directory OUTPUT_DIRECTORY
                        The path to the directory, where the output files will
                        be stored.
  --plots PLOTS [PLOTS ...]
                        The path to the files, where the evaluation results
                        will plotted.
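
When the utility is driven from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only flags listed in the help text above; the helper name `build_command` is hypothetical and not part of the package, and the file names are taken from the examples in this document.

```python
import shlex


def build_command(dataset, topics, positions, estimates, webmias_url,
                  output_directory, index_number=0,
                  workers_querying=1, workers_merging=3):
    """Assemble an argv list from flags documented in the help text.

    The defaults mirror the documented ones (index 0, 1 querying
    worker, 3 merging workers). This helper is illustrative only.
    """
    return [
        "ntcir-mias-search",
        "--dataset", dataset,
        "--topics", topics,
        "--positions", positions,
        "--estimates", estimates,
        "--webmias-url", webmias_url,
        "--webmias-index-number", str(index_number),
        "--num-workers-querying", str(workers_querying),
        "--num-workers-merging", str(workers_merging),
        "--output-directory", output_directory,
    ]


argv = build_command(
    dataset="ntcir-11-12",
    topics="NTCIR11-Math2-queries-participants.xml",
    positions="positions.pkl.gz",
    estimates="estimates.pkl.gz",
    webmias_url="http://localhost:58080/WebMIaS",
    output_directory="search_results",
    index_number=1,
    workers_querying=8,
    workers_merging=56,
)

# Print a shell-quoted command line (shlex.join needs Python 3.8+).
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a path contains spaces.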

Querying WebMIaS

The following command queries a local WebMIaS instance using 64 worker processes (8 for querying and 56 for merging):

$ mkdir search_results

$ ntcir-mias-search --num-workers-querying 8 --num-workers-merging 56 \
>     --dataset ntcir-11-12 \
>     --topics NTCIR11-Math2-queries-participants.xml \
>     --judgements NTCIR11_Math-qrels.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz \
>     --webmias-url http://localhost:58080/WebMIaS --webmias-index-number 1 \
>     --plots plot.pdf plot.svg \
>     --output-directory search_results
Reading relevance judgements from NTCIR11_Math-qrels.dat
50 judged topics and 2500 total judgements in NTCIR11_Math-qrels.dat
Reading topics from NTCIR11-Math2-queries-participants.xml
50 topics (NTCIR11-Math-1, NTCIR11-Math-2, ...) contain 55 formulae, and 113 keywords
Establishing connection with a WebMIaS Java Servlet at http://localhost:58080/WebMIaS
Reading paragraph position estimates from positions.pkl.gz
8301578 total paragraph identifiers in positions.pkl.gz
Reading density, and probability estimates from estimates.pkl.gz
Querying WebMIaSIndex(http://localhost:58080/WebMIaS, 1), reranking and merging results
Using 3 strategies to aggregate MIaS scores with probability estimates:
- The best possible score that uses relevance judgements (look for 'best' in filenames)
- The original MIaS score with the probability estimate discarded (look for 'orig' in filenames)
- The worst possible score that uses relevance judgements (look for 'worst' in filenames)
Storing reranked per-query result lists in search_results
Using 4 formats to represent mathematical formulae in queries:
- Content MathML XML language (look for 'CMath' in filenames)
- Combined Presentation and Content MathML XML language (look for 'PCMath' in filenames)
- Presentation MathML XML language (look for 'PMath' in filenames)
- The TeX language by professor Knuth (look for 'TeX' in filenames)
Result list for topic NTCIR11-Math-9 contains only 188 / 1000 results, sampling the dataset
Result list for topic NTCIR11-Math-17 contains only 716 / 1000 results, sampling the dataset
Result list for topic NTCIR11-Math-26 contains only 518 / 1000 results, sampling the dataset
Result list for topic NTCIR11-Math-39 contains only 419 / 1000 results, sampling the dataset
Result list for topic NTCIR11-Math-43 contains only 924 / 1000 results, sampling the dataset
get_results:  100%|███████████████████████████████████████████████| 50/50 [00:26<00:00,  1.88it/s]
rerank_and_merge_results: 200it [01:02,  3.18it/s]
Storing final result lists in mias_search_results
100%|█████████████████████████████████████████████████████████████| 12/12 [00:13<00:00,  3.73it/s]
Evaluation results:
- best, PCMath: 0.5569
- best, PMath: 0.5283
- best, TeX: 0.5076
- best, CMath: 0.4983
- orig, PCMath: 0.4917
- ...
- orig, PMath: 0.4616
- worst, CMath: 0.3080
- worst, TeX: 0.2810
- worst, PMath: 0.1156
- worst, PCMath: 0.1141
Plotting plot.svg
Plotting plot.pdf

$ ls search_results
final_CMath.best.tsv
final_CMath.orig.tsv
final_CMath.worst.tsv
final_PCMath.best.tsv
final_PCMath.orig.tsv
final_PCMath.worst.tsv
final_PMath.best.tsv
final_PMath.orig.tsv
final_PMath.worst.tsv
final_TeX.best.tsv
final_TeX.orig.tsv
final_TeX.worst.tsv
NTCIR11-Math-10_CMath.1.query.txt
NTCIR11-Math-10_CMath.1.response.xml
NTCIR11-Math-10_CMath.1.results.best.tsv
NTCIR11-Math-10_CMath.1.results.orig.tsv
NTCIR11-Math-10_CMath.1.results.worst.tsv
NTCIR11-Math-10_CMath.2.query.txt
NTCIR11-Math-10_CMath.2.response.xml
...
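
The file names in the listing above encode the formula format (CMath, PCMath, PMath, TeX) and the aggregation strategy (best, orig, worst). The sketch below splits such names into their parts with regular expressions. The patterns are inferred from the listing alone, not from the package's source code, and the meaning of the numeric component (called `number` here) is an assumption.

```python
import re

# final_<format>.<strategy>.tsv, e.g. final_PCMath.best.tsv
FINAL_RE = re.compile(
    r"^final_(?P<format>[A-Za-z]+)\.(?P<strategy>best|orig|worst)\.tsv$"
)

# <topic>_<format>.<number>.<kind>, e.g. NTCIR11-Math-10_CMath.1.query.txt
QUERY_RE = re.compile(
    r"^(?P<topic>[^_]+)_(?P<format>[A-Za-z]+)\.(?P<number>\d+)\."
    r"(?P<kind>query\.txt|response\.xml"
    r"|results\.(?P<strategy>best|orig|worst)\.tsv)$"
)


def describe(filename):
    """Return ("final" | "per-query", fields) or None if unrecognized."""
    for label, pattern in (("final", FINAL_RE), ("per-query", QUERY_RE)):
        match = pattern.match(filename)
        if match:
            return label, match.groupdict()
    return None


for name in ("final_PCMath.best.tsv",
             "NTCIR11-Math-10_CMath.1.results.worst.tsv",
             "NTCIR11-Math-10_CMath.2.response.xml"):
    print(name, "->", describe(name))
```

For query and response files the `strategy` field is `None`, since only result lists carry a strategy suffix in the listing.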

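The following command queries a remote WebMIaS instance at https://mir.fi.muni.cz/webmias-demo using 64 worker processes: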

$ mkdir search_results

$ ntcir-mias-search --num-workers-querying 8 --num-workers-merging 56 \
>     --dataset ntcir-11-12 \
>     --topics NTCIR11-Math2-queries-participants.xml \
>     --judgements NTCIR11_Math-qrels.dat \
>     --estimates estimates.pkl.gz --positions positions.pkl.gz \
>     --webmias-url https://mir.fi.muni.cz/webmias-demo --webmias-index-number 0 \
>     --plots plot.pdf plot.svg \
>     --output-directory search_results
Reading relevance judgements from NTCIR11_Math-qrels.dat
50 judged topics and 2500 total judgements in NTCIR11_Math-qrels.dat
Reading topics from NTCIR11-Math2-queries-participants.xml
50 topics (NTCIR11-Math-1, NTCIR11-Math-2, ...) contain 55 formulae, and 113 keywords
Establishing connection with a WebMIaS Java Servlet at https://mir.fi.muni.cz/webmias-demo
Reading paragraph position estimates from positions.pkl.gz
8301578 total paragraph identifiers in positions.pkl.gz
Reading density, and probability estimates from estimates.pkl.gz
Querying WebMIaSIndex(https://mir.fi.muni.cz/webmias-demo, 0), reranking and merging results
Using 3 strategies to aggregate MIaS scores with probability estimates:
- The best possible score that uses relevance judgements (look for 'best' in filenames)
- The original MIaS score with the probability estimate discarded (look for 'orig' in filenames)
- The worst possible score that uses relevance judgements (look for 'worst' in filenames)
Storing reranked per-query result lists in search_results
Using 4 formats to represent mathematical formulae in queries:
- Content MathML XML language (look for 'CMath' in filenames)
- Combined Presentation and Content MathML XML language (look for 'PCMath' in filenames)
- Presentation MathML XML language (look for 'PMath' in filenames)
- The TeX language by professor Knuth (look for 'TeX' in filenames)
get_results:  100%|███████████████████████████████████████████████| 50/50 [05:29<00:00,  6.58s/it]
rerank_and_merge_results: 200it [06:57,  2.09s/it]
Storing final result lists in mias_search_results
100%|█████████████████████████████████████████████████████████████| 12/12 [00:13<00:00,  3.73it/s]
Evaluation results:
- best, PCMath: 0.5569
- best, PMath: 0.5283
- best, TeX: 0.5076
- best, CMath: 0.4983
- orig, PCMath: 0.4917
- ...
- orig, PMath: 0.4616
- worst, CMath: 0.3080
- worst, TeX: 0.2810
- worst, PMath: 0.1156
- worst, PCMath: 0.1141
Plotting plot.svg
Plotting plot.pdf

$ ls search_results
final_CMath.best.tsv
final_CMath.orig.tsv
final_CMath.worst.tsv
final_PCMath.best.tsv
final_PCMath.orig.tsv
final_PCMath.worst.tsv
final_PMath.best.tsv
final_PMath.orig.tsv
final_PMath.worst.tsv
final_TeX.best.tsv
final_TeX.orig.tsv
final_TeX.worst.tsv
NTCIR11-Math-10_CMath.1.query.txt
NTCIR11-Math-10_CMath.1.response.xml
NTCIR11-Math-10_CMath.1.results.best.tsv
NTCIR11-Math-10_CMath.1.results.orig.tsv
NTCIR11-Math-10_CMath.1.results.worst.tsv
NTCIR11-Math-10_CMath.2.query.txt
NTCIR11-Math-10_CMath.2.response.xml
...
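
The "Evaluation results" part of the console output above lists one score per combination of strategy and formula format. If the output has been captured to a string, the scores can be collected for comparison. This is a minimal sketch; the line pattern is inferred from the output shown in this document, and `parse_evaluation` is a hypothetical helper rather than part of the package.

```python
import re

# Matches lines such as "- best, PCMath: 0.5569"; the elided "- ..."
# line does not match and is skipped.
LINE_RE = re.compile(
    r"^- (?P<strategy>\w+), (?P<format>\w+): (?P<score>\d+\.\d+)$"
)


def parse_evaluation(log_text):
    """Map (strategy, format) pairs to the score printed in the log."""
    scores = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match["strategy"], match["format"])
            scores[key] = float(match["score"])
    return scores


log = """\
Evaluation results:
- best, PCMath: 0.5569
- best, PMath: 0.5283
- orig, PCMath: 0.4917
- ...
- worst, PCMath: 0.1141
"""

scores = parse_evaluation(log)
top = max(scores, key=scores.get)  # combination with the highest score
print(top, scores[top])
```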

Contributing

To get familiar with the codebase, please consult the Umbrello project document project.xmi:

Rendered UML class diagram

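Citing NTCIR MIaS Search

Text

RŮŽIČKA, Michal, Petr SOJKA and Martin LÍŠKA. Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy. In Noriko Kando, Hideo Joho, Kazuaki Kishida. Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo: National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430 Japan, 2014. pp. 127-134, 8 pp. ISBN 978-4-86049-065-2.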

BibTeX

@inproceedings{mir:MIaSNTCIR-11,
  author = "Michal R\r{u}\v{z}i\v{c}ka and Petr Sojka and Martin L{\'\i}\v{s}ka",
  title = "{Math Indexer and Searcher under the Hood:
            History and Development of a Winning Strategy}",
  month = Dec,
  year = 2014,
  address = "Tokyo",
  booktitle = "{Proc. of the 11th NTCIR Conference on Evaluation
                of Information Access Technologies}",
  editor = "Hideo Joho and Kazuaki Kishida",
  publisher = "{NII, Tokyo, Japan}",
  pages = "127--134",
}


ntcir-mias-search 0.2.2
