Pointwise Hilbert-Schmidt Independence Criterion (PHSIC)

Detailed description of the phsic-cli Python project


Compute the co-occurrence between two objects using similarities.

For example, given consistent sentence pairs:

X                                                                  Y
They had breakfast at the hotel.                                   They are full now.
They had breakfast at ten.                                         I'm full.
She had breakfast with her friends.                                She felt happy.
They had breakfast with their friends at the Japanese restaurant.  They felt happy.
He have trouble with his homework.                                 He cries.
I have trouble associating with others.                            I cry.

With these pairs as training data, PHSIC assigns high scores to new pairs that are consistent with them:

X                                             Y                      score
They had breakfast at the hotel.              They are full now.     0.1134
They had breakfast at an Italian restaurant.  They are stuffed now.  0.0023
I have dinner.                                I have dinner again.   0.0023
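
To make the scores above more concrete, here is a minimal Python sketch of a pointwise, centered-kernel co-occurrence score of the kind PHSIC is built on. It is an illustration only and not the code behind the phsic command: the function names, the toy 2-D vectors, and the exact form of the Gaussian kernel, exp(-||u - v||^2 / (2 * sigma^2)), are all assumptions made for this example.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pointwise_scores(train_x, train_y, test_x, test_y, sigma=1.0):
    """Score each test pair (x, y) against the training pairs.

    score(x, y) = (1/n) * sum_i (k(x, x_i) - kbar_i) * (l(y, y_i) - lbar_i)
    where kbar_i is the mean kernel value of x_i against the training set
    (and likewise lbar_i for y_i). This is a naive O(n^2) sketch.
    """
    n = len(train_x)
    # Mean similarity of every training point to the whole training set.
    kbar = [sum(gaussian_kernel(xi, xj, sigma) for xj in train_x) / n
            for xi in train_x]
    lbar = [sum(gaussian_kernel(yi, yj, sigma) for yj in train_y) / n
            for yi in train_y]

    scores = []
    for x, y in zip(test_x, test_y):
        k = [gaussian_kernel(x, xi, sigma) for xi in train_x]
        l = [gaussian_kernel(y, yi, sigma) for yi in train_y]
        score = sum((ki - kb) * (li - lb)
                    for ki, kb, li, lb in zip(k, kbar, l, lbar)) / n
        scores.append(score)
    return scores


# Toy data: two clusters of x that co-occur with two clusters of y.
toy_train_x = [[0.0, 0.0], [0.0, 0.1], [3.0, 3.0], [3.0, 3.1]]
toy_train_y = [[1.0, 0.0], [1.0, 0.1], [-2.0, 5.0], [-2.0, 5.1]]
toy_scores = pointwise_scores(
    toy_train_x, toy_train_y,
    test_x=[[0.0, 0.0], [0.0, 0.0]],
    test_y=[[1.0, 0.0], [-2.0, 5.0]],  # consistent pair, then mismatched pair
)
```

In this toy setting the first test pair resembles the training pairs and receives a positive score, while the second combines an x from one cluster with a y from the other and receives a negative score.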

Installation

$ pip install phsic

This installs the phsic command in your environment:

$ phsic --help

Basic usage

Download pre-trained word vectors (e.g., fastText):

$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip
$ unzip crawl-300d-2M.vec.zip
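
The unzipped .vec file is plain text: a header line with the vocabulary size and the dimension, followed by one line per word with the word and its space-separated vector components. The following Python sketch reads only the first few entries, which can be handy for checking the download without loading all two million vectors. The function name load_vectors and its limit parameter are hypothetical and are not part of phsic.

```python
def load_vectors(lines, limit=None):
    """Parse fastText-style .vec text lines into a {word: vector} dict.

    `lines` is any iterable of text lines (for example an open file).
    `limit` caps the number of words read; None reads everything.
    """
    it = iter(lines)
    header = next(it).split()      # "<vocabulary size> <dimension>"
    dim = int(header[1])

    vectors = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue               # skip malformed or empty lines
        # The last `dim` fields are the vector; the rest is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(v) for v in parts[-dim:]]
    return dim, vectors


# Example usage (assumes the file from the download step is present):
# with open("crawl-300d-2M.vec", encoding="utf-8") as f:
#     dim, vectors = load_vectors(f, limit=10000)
```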

Prepare the dataset:

$ TAB="$(printf '\t')"
$ cat << EOF > train.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at ten.${TAB}I'm full.
She had breakfast with her friends.${TAB}She felt happy.
They had breakfast with their friends at the Japanese restaurant.${TAB}They felt happy.
He have trouble with his homework.${TAB}He cries.
I have trouble associating with others.${TAB}I cry.
EOF
$ cut -f 1 train.txt > train_X.txt
$ cut -f 2 train.txt > train_Y.txt
$ cat << EOF > test.txt
They had breakfast at the hotel.${TAB}They are full now.
They had breakfast at an Italian restaurant.${TAB}They are stuffed now.
I have dinner.${TAB}I have dinner again.
EOF
$ cut -f 1 test.txt > test_X.txt
$ cut -f 2 test.txt > test_Y.txt
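
If the pairs are produced by a Python script rather than a shell session, the same split into X and Y files can be done as in the sketch below. The helper names split_pairs and write_lines are hypothetical; the only assumption about the data is the one used above, namely one tab-separated pair per line.

```python
def split_pairs(lines):
    """Split tab-separated "X<TAB>Y" lines into two parallel lists."""
    xs, ys = [], []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
        xs.append(fields[0])
        ys.append(fields[1])
    return xs, ys


def write_lines(path, items):
    """Write one item per line, as `cut -f N file > path` would."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(item + "\n")


# Example usage (mirrors the cut commands above):
# with open("train.txt", encoding="utf-8") as f:
#     xs, ys = split_pairs(f)
# write_lines("train_X.txt", xs)
# write_lines("train_Y.txt", ys)
```

Unlike cut, this version fails loudly on a line that does not have exactly two fields, so the X and Y files cannot silently go out of step.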

Then, train and predict:

$ phsic train_X.txt train_Y.txt \
    --kernel1 Gaussian 1.0 --encoder1 SumBov FasttextEn --emb1 crawl-300d-2M.vec \
    --kernel2 Gaussian 1.0 --encoder2 SumBov FasttextEn --emb2 crawl-300d-2M.vec \
    --limit_words1 10000 --limit_words2 10000 \
    --dim1 3 --dim2 3 \
    --out_prefix toy --out_dir out \
    --X_test test_X.txt --Y_test test_Y.txt
$ cat toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic
1.134489336180434238e-01
2.320408776101631244e-03
2.321869174772554344e-03
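
The output holds one score per line, in the same order as the test pairs. A short Python sketch for attaching the scores to the sentences and ranking them is shown below; the function name rank_pairs is hypothetical, and the one-score-per-line layout is inferred from the output shown above.

```python
def rank_pairs(xs, ys, score_lines):
    """Attach scores to (X, Y) pairs and sort from highest to lowest score."""
    scores = [float(s) for s in score_lines if s.strip()]
    if not (len(xs) == len(ys) == len(scores)):
        raise ValueError("xs, ys and scores must have the same length")
    return sorted(zip(scores, xs, ys), key=lambda item: item[0], reverse=True)


# The test pairs and scores from this example.
ranked = rank_pairs(
    ["They had breakfast at the hotel.",
     "They had breakfast at an Italian restaurant.",
     "I have dinner."],
    ["They are full now.",
     "They are stuffed now.",
     "I have dinner again."],
    ["1.134489336180434238e-01",
     "2.320408776101631244e-03",
     "2.321869174772554344e-03"],
)
for score, x, y in ranked:
    print(f"{score:.4f}\t{x}\t{y}")
```

In a real run, score_lines would come from the .phsic file and xs and ys from test_X.txt and test_Y.txt.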

Citation

@InProceedings{D18-1203,
  author = 	"Yokoi, Sho
        and Kobayashi, Sosuke
        and Fukumizu, Kenji
        and Suzuki, Jun
        and Inui, Kentaro",
  title = 	"Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions",
  booktitle = 	"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1763--1775",
  location = 	"Brussels, Belgium",
  url = 	"http://aclweb.org/anthology/D18-1203"
}
