Is there a way to validate word embeddings generated by BlazingText using a custom dataset?

Posted 2024-03-28 21:19:31


I am using BlazingText to generate word embeddings. By default, it uses the WordSim353 dataset to validate the embeddings and outputs the spearman_rho metric.

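I would like to validate against a custom dataset in order to run intrinsic evaluations such as word analogy and word similarity tests. Is there a way to pass a custom validation set to the BlazingText algorithm?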

I know that Gensim offers such functionality to perform Word Analogy and Word Similarity tests using custom validation datasets.
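To make the analogy test concrete, here is a minimal offline sketch in plain Python. It assumes the trained vectors are already available as a dict mapping each word to a list of floats; the function names and the toy vectors are hypothetical, and the scoring is the common 3CosAdd rule on length-normalized vectors, not BlazingText's or Gensim's own implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the 3CosAdd rule.

    The query words a, b and c are excluded from the candidates.
    """
    va, vb, vc = (normalize(vectors[w]) for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    best_word, best_score = None, -2.0
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors, quadruples):
    """Fraction of (a, b, c, expected) rows answered correctly.

    Rows containing an out-of-vocabulary word are skipped.
    """
    correct = total = 0
    for a, b, c, expected in quadruples:
        if not all(w in vectors for w in (a, b, c, expected)):
            continue
        total += 1
        if solve_analogy(vectors, a, b, c) == expected:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical toy vectors, for illustration only.
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.0, -1.0],
}
```

With a custom analogy file parsed into `(a, b, c, expected)` tuples, `analogy_accuracy(vectors, rows)` would give a single accuracy figure for the embeddings.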

I would like to confirm whether this kind of functionality is available in BlazingText.

The BlazingText documentation says it only supports the train channel. (https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html) (https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext_hyperparameters.html)

As per the documentation:

Training and Validation Data Format

Training and Validation Data Format for the Word2Vec Algorithm

For Word2Vec training, upload the file under the train channel. No other channels are supported. The file should contain a training sentence per line.

If the evaluation parameter is set to True (the default), it uses the WordSimilarity-353 dataset to validate the embeddings.
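For comparison, a word-similarity score on a custom dataset can be sketched offline in plain Python. Word-similarity benchmarks such as WordSim353 are usually scored as the Spearman rank correlation between the model's cosine similarities and the human ratings. The sketch below assumes the vectors are already loaded as a word-to-list-of-floats dict and that the custom dataset is a list of `(word1, word2, human_score)` rows; all names and the toy data are hypothetical, and this is not BlazingText's own evaluation code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if the input is empty or either side is constant."""
    n = len(x)
    if n == 0:
        return float("nan")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return float("nan")
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate_pairs(vectors, scored_pairs):
    """Spearman rho between cosine similarity and human scores.

    Pairs with an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in scored_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(score)
    return spearman_rho(model_scores, human_scores)


# Hypothetical toy data, for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
    ("cat", "unicorn", 5.0),  # skipped: "unicorn" has no vector
]
```

Calling `evaluate_pairs(toy_vectors, toy_pairs)` on the toy data gives a rho very close to 1.0, since the model's ordering of the four in-vocabulary pairs matches the human ordering.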
