fastText cannot load the training txt fi

2 answers

User

#1 · edited 2024-04-23 20:04:40

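I took some time to set up an environment to test your code. What I did on Windows, and what worked for me, was to install fastText under Cygwin. I hope this answer helps anyone with a similar problem.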

Environment

Windows 10
CYGWIN_NT-10.0 DESKTOP-RR909JI 2.10.0(0.325/5/3) 2018-02-02 15:16 x86_64
gcc-g++: 7.3 | gcc-core: 7.3
Python 2.7 | python2-cython 0.25.2 | python2-pip | python2-devel
pip install fastText

Files

user@DESKTOP-RR909JI ~/projects
$ file *
data.txt:         ASCII text
data.train.txt:   Big-endian UTF-16 Unicode text
fasttext_ie.py:   Python script, ASCII text executable
model.bin:        data
wiki.simple.vec:  UTF-8 Unicode text, with very long lines

fasttext_ie.py

I downloaded the pre-trained word vectors (wiki.simple.vec) from here. I copied your sample input into data.txt and made a UTF-16 version of it, data.train.txt.

Running the snippet took a while, but it did produce a file. That only happened with the ASCII text file, though:

user@DESKTOP-RR909JI ~/projects
$ ls -ltrh model.bin
-rw-r--r-- 1 user user 129M jun. 28 00:56 model.bin

It contains a lot of strings:

qateel
olympiques
lesothosaurus
delillo
satrapi
conferencing
numan
echinodermata
haast
tangerines
duat
vesey
rotaviruses
velox
chepstow
capitale
rock/pop
belasco
sardanapalus
jadis
macintyre

When trying with UTF-16

No file was generated, but the process never finished either; it just kept running.

So we can say that it failed.
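Before handing a file to fastText, it can help to check which encoding it really uses, much like the `file *` listing above does. The following is a minimal sketch, assuming the file starts with a byte order mark (BOM); `sniff_bom` is a hypothetical helper name, not part of fastText, and it does not check for UTF-32.

```python
import codecs

# BOMs this sketch looks for. UTF-32 is not checked, so a UTF-32 LE file
# would be misreported as UTF-16 LE (its BOM begins with the same two bytes).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(path):
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

A result of `None` only means that no BOM was found; the file may still be ASCII, UTF-8 without a BOM, or something else entirely.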

Even though fastText says UTF-8 is supported:

where data.txt is a training file containing UTF-8 encoded text. By default the word vectors will take into account character n-grams from 3 to 6 characters. At the end of optimization the program will save two files: model.bin and model.vec. model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters. The binary file can be used later to compute word vectors or to restart the optimization.

The version I installed through Cygwin may be different.

After reading this question on Stack Overflow, I would like to ask: have you tried converting the file to ASCII and checking what happens?
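One way to try that conversion is to re-encode the training file before passing it to fastText. The sketch below is only an illustration: `reencode_to_utf8` is a hypothetical helper, and it assumes the source file is UTF-16 with a BOM, as data.train.txt is in the listing above. It writes UTF-8, which is identical to ASCII when the text contains only ASCII characters.

```python
import io


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Copy a text file, decoding it from src_encoding and writing UTF-8."""
    # newline="" leaves the line endings exactly as they are in the source.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)
```

For example, `reencode_to_utf8("data.train.txt", "data.train.utf8.txt")` would write a UTF-8 copy next to the original; the output file name is made up for this example. `io.open` is used because it behaves the same way on Python 2.7 and Python 3.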

All of my files are in the same root directory.

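I don't know fastText, but I wanted to run your code, and doing so was useful. I did have problems with the gcc libraries: I had to install the same version of both g++ and core.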

User

#2 · edited 2024-04-23 20:04:40

TL;DR: use the os module to construct paths safely, especially in Python 2

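The error says the file could not be loaded. Since the only difference between the environments is the operating system, the clue is that you are not locating the file correctly, because each operating system handles paths differently. I think this is a mistake most Python programmers make at least once, because it is so unexpected.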

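You can hard-code the path, but you will run into problems as soon as you work across platforms. In my case, I sometimes develop something quickly on Windows and then deploy it at scale on a *nix platform.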

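I suggest getting used to the os module, because it works across platforms. In a comment you said your path was "myfolder\nfolder\tfolder", which comes from building the path string by hand instead of using the os module. Here \n becomes a newline and \t becomes a tab. Even if the folder names did not start with n and t, a hand-written backslash would still be unsafe, because backslashes in Windows paths need to be escaped (\\). With the os module, you don't have to know any of this.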
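To make the pitfall concrete, here is a small sketch that compares the hand-written string from the comment with a raw string and with `os.path.join`. The folder names come from the comment; the variable names are only for illustration.

```python
import os

# Hand-written path: Python turns \n into a newline and \t into a tab.
hand_written = "myfolder\nfolder\tfolder"

# Raw string: the backslashes are kept, but the path is still Windows-only.
raw_literal = r"myfolder\nfolder\tfolder"

# os.path.join inserts the separator of the platform the code runs on.
joined = os.path.join("myfolder", "nfolder", "tfolder")

print("newline in hand-written path:", "\n" in hand_written)
print("tab in hand-written path:", "\t" in hand_written)
print("raw literal:", raw_literal)
print("joined:", joined)
```

The hand-written version no longer contains the folder names nfolder and tfolder at all, which is why the file cannot be found.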

>>> import os
>>> os.getcwd()
'C:\\Python27'
>>> os.path.abspath(os.sep)
'C:\\'
>>> os.chdir(os.path.join(os.path.abspath(os.sep), "Users", "Jeff"))
>>> os.getcwd()
'C:\\Users\\Jeff'

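Usually you will use paths relative to your project root rather than absolute paths. Those are easier; finding the root of the current operating system is a bit trickier (you can find the answer here).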
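A common way to get project-relative paths is to anchor them to the directory of the running script rather than to the current working directory. This is a minimal sketch; `path_from_script` is a hypothetical helper name, and it assumes the data files sit next to the script or below it.

```python
import os


def path_from_script(script_file, *parts):
    """Build an absolute path relative to the directory containing script_file."""
    base_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(base_dir, *parts)


# Typical use inside a script such as fasttext_ie.py:
# train_file = path_from_script(__file__, "data.txt")
```

Because the result is absolute, it no longer matters from which directory the script is started.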

(I have written up the full answer we arrived at in the comments.)

Edit: Python 3 may have something better than os, namely pathlib (see this link). I have never used Python 3, so I can't say.
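For readers on Python 3, the same path can be built with pathlib from the standard library. This sketch uses the folder names from the comment; the pure path classes are there only to show how each platform would render the path, without touching the disk.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The / operator joins path components using the current platform's rules.
relative = Path("myfolder") / "nfolder" / "tfolder"

# Pure paths show the Windows and POSIX renderings on any machine.
as_windows = PureWindowsPath("myfolder", "nfolder", "tfolder")
as_posix = PurePosixPath("myfolder", "nfolder", "tfolder")

print(relative.parts)
print(str(as_windows))
print(str(as_posix))
```

Each component is passed as a separate string, so no backslash ever has to be typed or escaped.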
