Text conversion in Python

import collections, re
from pybrain.datasets import SupervisedDataSet

# create the supervised dataset variable with 5 inputs and 1 output
windowSize = 5
main_ds = SupervisedDataSet(windowSize, 1)

with open('ltest5lg_d1.fr', 'r') as train_1:
    import_data_train = train_1.readlines()

train_data = []
for lines in import_data_train:
    s = lines.split()
    for words in s:
        train_data.append(words)

bagsofwords = [collections.Counter(re.findall(r'\w+', txt)) for txt in train_data]
sumbags = sum(bagsofwords, collections.Counter())

1 answer

User

#1 · Posted 2024-04-25 15:20:03

Word embedding models are the standard way to represent words in a learning context.

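What you want (judging only from a cursory look at PyBrain's datasets page [1]) is to build the dataset by converting the text into a vector representation.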
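As a rough sketch of that conversion, the snippet below maps each word to an integer id and slices the token stream into windows of 5 inputs and 1 target, matching the shape of `SupervisedDataSet(windowSize, 1)` in the question. The helper names `build_vocab` and `make_windows`, the sample sentence, and the next-word-prediction framing are all assumptions made for illustration; the integer id is only a stand-in for a real embedding vector.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def make_windows(tokens, vocab, window_size=5):
    """Return (inputs, target) pairs: window_size ids, then the id that follows."""
    ids = [vocab[tok] for tok in tokens]
    samples = []
    for i in range(len(ids) - window_size):
        inputs = tuple(ids[i:i + window_size])
        target = (ids[i + window_size],)
        samples.append((inputs, target))
    return samples


# Hypothetical sample text; in the question the tokens come from 'ltest5lg_d1.fr'.
tokens = "le chat dort et le chien dort aussi".split()
vocab = build_vocab(tokens)
samples = make_windows(tokens, vocab, window_size=5)

for inputs, target in samples:
    print(inputs, "->", target)
```

Each pair could then be handed to the dataset from the question with `main_ds.addSample(inputs, target)`.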

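For an example of how to do this yourself, see glove-python [2]. If you would rather use an existing package, see Google's word2vec [3] or Stanford's GloVe [4]; the Python version of the latter is a naive implementation.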
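To give a feel for what such a vector representation looks like, here is a minimal sketch that builds raw co-occurrence vectors, the kind of word-word counts that GloVe is trained on. It is not GloVe or word2vec itself: no embedding is learned, and the function names, the `radius` parameter, and the sample sentence are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, radius=2):
    """For every word, count how often each other word appears within `radius` positions."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - radius)
        hi = min(len(tokens), i + radius + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample text, used only to show the shape of the result.
tokens = "le chat dort et le chien dort".split()
vectors = cooccurrence_vectors(tokens, radius=1)

# Words that occur in the same contexts end up with similar vectors.
print(cosine(vectors["chat"], vectors["chien"]))
print(cosine(vectors["chat"], vectors["le"]))
```

Words that share neighbours ("chat" and "chien" here) get near-identical vectors, which is the property the embedding packages above exploit at much larger scale.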

You can then use this representation to train your neural network.

[1] http://pybrain.org/docs/quickstart/dataset.html
[2] https://github.com/maciejkula/glove-python
[3] https://code.google.com/p/word2vec/
[4] http://www-nlp.stanford.edu/projects/glove/
