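I am trying to evaluate the performance of spaCy's most_similar method (https://spacy.io/api/vectors#most_similar). I am curious whether it runs faster on a GPU. The function is as follows: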
def spacy_most_similar(word, topn=10):
    ms = nlp_ru.vocab.vectors.most_similar(nlp_ru(word).vector.reshape(1,100), n=topn)
    words = [nlp_ru.vocab.strings[w] for w in ms[0][0]]
    distances = ms[2]
    return words, distances
spacy_most_similar("дерево", 10)
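Because the goal is to compare CPU and GPU speed, a small timing helper may be useful. This is a sketch that uses only the standard library; `time_call` and its `sync` parameter are hypothetical names, not part of spaCy or CuPy. CuPy launches GPU work asynchronously, so for GPU runs you would pass a synchronising callable such as `cp.cuda.Device(0).synchronize` so that the clock stops only after the device has finished.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, sync=None, **kwargs):
    """Return the median wall-clock time in seconds of fn(*args, **kwargs).

    Hypothetical helper. `sync` is an optional zero-argument callable that waits
    for pending device work (e.g. a CuPy device synchronize); leave it as None
    for CPU-only code.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up call, so one-off costs (model loading, kernel compilation) are not timed.
    fn(*args, **kwargs)
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, `time_call(spacy_most_similar, "дерево", 10)` times the CPU version; the median is used so that a single slow run does not dominate the result.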
This works with the CPU version, but on the GPU (using CuPy arrays instead of NumPy arrays) I get an error:
TypeError Traceback (most recent call last)
<ipython-input-8-ea5e049ec55b> in <module>()
7 distances = ms[2]
8 return words, distances
----> 9 spacy_most_similar("дерево", 10)
<ipython-input-8-ea5e049ec55b> in spacy_most_similar(word, topn)
3 print(nlp_ru(word).vector.reshape(1,100).shape)
4 ms = nlp_ru.vocab.vectors.most_similar(
----> 5 nlp_ru(word).vector.reshape(1,100), n=topn)
6 words = [nlp_ru.vocab.strings[w] for w in ms[0][0]]
7 distances = ms[2]
vectors.pyx in spacy.vectors.Vectors.most_similar()
TypeError: list indices must be integers or slices, not cupy.core.core.ndarray
I also tried this approach:
def spacy_most_similar(word, topn=10):
    ms = nlp_ru.vocab.vectors.most_similar(np.asarray([nlp_ru.vocab.vectors[nlp_ru.vocab.strings[word]]]), n=topn)
    words = [nlp_ru.vocab.strings[w] for w in ms[0][0]]
    distances = ms[2]
    return words, distances
spacy_most_similar("дерево", 10)
Again, this works fine on the CPU, but for the GPU version (where I replaced np with cp):
import cupy as cp
def spacy_most_similar(word, topn=10):
    with cp.cuda.Device(0):
        nlp_ru.vocab.vectors.data = cp.asarray(nlp_ru.vocab.vectors.data)
        ms = nlp_ru.vocab.vectors.most_similar(cp.asarray([nlp_ru.vocab.vectors[nlp_ru.vocab.strings[word]]]), n=topn)
        words = [nlp_ru.vocab.strings[w] for w in ms[0][0]]
        distances = ms[2]
        return words, distances
spacy_most_similar("дерево", 10)
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-6-876656d5f75d> in <module>()
7 distances = ms[2]
8 return words, distances
----> 9 spacy_most_similar("дерево", 10)
<ipython-input-6-876656d5f75d> in spacy_most_similar(word, topn)
3 with cp.cuda.Device(0):
4 nlp_ru.vocab.vectors.data = cp.asarray(nlp_ru.vocab.vectors.data)
----> 5 ms = nlp_ru.vocab.vectors.most_similar(cp.asarray([nlp_ru.vocab.vectors[nlp_ru.vocab.strings[word]]]), n=topn)
6 words = [nlp_ru.vocab.strings[w] for w in ms[0][0]]
7 distances = ms[2]
vectors.pyx in spacy.vectors.Vectors.most_similar()
TypeError: unhashable type: 'cupy.core.core.ndarray'
Can you help me build the correct CuPy input for the most_similar() method?
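Given the existing source code, I doubt you can run most_similar on the GPU. Note that filled is already a CPU object (a plain Python list): it is indexed correctly by indices taken from a NumPy array, but not by indices taken from a CuPy array. The error TypeError: list indices must be integers or slices, not cupy.core.core.ndarray comes from the following two lines:
If you think finding the most similar words on the GPU is worthwhile, you can open an issue at https://github.com/explosion/spaCy/issues, or write your own most_similar (I think it is fairly simple).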
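Following the suggestion to write your own most_similar, here is a minimal sketch of a cosine-similarity top-n search. It assumes the vector table is a 2-D array with one vector per row and the queries are a 2-D array with one query per row; `most_similar_rows` and its `xp` parameter are hypothetical names, not spaCy API. It only uses functions that CuPy mirrors from NumPy (`linalg.norm`, `argpartition`, `argsort`, `arange`, fancy indexing), so passing `xp=cupy` with CuPy arrays should keep the heavy matrix product on the GPU, but I have only reasoned about this, not verified it against a particular CuPy version. Unlike spaCy's method, it returns table rows rather than keys and does not skip rows that no key points to.

```python
import numpy as np


def most_similar_rows(data, queries, n=10, xp=np):
    """Return (best_rows, scores) for the n rows of `data` closest to each query by cosine.

    Hypothetical helper, not part of spaCy. `xp` is the array module that holds
    `data` and `queries`: numpy on the CPU, or (assumed) cupy on the GPU.
    """
    n = min(n, data.shape[0])  # argpartition needs n <= number of rows

    # Scale table rows and queries to unit length; zero vectors stay zero.
    norms = xp.linalg.norm(data, axis=1, keepdims=True)
    norms[norms == 0] = 1
    unit_rows = data / norms
    q_norms = xp.linalg.norm(queries, axis=1, keepdims=True)
    q_norms[q_norms == 0] = 1
    unit_queries = queries / q_norms

    # Cosine similarity matrix of shape (n_queries, n_rows), computed on xp's device.
    sims = unit_queries @ unit_rows.T

    # Unordered top-n columns per query, then sort those n by descending score.
    query_idx = xp.arange(sims.shape[0])[:, None]
    top = xp.argpartition(sims, -n, axis=1)[:, -n:]
    top_scores = sims[query_idx, top]
    order = xp.argsort(-top_scores, axis=1)
    best_rows = top[query_idx, order]
    scores = top_scores[query_idx, order]

    # .tolist() copies the small result to the host as plain Python ints and floats,
    # which can index Python lists and serve as dict keys (CuPy 0-d arrays cannot).
    return best_rows.tolist(), scores.tolist()
```

To get words, assuming the spaCy v2 `Vectors` attributes used in the question, one could pass `cp.asarray(nlp_ru.vocab.vectors.data)` as `data` and the query vector as a one-row CuPy array, invert `nlp_ru.vocab.vectors.key2row` into a row-to-key dict, and look each key up in `nlp_ru.vocab.strings`. Because the rows come back as Python ints, those list and dict lookups never receive CuPy arrays, which is what triggered both errors in the question.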