ValueError: need at least one array to concatenate in Top2Vec

2021-01-19 05:17:08,541 - top2vec - INFO - Pre-processing documents for training
INFO:top2vec:Pre-processing documents for training
2021-01-19 05:17:08,562 - top2vec - INFO - Downloading universal-sentence-encoder model
INFO:top2vec:Downloading universal-sentence-encoder model
2021-01-19 05:17:13,250 - top2vec - INFO - Creating joint document/word embedding
INFO:top2vec:Creating joint document/word embedding
WARNING:tensorflow:5 out of the last 6 calls to <function recreate_function.<locals>.restored_function_body at 0x7f8c4ce57d90> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2021-01-19 05:17:13,548 - top2vec - INFO - Creating lower dimension embedding of documents
INFO:top2vec:Creating lower dimension embedding of documents
2021-01-19 05:17:15,809 - top2vec - INFO - Finding dense areas of documents
INFO:top2vec:Finding dense areas of documents
2021-01-19 05:17:15,823 - top2vec - INFO - Finding topics
INFO:top2vec:Finding topics

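ValueError                                Traceback (most recent call last)
in <module>()
----> 1 model = Top2Vec(docs, embedding_model='universal-sentence-encoder')

2 frames
<__array_function__ internals> in vstack(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/shape_base.py in vstack(tup)
    281     if not isinstance(arrs, list):
    282         arrs = [arrs]
--> 283     return _nx.concatenate(arrs, 0)
    284
    285

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate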

1 Answer

User

#1 · Posted on 2024-04-25 20:26:57

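You need more documents, and more unique words, so that at least 2 topics can be found. As an example, I simply multiplied your list by 10, and it works: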

from top2vec import Top2Vec

docs = ['Consumer discretionary, healthcare and technology are preferred China equity  sectors.',
'Consumer discretionary remains attractive, supported by China’s policy to revitalize domestic consumption. Prospects of further monetary and fiscal stimulus  should reinforce the Chinese consumption theme.',
'The healthcare sector should be a key beneficiary of the coronavirus outbreak,  on the back of increased demand for healthcare services and drugs.',
'The technology sector should benefit from increased demand for cloud services  and hardware demand as China continues to recover from the coronavirus  outbreak.',
'China consumer discretionary sector is preferred. In our assessment, the sector  is likely to outperform the MSCI China Index in the coming 6-12 months.']

docs = docs*10 
model = Top2Vec(docs, embedding_model='universal-sentence-encoder')
print(model)

<top2vec.Top2Vec.Top2Vec object at 0x13eef6210>
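The traceback ends inside np.vstack, which suggests that the list of arrays handed to it was empty. A minimal sketch that reproduces the same message with NumPy alone, assuming (this is a guess about Top2Vec's internals, not something the logs confirm) that no topic clusters were found and the list of per-topic vectors was therefore empty:

```python
import numpy as np

# Assumption: with too few documents, clustering finds no dense region,
# so there are no per-topic vectors to stack. An empty list stands in
# for that situation here; the name topic_vectors is hypothetical.
topic_vectors = []

try:
    np.vstack(topic_vectors)
    error_message = None
except ValueError as exc:
    # NumPy refuses to concatenate an empty sequence of arrays
    error_message = str(exc)

print(error_message)
```

If that is what happens, giving the model more documents is what lets it find clusters, which matches the fix shown here.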

I had a few (30) long documents of up to 130,000 characters each, so I simply split each of them into smaller documents of 5,000 characters:


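docs_split = []
skip_n = 5000  # chunk size in characters
for doc in docs:
    # step over the document's actual length so that documents shorter
    # than 130000 characters do not add empty strings to the list
    for i in range(0, len(doc), skip_n):
        docs_split.append(doc[i:i+skip_n])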
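
The loop above cuts at fixed character offsets, so a word can end up split across two chunks. A minimal sketch of a variant that breaks at whitespace instead, assuming the text can be tokenized by plain whitespace. The helper name split_on_whitespace and the sample docs list are made up for illustration:

```python
def split_on_whitespace(text, max_chars=5000):
    """Split text into chunks of roughly max_chars characters at word boundaries.

    A single word longer than max_chars is kept whole, so that one chunk
    may exceed the limit. Runs of whitespace are collapsed to one space.
    """
    chunks = []
    current = []      # words collected for the chunk being built
    current_len = 0   # length of " ".join(current)
    for word in text.split():
        # +1 accounts for the joining space when the chunk already has words
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical input: one long document and one short one
docs = ["word " * 3000, "short document"]
docs_split = [chunk for doc in docs for chunk in split_on_whitespace(doc)]
print(len(docs_split))
```

The resulting docs_split can be passed to Top2Vec in the same way as the list produced by the character-based loop.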
