I ran a TF-IDF vectorizer on my text data and got a matrix of shape (100000, 2000), with max_feature=2000.
When I compute the co-occurrence matrix with the code below:
length = 2000
m = np.zeros([length,length]) # n is the count of all words
def cal_occ(sentence,m):
    for i,word in enumerate(sentence):
        print(i)
        print(word)
        for j in range(max(i-window,0),min(i+window,length)):
            print(j)
            print(sentence[j])
            m[word,sentence[j]]+=1
for sentence in tf_vec:
    cal_occ(sentence, m)
I get the following error:
0
(0, 1210) 0.20426932204609685
(0, 191) 0.23516811545499153
(0, 592) 0.2537746177804585
(0, 1927) 0.2896119458034052
(0, 1200) 0.1624114163299802
(0, 1856) 0.24376566018277918
(0, 1325) 0.2789314085220367
(0, 756) 0.15365704375851477
(0, 1130) 0.293489555928974
(0, 346) 0.21231046306681553
(0, 557) 0.2036759579760878
(0, 1036) 0.29666992324872365
(0, 264) 0.36435609585838674
(0, 1701) 0.242619998334931
(0, 1939) 0.33934107208095693
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-96-ad505b6df734> in <module>()
11 m[word,sentence[j]]+=1
12 for sentence in tf_vec:
---> 13 cal_occ(sentence, m)
<ipython-input-96-ad505b6df734> in cal_occ(sentence, m)
9 print(j)
10 print(sentence[j])
---> 11 m[word,sentence[j]]+=1
12 for sentence in tf_vec:
13 cal_occ(sentence, m)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
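The problem you are most likely running into is this:
when i+window goes out of bounds, the min function returns length. Could you try this instead of the line above: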
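A minimal sketch of what such a bounded loop could look like. It assumes each sentence is a list of integer word ids (as the error message says, NumPy only accepts integers, slices and integer or boolean arrays as indices) and that the window should be clamped to the length of the sentence rather than to the vocabulary size. The `window` default, the `+ 1` on the upper bound and the `j != i` check are assumptions of this sketch, not part of the original code.

```python
import numpy as np

def cal_occ(sentence, m, window=2):
    """Add co-occurrence counts for one sentence of integer word ids to m."""
    n = len(sentence)
    for i, word in enumerate(sentence):
        # Clamp the window to the sentence itself, not to the vocabulary size.
        for j in range(max(i - window, 0), min(i + window + 1, n)):
            if j != i:  # assumption: a position is not paired with itself
                m[word, sentence[j]] += 1
    return m

vocab_size = 5  # hypothetical small vocabulary for illustration
m = np.zeros((vocab_size, vocab_size), dtype=np.int64)
cal_occ([0, 3, 1, 3], m, window=1)
```

With rows of a sparse TF-IDF matrix, the integer column ids would have to be extracted first, since iterating over a sparse row does not yield word ids.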
Hope this helps.
Cheers