Co-occurrence matrix for the top 2000 words of a TF-IDF vectorizer

Posted 2024-04-18 00:27:33


I applied a TF-IDF vectorizer with max_features=2000 to my text data and got a matrix of shape (100000, 2000).
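A TF-IDF matrix is a bag-of-words representation: each row stores only the column index and weight of the words present in one document, so word order is lost. A windowed co-occurrence count needs each document as an ordered sequence of integer word ids instead. A minimal sketch, assuming a vocabulary dict that maps terms to column indices (scikit-learn's TfidfVectorizer exposes one as `vocabulary_`) and a simple lowercase whitespace tokenizer, which is a hypothetical stand-in for the vectorizer's own tokenization:

```python
def to_id_sequence(text, vocabulary):
    """Map a document to the ordered list of vocabulary ids of its tokens.

    Tokens missing from the vocabulary (e.g. words cut by max_features)
    are dropped. The lowercase/whitespace tokenizer is an assumption.
    """
    return [vocabulary[tok] for tok in text.lower().split() if tok in vocabulary]


# Hypothetical toy vocabulary standing in for vectorizer.vocabulary_
vocab = {"cat": 0, "sat": 1, "mat": 2}
print(to_id_sequence("The cat sat on the mat", vocab))  # [0, 1, 2]
```

If scikit-learn is in use, `vectorizer.build_analyzer()` returns a callable that applies the vectorizer's own preprocessing and tokenization, which should keep these ids consistent with the TF-IDF columns.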

When I compute the co-occurrence matrix with the following code:

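import numpy as np

length = 2000  # vocabulary size (max_features)
m = np.zeros([length, length])  # co-occurrence counts, one row/column per word
def cal_occ(sentence, m):
    for i, word in enumerate(sentence):
        print(i)
        print(word)
        for j in range(max(i - window, 0), min(i + window, length)):
            print(j)
            print(sentence[j])
            m[word, sentence[j]] += 1
# tf_vec (the TF-IDF matrix) and window are defined earlier in the notebook (not shown)
for sentence in tf_vec:
    cal_occ(sentence, m)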

I get the following error:

0
(0, 1210)   0.20426932204609685
(0, 191)    0.23516811545499153
(0, 592)    0.2537746177804585
(0, 1927)   0.2896119458034052
(0, 1200)   0.1624114163299802
(0, 1856)   0.24376566018277918
(0, 1325)   0.2789314085220367
(0, 756)    0.15365704375851477
(0, 1130)   0.293489555928974
(0, 346)    0.21231046306681553
(0, 557)    0.2036759579760878
(0, 1036)   0.29666992324872365
(0, 264)    0.36435609585838674
(0, 1701)   0.242619998334931
(0, 1939)   0.33934107208095693
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-96-ad505b6df734> in <module>()
 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 ---> 13     cal_occ(sentence, m)

 <ipython-input-96-ad505b6df734> in cal_occ(sentence, m)
  9             print(j)
 10             print(sentence[j])
 ---> 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 13     cal_occ(sentence, m)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
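The lines printed just before the traceback are `(row, column) weight` pairs, which suggests that `word` is a whole 1×2000 sparse row rather than an integer word id: iterating over a SciPy sparse matrix yields its rows, and NumPy cannot use such an object as an index into `m`. Since a TF-IDF row records which words occur but not their order, the co-occurrence it can give directly is document-level: two words co-occur when both appear in the same document. A minimal sketch on a small dense toy matrix (the values are made up):

```python
import numpy as np

# Toy stand-in for the (n_documents, n_words) TF-IDF matrix; values are made up.
tfidf = np.array([
    [0.5, 0.0, 0.3, 0.0],
    [0.0, 0.7, 0.2, 0.0],
    [0.4, 0.6, 0.0, 0.0],
])

# 1 where a word occurs in a document, 0 otherwise (the weights are discarded).
present = (tfidf > 0).astype(int)

# cooc[a, b] = number of documents containing both word a and word b.
cooc = present.T @ present
np.fill_diagonal(cooc, 0)  # drop self-pairs; the diagonal held document frequencies

print(cooc)
```

The same binarize-and-multiply expression should also work on the sparse matrix returned by the vectorizer, and the resulting 2000×2000 matrix is small enough to convert with `.toarray()` before calling `np.fill_diagonal`.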


1 answer
Anonymous user
#1 · Posted 2024-04-18 00:27:33

The problem most likely lies in this line:

for j in range(max(i-window,0),min(i+window,length)):

When i+window goes past the bound, min returns length. Could you try this line instead of the one above:

for j in range(max(i-window,0),min(i+window,length-1)):
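On the bounds themselves: `range` already excludes its upper end, and `j` walks over positions in the sentence, so its limit would be the sentence length rather than the vocabulary size `length`. A minimal sketch of the windowed count, assuming each sentence is already a list of integer word ids and that `window` means the number of neighbours on each side (both are assumptions, not taken from the question; `count_cooccurrences` is a hypothetical helper):

```python
def count_cooccurrences(sentences, vocab_size, window):
    """Count windowed co-occurrences between word ids.

    sentences: iterable of lists of integer word ids (assumed input format).
    window: number of neighbouring positions considered on each side.
    Returns a vocab_size x vocab_size nested list of counts.
    """
    m = [[0] * vocab_size for _ in range(vocab_size)]
    for sentence in sentences:
        n = len(sentence)
        for i, word in enumerate(sentence):
            # Clip the window to valid positions of this sentence;
            # range() excludes its end, so i + window + 1 includes position i + window.
            for j in range(max(i - window, 0), min(i + window + 1, n)):
                if j != i:  # skip pairing a word with itself at the same position
                    m[word][sentence[j]] += 1
    return m


counts = count_cooccurrences([[0, 1, 2], [2, 0]], vocab_size=3, window=1)
print(counts)  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Swapping the nested list for `np.zeros((vocab_size, vocab_size), dtype=int)` gives the NumPy array the question uses; the update `m[word][sentence[j]] += 1` works on both.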

Hope this helps.

Cheers
