Co-occurrence matrix for the top 2000 words of a TF-IDF vectorizer

Posted 2024-04-18 00:27:33


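I computed a TF-IDF vectorizer for my text data and got vectors of shape (100000, 2000), with max_features=2000.

I then tried to compute the co-occurrence matrix with the code below: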

length = 2000
m = np.zeros([length, length])  # n is the count of all words
def cal_occ(sentence, m):
    for i, word in enumerate(sentence):
        print(i)
        print(word)
        for j in range(max(i-window,0),min(i+window,length)):
            print(j)
            print(sentence[j])
            m[word,sentence[j]]+=1
for sentence in tf_vec:
    cal_occ(sentence, m)

I get the following error:

0
(0, 1210)   0.20426932204609685
(0, 191)    0.23516811545499153
(0, 592)    0.2537746177804585
(0, 1927)   0.2896119458034052
(0, 1200)   0.1624114163299802
(0, 1856)   0.24376566018277918
(0, 1325)   0.2789314085220367
(0, 756)    0.15365704375851477
(0, 1130)   0.293489555928974
(0, 346)    0.21231046306681553
(0, 557)    0.2036759579760878
(0, 1036)   0.29666992324872365
(0, 264)    0.36435609585838674
(0, 1701)   0.242619998334931
(0, 1939)   0.33934107208095693
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-96-ad505b6df734> in <module>()
 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 ---> 13     cal_occ(sentence, m)

 <ipython-input-96-ad505b6df734> in cal_occ(sentence, m)
  9             print(j)
 10             print(sentence[j])
 ---> 11             m[word,sentence[j]]+=1
 12 for sentence in tf_vec:
 13     cal_occ(sentence, m)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


1 answer

Answer 1 · posted 2024-04-18 00:27:33

The problem you are most likely running into is this line:

for j in range(max(i-window,0),min(i+window,length)):

When i+window goes past the bounds, min returns length. Could you try this in place of the line above:

for j in range(max(i-window,0),min(i+window,length-1)):
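
As a quick check of what the two range expressions yield, here is a small sketch with hypothetical values (window = 2 and length = 5 are assumptions chosen for illustration, not taken from the question). Since range() excludes its end point, the original expression never produces the index length itself, while the length-1 variant stops one position earlier.

```python
window = 2   # assumed value for illustration
length = 5   # assumed value for illustration

def original_range(i):
    # Expression from the question
    return list(range(max(i - window, 0), min(i + window, length)))

def suggested_range(i):
    # Expression with length - 1 as the upper bound
    return list(range(max(i - window, 0), min(i + window, length - 1)))

for i in range(length):
    print(i, original_range(i), suggested_range(i))
```

Both variants also stop at i + window - 1 on the right side, so the window is not symmetric around i.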

Hope this helps.

Cheers
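
One more thing worth checking: the value printed for `word` in the question looks like a SciPy sparse row (entries such as `(0, 1210) 0.204...`) rather than an integer, and the IndexError message complains about the type of the index, not about an out-of-range value. A TF-IDF matrix is also a bag-of-words representation, so it no longer carries word order. Below is a minimal sketch of a window-based count, assuming each sentence is available as a list of integer word ids (for example, looked up in the vectorizer's vocabulary). The function name `cooccurrence`, the toy input, and the choice to bound the window by the sentence length are all assumptions for illustration, not the asker's code.

```python
import numpy as np

def cooccurrence(sentences, vocab_size, window=2):
    """Count how often two word ids occur within `window` positions of each other."""
    m = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    for ids in sentences:
        n = len(ids)  # bound the window by the sentence length, not the vocabulary size
        for i, w in enumerate(ids):
            # range() excludes its end point, so i + window + 1 keeps the window symmetric
            for j in range(max(i - window, 0), min(i + window + 1, n)):
                if j != i:  # skip pairing a position with itself
                    m[w, ids[j]] += 1
    return m

# Hypothetical toy input: each sentence is a list of integer word ids
sentences = [[0, 1, 2], [1, 2, 3]]
m = cooccurrence(sentences, vocab_size=4, window=1)
print(m)
```

Here both indices into `m` are plain integers, which is what NumPy expects for this kind of element update.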
