IDF with +1 in the denominator when every document contains a given word

2 answers

User

Answer 1 · edited 2024-04-26 17:43:04

From sklearn's TfidfTransformer:

# perform idf smoothing if required
df += int(self.smooth_idf)
n_samples += int(self.smooth_idf)
# log+1 instead of log makes sure terms with zero idf don't get
# suppressed entirely.
idf = np.log(n_samples / df) + 1
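The excerpt above is a method body (it reads self.smooth_idf), so it does not run on its own. Here is a minimal standalone sketch of the same arithmetic for a single term. It assumes plain Python, with math.log standing in for np.log (both are natural logarithms). The function name sklearn_style_idf is hypothetical and not part of scikit-learn.

```python
import math


def sklearn_style_idf(n_samples, df, smooth_idf=True):
    """Hypothetical helper mirroring the excerpt's idf formula for one term.

    n_samples: number of documents in the corpus
    df: number of documents that contain the term
    """
    # Smoothing adds 1 to both the document frequency and the document count
    df += int(smooth_idf)
    n_samples += int(smooth_idf)
    # The trailing +1 keeps a term that occurs in every document from getting idf 0
    return math.log(n_samples / df) + 1


print(sklearn_style_idf(4, 4))  # term in all 4 documents
print(sklearn_style_idf(4, 1))  # term in only 1 of 4 documents
```

In a corpus of 4 documents, a term found in all 4 gets log(5/5) + 1 = 1.0. A term found in only 1 gets log(5/2) + 1, which is about 1.916.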

If smooth_idf is True, both df and n_samples are increased by 1.

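So either we add one to both the numerator and the denominator, or we change neither of them.
Because the numerator is incremented as well, and df can be at most n_samples, the ratio (n_samples + 1) / (df + 1) never drops below 1. The log therefore never becomes negative.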
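To see why the +1 has to go on both sides, compare the variant from the question, log(N / (1 + df)), with the smoothed form used by sklearn. The sketch below is a small illustration with hypothetical function names. It uses a term that appears in all three documents of a corpus.

```python
import math


def idf_denominator_only(n_samples, df):
    # +1 only in the denominator: the ratio n/(df+1) is below 1 when df == n_samples
    return math.log(n_samples / (df + 1))


def idf_both_smoothed(n_samples, df):
    # +1 in numerator and denominator: since df <= n_samples, the ratio is >= 1
    return math.log((n_samples + 1) / (df + 1))


n = 3  # three documents, and the term appears in all of them
print(idf_denominator_only(n, 3))  # log(3/4), negative (about -0.288)
print(idf_both_smoothed(n, 3))     # log(4/4) = 0.0
```

With the denominator-only variant, a word present in every document is given a negative weight. Smoothing both sides bottoms out at exactly 0 instead.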

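The +1 is also added to the result of the log. After this step the idf is turned into a sparse matrix, and sparse matrices do not store zeros. Without the +1, a term that occurs in every document would get idf = log(1) = 0 and drop out entirely, so we want every term to have a nonzero value.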
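The effect of the trailing +1 on a document's weights can be sketched with raw counts times idf. This is a simplification that ignores sklearn's default row normalization and other options. The toy corpus and helper names are hypothetical. Only the nonzero entries are kept, mimicking what a sparse matrix would store.

```python
import math

# Hypothetical toy corpus: "the" appears in every document
corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
vocab = sorted({t for doc in corpus for t in doc})
n = len(corpus)
df = {t: sum(t in doc for doc in corpus) for t in vocab}


def idf(term, add_one):
    # Smoothed idf as in the excerpt, optionally without the trailing +1
    value = math.log((n + 1) / (df[term] + 1))
    return value + 1 if add_one else value


def nonzero_weights(doc, add_one):
    # Raw term count times idf for every vocabulary term
    weights = {t: doc.count(t) * idf(t, add_one) for t in vocab}
    # A sparse matrix keeps only the nonzero entries
    return {t: w for t, w in weights.items() if w != 0}


print(nonzero_weights(corpus[0], add_one=False))  # "the" has vanished
print(nonzero_weights(corpus[0], add_one=True))   # "the" kept with weight 1.0
```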

More on idf from the documentation:

smooth_idf : boolean, default=True
Smooth idf weights by adding one to document frequencies, as if an
extra document was seen containing every term in the collection
exactly once. Prevents zero divisions.
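The docstring's two claims can be checked directly. First, adding 1 to every document frequency and to the document count gives the same numbers as appending one extra document that contains every vocabulary term once. Second, without smoothing, a vocabulary term that never occurs (for example one from a fixed, user-supplied vocabulary) has df = 0, so n / df divides by zero. The corpus and vocabulary below are hypothetical.

```python
import math

corpus = [["apple", "banana"], ["apple", "cherry"]]  # hypothetical toy corpus
vocab = ["apple", "banana", "cherry", "durian"]      # "durian" never occurs


def doc_freq(docs, term):
    # Number of documents containing the term
    return sum(term in doc for doc in docs)


# Smoothing: add 1 to every df and to the number of documents
smoothed = {t: (len(corpus) + 1, doc_freq(corpus, t) + 1) for t in vocab}

# Same (n, df) pairs from literally appending one document holding every term once
extra = corpus + [vocab]
explicit = {t: (len(extra), doc_freq(extra, t)) for t in vocab}
assert smoothed == explicit

# Without smoothing, "durian" has df == 0 and the division fails
try:
    math.log(len(corpus) / doc_freq(corpus, "durian"))
except ZeroDivisionError:
    print("unsmoothed idf is undefined for a term with df == 0")

n_s, df_s = smoothed["durian"]
print(math.log(n_s / df_s) + 1)  # smoothed idf for "durian": log(3/1) + 1
```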

User

Answer 2 · edited 2024-04-26 17:43:04

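If a word appears in every document in the corpus, it carries very little information, because it cannot distinguish one document from the rest of the corpus.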
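A small illustration of this point uses a hypothetical toy corpus and sklearn's default smoothed formula, log((n + 1) / (df + 1)) + 1. The word present in every document ends up with the lowest possible idf (exactly 1.0). It therefore scales every document's count by the same factor, while rarer words are weighted up.

```python
import math

# Hypothetical toy corpus: "the" appears in every document
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]

n = len(corpus)
vocab = sorted({t for doc in corpus for t in doc})
df = {t: sum(t in doc for doc in corpus) for t in vocab}
# Smoothed idf with the trailing +1, as in the excerpt with smooth_idf=True
idf = {t: math.log((n + 1) / (df[t] + 1)) + 1 for t in vocab}

# List terms from least to most informative
for term in sorted(idf, key=idf.get):
    print(f"{term:4s} df={df[term]} idf={idf[term]:.3f}")
```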
