Increasing the max_components variable in the dedupe library

Posted 2024-04-26 06:16:18

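How can I increase the default value of the max_components variable?

By default, max_components is set to 30000. I need to raise this limit because every time I run dedupe on the same dataset, I get different results.

I think the total number of clusters in my data is greater than 30000.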


1 answer
User
#1 · Posted 2024-04-26 06:16:18

Answer from GitHub

From the dedupe GitHub issue "Increase max_components = 30000":

If you are getting different results using the same saved settings file, then what you are reporting is a bug. If you are getting different results from different training data (or even the same training data), that's expected, because at various points dedupe uses a random sample to learn good rules.
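To make the point about random sampling concrete, here is a minimal sketch using only Python's standard library. It is not dedupe's code: `draw_training_sample` is a hypothetical helper, and whether dedupe lets you fix its own seed is not something this thread confirms. It only shows why an unseeded sample can change from run to run while a seeded one does not.

```python
import random


def draw_training_sample(records, k, seed=None):
    """Hypothetical helper: draw k records; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private generator, leaves the global RNG untouched
    return rng.sample(records, k)


records = [f"record-{i}" for i in range(1000)]

# Same seed: the two draws are identical.
first = draw_training_sample(records, 5, seed=42)
second = draw_training_sample(records, 5, seed=42)
same_seed_matches = first == second

# No seed: each generator is seeded from OS entropy (or the clock),
# so these two draws will usually differ.
unseeded_a = draw_training_sample(records, 5)
unseeded_b = draw_training_sample(records, 5)
```

If rules are learned from a sample like the unseeded one, two training runs over the same data can end up with different rules, and so different clusters, without any limit such as max_components being involved.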

In either case, I doubt that max_components is related. But, if you want to change it, fork the code and change it.
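If the function that accepts max_components is one you call directly from your own code (an assumption; check the dedupe version you have installed), binding a different value at the call site may be enough and would avoid a fork. The sketch below uses a hypothetical stand-in, `cluster_stub`, rather than any real dedupe function, and the value 100000 is arbitrary.

```python
from functools import partial


def cluster_stub(pairs, threshold=0.5, max_components=30000):
    """Hypothetical stand-in: it only reports the settings it received."""
    return {
        "n_pairs": len(pairs),
        "threshold": threshold,
        "max_components": max_components,
    }


# Bind a larger limit once, then call the new function as usual.
cluster_big = partial(cluster_stub, max_components=100000)

result = cluster_big([("a", "b"), ("b", "c")])
```

This only helps when you control the call. If the library invokes the function internally with its own default, editing a fork, as the answer suggests, is the remaining option.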
