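A Python library for accurate and scalable data deduplication and entity resolution
Detailed description of the dedupe Python project
dedupe is a library that uses machine learning to quickly perform deduplication and entity resolution on structured data. dedupe is the open source engine for dedupe.io.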
dedupe will help you:
- remove duplicate entries from a spreadsheet of names and addresses
- link a list with customer information to another with order history, even without unique customer IDs
- take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record
dedupe takes in human-labeled training data and learns the best rules for your dataset, so that it can quickly and automatically find similar records, even in very large databases.
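
To make the idea of "finding similar records" concrete, here is a minimal sketch that uses only Python's standard library. It is not dedupe's API or its algorithm: dedupe learns its matching rules from training data, whereas this sketch applies a fixed, hand-picked similarity threshold. The sample records, the `similarity` and `find_duplicates` helpers, and the 0.7 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records, invented for illustration only.
records = {
    1: {"name": "Robert Smith", "address": "123 Main Street"},
    2: {"name": "Bob Smith", "address": "123 Main St."},
    3: {"name": "Alice Jones", "address": "9 Elm Avenue"},
}


def similarity(a, b):
    """Average character-level similarity over the fields of `a` (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        for field in a
    ]
    return sum(scores) / len(scores)


def find_duplicates(records, threshold=0.7):
    """Return ID pairs whose average similarity meets the threshold.

    The threshold is a hand-picked assumption; dedupe instead learns
    how to weigh fields from labeled examples.
    """
    return [
        (i, j)
        for i, j in combinations(sorted(records), 2)
        if similarity(records[i], records[j]) >= threshold
    ]


duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)  # [(1, 2)]
```

With these sample records, only records 1 and 2 are paired, even though the name and address are written differently in each. Note that this sketch compares every pair of records, so its cost grows quadratically with the number of records.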