A Python library for accurate and scalable deduplication and entity resolution



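dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine behind dedupe.io.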

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses
  • link a list with customer information to another with order history, even without unique customer IDs
  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
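To make the "training data in, matching rules out" idea concrete, here is a minimal standard-library sketch. It is not dedupe's API or algorithm: dedupe itself learns field weights and blocking rules from the training data, whereas this sketch swaps in a much simpler technique. It scores record pairs with `difflib` string similarity, learns a single similarity cutoff from human-labeled pairs, and groups records that clear that cutoff. The function names `similarity`, `learn_threshold` and `find_duplicates`, the field names, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


# Hypothetical helper: average string similarity (0..1) over the chosen fields.
def similarity(a, b, fields):
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(scores) / len(scores)


# Hypothetical "training" step: pick the cutoff that best separates
# human-labeled matching pairs from non-matching pairs.
def learn_threshold(labeled_pairs, fields):
    scored = sorted((similarity(a, b, fields), is_match) for a, b, is_match in labeled_pairs)
    # Candidate cutoffs sit halfway between neighbouring scores.
    candidates = [(scored[k][0] + scored[k + 1][0]) / 2 for k in range(len(scored) - 1)] or [0.5]

    def correct(t):
        # Number of labeled pairs the cutoff t classifies correctly.
        return sum((score >= t) == is_match for score, is_match in scored)

    return max(candidates, key=correct)


# Group records whose similarity clears the threshold. Union-find makes the
# grouping transitive (A~B and B~C puts A, B, C together). Compares every pair.
def find_duplicates(records, fields, threshold):
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j], fields) >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return [c for c in clusters.values() if len(c) > 1]


if __name__ == "__main__":
    fields = ["name", "address"]
    training = [  # (record, record, is_same_entity) as labeled by a human
        ({"name": "Jane Doe", "address": "5 Pine Rd"}, {"name": "Jane Doe", "address": "5 Pine Road"}, True),
        ({"name": "Jane Doe", "address": "5 Pine Rd"}, {"name": "Bob Lee", "address": "7 Birch Ln"}, False),
    ]
    records = [
        {"name": "Jon Smith", "address": "12 Oak St"},
        {"name": "John Smith", "address": "12 Oak Street"},
        {"name": "Mary Jones", "address": "99 Elm Ave"},
    ]
    threshold = learn_threshold(training, fields)
    print(find_duplicates(records, fields, threshold))  # prints [[0, 1]]
```

Because this sketch compares every pair of records, its cost grows quadratically and it only suits small inputs; avoiding that all-pairs comparison is what blocking rules are for when working with very large databases.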
