When I use the missing_value parameter with the qgram and levenshtein methods it works, but with lcs and jarowinkler it does not. Do you know why this happens?
Here is the code:
import recordlinkage
compare_cl_1 = recordlinkage.Compare()  # assumed setup; not shown in the original post
compare_cl_1.string('N_name', 'N_name', label='nombre levenshtein', method='levenshtein', missing_value=0.23) # 5667 unique values, 6733 total
compare_cl_1.string('N_name', 'N_name', method='jarowinkler', missing_value=0.56, label='nombre jarowinkler')
compare_cl_1.string('N_name', 'N_name', method='qgram', missing_value=0.13, label='nombre qgram')
compare_cl_1.string('N_name', 'N_name', method='lcs', missing_value=0.23, label='nombre lcs')
compare_cl_1.exact('N_address', 'N_address', label='direccion exacta') # 15680 unique values, 538745 total
compare_cl_1.string('N_address', 'N_address', missing_value=0.3, label='direccion levenshtein') # 14756 unique values, 476837 total; there seem to be a great many repeats
compare_cl_1.string('N_address', 'N_address', method='jarowinkler', missing_value=0.61, label='direccion jarowinkler')
compare_cl_1.string('N_address', 'N_address', method='qgram', missing_value=0.2, label='direccion qgram')
compare_cl_1.string('N_address', 'N_address', method='lcs', missing_value=0.32, label='direccion lcs')
candidate_links = indexer.index(dfg, dfm)[:10000]
features = compare_cl_1.compute(candidate_links, dfg, dfm)
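This is what I get:
In the levenshtein and qgram columns, pairs with a missing value show the missing_value I set, but in the remaining columns (jarowinkler and lcs) they show 0.0.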
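To make the behaviour I expect concrete, here is a minimal pure-Python sketch of what I understand missing_value should do: when either side of a pair is missing, the score is the configured value rather than 0.0. This is not recordlinkage's implementation; the helper names `is_missing` and `compare_with_missing` are hypothetical, and `difflib.SequenceMatcher` is only a stand-in similarity, not any of the methods above.

```python
import math
from difflib import SequenceMatcher


def is_missing(value):
    """Return True for None or a float NaN (how pandas usually marks missing strings)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def compare_with_missing(left, right, missing_value=0.0):
    """Hypothetical helper: similarity in [0, 1], or missing_value if either side is missing."""
    if is_missing(left) or is_missing(right):
        return missing_value
    # Stand-in similarity from the standard library, not a recordlinkage method.
    return SequenceMatcher(None, left, right).ratio()


# One complete pair and two pairs with a missing side.
pairs = [("maria", "maria"), ("maria", None), (float("nan"), "jose")]
scores = [compare_with_missing(a, b, missing_value=0.56) for a, b in pairs]
print(scores)  # [1.0, 0.56, 0.56]
```

With this logic, every pair that has a missing side would get 0.56 regardless of the similarity method, which is what I see for levenshtein and qgram but not for jarowinkler and lcs.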