Parallelizing a fuzzy string comparison algorithm

Posted 2024-06-10 23:34:47


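I have not been able to parallelize a fuzzy string comparison algorithm based on the Winnowing method using the Numba library. How can this be done? I ran into many errors while trying. Here, data is an array of strings and dataHash is an array of the hashes of those strings.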

from numba import njit

# jaccard() is defined elsewhere and is not shown in the question.
@njit
def winnowing(data,              # an array of strings
              dataHash,          # an array of hashes of strings
              threshold=0.85     # a threshold for jaccard()
              ):
    for i in range(len(dataHash)):
        for j in range(len(dataHash)):
            # skip self-comparison and entries already marked as duplicates
            if i == j or dataHash[j] is None:
                continue
            if threshold <= jaccard(str(dataHash[i]), str(dataHash[j])):
                # mark j as a near-duplicate of i
                dataHash[j] = None
                data[j] = None

    return data

The errors are usually of the form:

numba.core.errors.TypingError: Failed in nopython mode pipeline
                                  (step: nopython frontend)
                                   non-precise type array(pyobject, 2d, C)
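
The `array(pyobject, 2d, C)` in that message means Numba was handed a NumPy array with dtype `object` (strings mixed with `None`), which nopython mode cannot type. One possible way around it, sketched below under several assumptions, is to keep strings out of the compiled code: convert each string to a sorted array of integer fingerprints in plain Python, pack those into a padded `int64` table, compute the pairwise comparisons in parallel with `prange`, and only then run the order-dependent removal step sequentially. The names `fingerprints`, `pack`, `jaccard_sorted`, `similar_pairs`, `keep_mask` and `deduplicate` are hypothetical, `fingerprints` is a simple stand-in that hashes every k-gram rather than a real winnowing step, and the sketch assumes an entry that has already been removed should not be used to remove others. It also builds an n x n boolean matrix, so it only suits moderate input sizes.

```python
import zlib
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs (slowly) without Numba
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


def fingerprints(text, k=5):
    # Hypothetical stand-in for winnowing: sorted unique CRC32 hashes of all k-grams.
    grams = {zlib.crc32(text[i:i + k].encode("utf-8"))
             for i in range(max(1, len(text) - k + 1))}
    return np.array(sorted(grams), dtype=np.int64)


def pack(fps):
    # Pad the variable-length fingerprint arrays into one 2D int64 table.
    lengths = np.array([len(f) for f in fps], dtype=np.int64)
    width = max(1, int(lengths.max())) if len(fps) else 1
    table = np.zeros((len(fps), width), dtype=np.int64)
    for row, f in enumerate(fps):
        table[row, :len(f)] = f
    return table, lengths


@njit
def jaccard_sorted(a, na, b, nb):
    # Jaccard index of two sorted arrays of unique integers (merge-style scan).
    i = 0
    j = 0
    inter = 0
    while i < na and j < nb:
        if a[i] == b[j]:
            inter += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    union = na + nb - inter
    return inter / union if union > 0 else 0.0


@njit(parallel=True)
def similar_pairs(table, lengths, threshold):
    # Parallel part: each outer iteration writes only to its own row of `close`.
    n = table.shape[0]
    close = np.zeros((n, n), dtype=np.bool_)
    for i in prange(n):
        for j in range(i + 1, n):
            if jaccard_sorted(table[i], lengths[i], table[j], lengths[j]) >= threshold:
                close[i, j] = True
    return close


@njit
def keep_mask(close):
    # Sequential part: keep the first item of each group of near-duplicates.
    n = close.shape[0]
    keep = np.ones(n, dtype=np.bool_)
    for i in range(n):
        if not keep[i]:
            continue  # assumption: removed entries do not remove others
        for j in range(i + 1, n):
            if close[i, j]:
                keep[j] = False
    return keep


def deduplicate(data, threshold=0.85, k=5):
    table, lengths = pack([fingerprints(s, k) for s in data])
    keep = keep_mask(similar_pairs(table, lengths, threshold))
    return [s for s, kept in zip(data, keep) if kept]
```

Calling `deduplicate(list_of_strings)` returns a new list instead of overwriting entries with `None`, which avoids the mixed-type array that triggered the typing error. Splitting the work this way is a design choice: the comparisons are independent and can run in parallel, while the removal step depends on earlier decisions and is left sequential.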
