Applying a function to the entire DataFrame

import numpy as np

def _c(ca, i, j, p, q):
    if ca[i, j] > -1:
        return ca[i, j]
    elif i == 0 and j == 0:
        ca[i, j] = np.linalg.norm(p[i]-q[j])
    elif i > 0 and j == 0:
        ca[i, j] = max(_c(ca, i-1, 0, p, q), np.linalg.norm(p[i]-q[j]))
    elif i == 0 and j > 0:
        ca[i, j] = max(_c(ca, 0, j-1, p, q), np.linalg.norm(p[i]-q[j]))
    elif i > 0 and j > 0:
        ca[i, j] = max(
            min(
                _c(ca, i-1, j, p, q),
                _c(ca, i-1, j-1, p, q),
                _c(ca, i, j-1, p, q)
            ),
            np.linalg.norm(p[i]-q[j])
        )
    else:
        ca[i, j] = float('inf')
    return ca[i, j]

def frdist(p, q):
    # Remove nan values from p
    p = np.array([i for i in p if np.any(np.isfinite(i))], np.float64)  # ESSENTIAL PART TO REMOVE NaN
    q = np.array([i for i in q if np.any(np.isfinite(i))], np.float64)  # ESSENTIAL PART TO REMOVE NaN

    len_p = len(p)
    len_q = len(q)

    if len_p == 0 or len_q == 0:
        raise ValueError('Input curves are empty.')

    # p and q no longer have to be the same length
    if len(p[0]) != len(q[0]):
        raise ValueError('Input curves do not have the same dimensions.')

    ca = (np.ones((len_p, len_q), dtype=np.float64) * -1)

    dist = _c(ca, len_p-1, len_q-1, p, q)
    return dist
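One practical caveat: `_c` recurses, and starting from the last cell the call depth grows roughly with `len(p) + len(q)`, so very long curves can exceed Python's default recursion limit (1000). Below is a minimal sketch of the same recurrence filled in bottom-up with loops instead. `frdist_iterative` is a hypothetical name, not part of the original code; it keeps the same NaN filtering and error checks as `frdist`.

```python
import numpy as np


def frdist_iterative(p, q):
    """Discrete Frechet distance using the same recurrence as _c, but iterative.

    Hypothetical helper: fills the coupling table row by row, so there is no
    recursion and long curves do not hit Python's recursion limit.
    """
    # Drop points with no finite coordinate (the NaN padding)
    p = np.array([pt for pt in p if np.any(np.isfinite(pt))], dtype=np.float64)
    q = np.array([pt for pt in q if np.any(np.isfinite(pt))], dtype=np.float64)
    if len(p) == 0 or len(q) == 0:
        raise ValueError('Input curves are empty.')
    if p.shape[1] != q.shape[1]:
        raise ValueError('Input curves do not have the same dimensions.')

    n, m = len(p), len(q)
    ca = np.empty((n, m), dtype=np.float64)
    for i in range(n):
        for j in range(m):
            d = np.linalg.norm(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i, j] = d
            elif i == 0:
                # First row: can only have come from the left
                ca[i, j] = max(ca[0, j - 1], d)
            elif j == 0:
                # First column: can only have come from above
                ca[i, j] = max(ca[i - 1, 0], d)
            else:
                ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]), d)
    return float(ca[n - 1, m - 1])


# Two parallel horizontal segments one unit apart
print(frdist_iterative([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))
```

The result table is the same `ca` that `_c` memoizes; the loop order just guarantees that the three neighbours of each cell are already filled when it is reached.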

        1     1.1       2     2.1       3     3.1       4     4.1       5     5.1
0 43.1024  6.7498     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
1 46.0595  1.6829 25.0695  3.7463     NaN     NaN     NaN     NaN     NaN     NaN
2 25.0695  5.5454 44.9727  8.6660 41.9726  2.6666 84.9566  3.8484 44.9566  1.8484
3 35.0281  7.7525 45.0322  3.7465 14.0369  3.7463     NaN     NaN     NaN     NaN
4 35.0292  7.5616 45.0292  4.5616 23.0292  3.5616 45.0292  6.7463     NaN

1 answer

User

Answer #1 · posted 2024-04-26 21:54:03

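Since your working code takes lists of points as arguments, you need to convert each row of the DataFrame into a list of (x, y) pairs, like p and q in your example. Assuming df is your DataFrame, you can do this as follows: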

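def pairwise(it):
    # zip() draws from the same iterator twice, so values come out
    # in consecutive pairs: (x1, y1), (x2, y2), ...
    a = iter(it)
    return zip(a, a)

# One entry per row: that row's list of (x, y) points
ddf = df.apply(lambda x: list(pairwise(x)), axis=1)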
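To see what this produces, here is a small self-contained check on two rows taken from the table in the question. It assumes the column labels are strings ('1', '1.1', ...), which is how pandas names duplicated headers on import; the values are copied from rows 0 and 1.

```python
import numpy as np
import pandas as pd


def pairwise(it):
    # Same helper as above: consume the iterator two items at a time
    a = iter(it)
    return zip(a, a)


# Rows 0 and 1 of the question's data, truncated to the first two points
df = pd.DataFrame(
    [[43.1024, 6.7498, np.nan, np.nan],
     [46.0595, 1.6829, 25.0695, 3.7463]],
    columns=['1', '1.1', '2', '2.1'],
)

ddf = df.apply(lambda row: list(pairwise(row)), axis=1)

print(type(ddf).__name__)  # Series
print(ddf.iloc[0])         # first point, then the (nan, nan) padding pair
print(ddf.iloc[1])         # two real points
```

The padding pairs stay in the lists at this stage; `frdist` removes them itself because they have no finite coordinate.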

I took the pairwise function from this answer.

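ddf is a Series (df.apply returns a Series when the applied function returns a list), and each element is a list of (x, y) pairs like p or q. The NaN padding is still in those lists, but frdist drops it.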

Next you need to iterate over pairs of row indices. Take a look at the itertools module: depending on your needs, you can use product, permutations or combinations.
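The difference between the three is easiest to see on a small index. For n rows, product gives n² pairs, permutations gives n(n-1) and combinations gives n(n-1)/2. The list `idx` below simply stands in for `ddf.index` of a three-row frame.

```python
from itertools import combinations, permutations, product

idx = [0, 1, 2]  # stand-in for ddf.index with three rows

all_pairs = list(product(idx, repeat=2))  # includes (i, i); both (i, j) and (j, i)
ordered = list(permutations(idx, 2))      # drops (i, i); keeps both orders
unordered = list(combinations(idx, 2))    # drops (i, i); one order per pair

print(len(all_pairs), len(ordered), len(unordered))  # 9 6 3
print(unordered)  # [(0, 1), (0, 2), (1, 2)]
```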

If you want every possible pair, you can use:

from itertools import product
idxpairs = product(ddf.index, repeat=2)

idxpairs holds every possible (ordered) pair of indices of the DataFrame, and you can loop over it.

You can build the final matrix like this:

fmatrix = pd.DataFrame(index=ddf.index, columns=ddf.index, dtype=float)

# idxpairs yields index labels, so look rows up with .loc
for i, j in idxpairs:
    fmatrix.loc[i, j] = frdist(ddf.loc[i], ddf.loc[j])

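This computes every element by brute force. If you have a large DataFrame and you know in advance that the final matrix has certain properties, for example that its diagonal is 0 and it is symmetric (the discrete Fréchet distance is symmetric, so frdist(p, q) == frdist(q, p)), you can save time by using combinations instead of product, which avoids doing the same computation twice: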

from itertools import combinations
idxpairs = combinations(ddf.index, 2)

# Start from zeros so the diagonal (a curve against itself) is already 0
fmatrix = pd.DataFrame(0.0, index=ddf.index, columns=ddf.index)

for i, j in idxpairs:
    res = frdist(ddf.loc[i], ddf.loc[j])
    fmatrix.loc[i, j] = res
    fmatrix.loc[j, i] = res
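The combinations loop can also be wrapped in a small reusable helper. This is a minimal sketch, assuming the distance is symmetric and returns 0 for identical inputs, and that the rows come in as a Series such as ddf; `symmetric_distance_matrix` is a hypothetical name, not a pandas function. It fills a NumPy array by position and wraps it in a DataFrame at the end, which avoids one `.loc` assignment per cell and works whatever the index labels are.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def symmetric_distance_matrix(items, dist):
    """Pairwise distance matrix for the elements of a pandas Series.

    Hypothetical helper: assumes dist(a, b) == dist(b, a) and dist(a, a) == 0,
    so only the n*(n-1)/2 pairs from combinations() are evaluated.
    """
    n = len(items)
    values = items.to_list()
    out = np.zeros((n, n), dtype=np.float64)  # diagonal stays 0
    for i, j in combinations(range(n), 2):
        d = dist(values[i], values[j])
        out[i, j] = d  # fill both triangles from one evaluation
        out[j, i] = d
    return pd.DataFrame(out, index=items.index, columns=items.index)


# Toy demonstration with a trivial symmetric distance
s = pd.Series([1.0, 4.0, 6.0], index=['a', 'b', 'c'])
print(symmetric_distance_matrix(s, lambda a, b: abs(a - b)))
```

With the objects from the answer, the call would be `fmatrix = symmetric_distance_matrix(ddf, frdist)`.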
