Python优化欧几里得距离计算

2 投票
3 回答
610 浏览
提问于 2025-04-17 16:35

这个函数在我的程序中运行超过200万次。有没有人能建议我怎么优化一下?x和y是元组。

def distance(x,y):

    return sqrt((x[0]-y[0])*(x[0]-y[0])+(x[1]-y[1])*(x[1]-y[1])+(x[2]-y[2])*(x[2]-y[2]))

我尝试过:我试着用math.sqrt、pow和x**0.5,但性能提升不大。

3 个回答

1

原文:

>>> timeit.timeit('distance2((0,1,2),(3,4,5))', '''
... from math import sqrt
... def distance2(x,y):
...     return sqrt((x[0]-y[0])*(x[0]-y[0])+(x[1]-y[1])*(x[1]-y[1])+(x[2]-y[2])*(x[2]-y[2]))
... ''')
1.1989610195159912

常见子表达式消除:

>>> timeit.timeit('distance((0,1,2),(3,4,5))', '''
... def distance(x,y):
...     d1 = x[0] - y[0]
...     d2 = x[1] - y[1]
...     d3 = x[2] - y[2]
...     return (d1 * d1 + d2 * d2 + d3 * d3) ** .5''')
0.93855404853820801

优化解包:

>>> timeit.timeit('distance((0,1,2),(3,4,5))', '''
... def distance(x,y):
...     x1, x2, x3 = x
...     y1, y2, y3 = y
...     d1 = x1 - y1
...     d2 = x2 - y2
...     d3 = x3 - y3
...     return (d1 * d1 + d2 * d2 + d3 * d3) ** .5''')
0.90851116180419922

库函数:

>>> timeit.timeit('distance((0,1,2),(3,4,5))', '''
... import math
... def distance(x,y):
...     x1, x2, x3 = x
...     y1, y2, y3 = y
...     d1 = x1 - y1
...     d2 = x2 - y2
...     d3 = x3 - y3
...     return math.sqrt(d1 * d1 + d2 * d2 + d3 * d3)
... ''')
0.78318595886230469

点表示法:

>>> timeit.timeit('distance((0,1,2),(3,4,5))', '''
... from math import sqrt
... def distance(x,y):
...     x1, x2, x3 = x
...     y1, y2, y3 = y
...     d1 = x1 - y1
...     d2 = x2 - y2
...     d3 = x3 - y3
...     return sqrt(d1 * d1 + d2 * d2 + d3 * d3)
... ''')
0.75629591941833496
2

scipy有一个计算欧几里得距离的函数。你可能找不到比这个更快的了。

http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html#scipy.spatial.distance.euclidean

from scipy.spatial.distance import euclidean
import numpy as np

# x and y are 1 x 3 vectors
x = np.random.rand(1,3) 
y = np.random.rand(1,3)
euclidean(x,y)

编辑:实际上,把这个和提问者的纯Python的distance()函数用timeit测试了一下,发现scipy的这个在处理Python的浮点数时反而慢得多。我觉得scipy的版本在把浮点数转换成numpy的数据类型时浪费了一些时间。说实话,我对此感到很惊讶。

3

你可以通过不重复访问同一个 x[i] 元素,而是把它绑定到一个本地变量上,从而节省一些计算时间。

def distance(x,y):
    x0, x1, x2 = x
    y0, y1, y2 = y
    return sqrt((x0-y0)*(x0-y0)+(x1-y1)*(x1-y1)+(x2-y2)*(x2-y2))

比如,比较一下:

>>> import timeit
>>> timer = timeit.Timer(setup='''
... from math import sqrt
... def distance(x,y):
...    return sqrt((x[0]-y[0])*(x[0]-y[0])+(x[1]-y[1])*(x[1]-y[1])+(x[2]-y[2])*(x[2]-y[2]))
... ''', stmt='distance((0,0,0), (1,2,3))')
>>> timer.timeit(1000000)
1.2761809825897217

和:

>>> import timeit
>>> timer = timeit.Timer(setup='''
... from math import sqrt
... def distance(x,y):
...    x0, x1, x2 = x
...    y0, y1, y2 = y
...    return sqrt((x0-y0)*(x0-y0)+(x1-y1)*(x1-y1)+(x2-y2)*(x2-y2))
... ''', stmt='distance((0,0,0), (1,2,3))')
>>> timer.timeit(1000000)
0.8375771045684814

在 Python 的维基上还有更多的性能提升技巧

撰写回答