Floating point equality in Python and in general

19 votes

8 answers

23509 views

Asked 2025-04-16 00:01

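I have a piece of code that behaves differently depending on whether I look up the conversion factors in a dictionary or use them directly.

The following piece of code prints 1.0 == 1.0 -> False.

But if you replace factors[units_from] with 10.0 and factors[units_to] with 1.0 / 2.54, it prints 1.0 == 1.0 -> True.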

#!/usr/bin/env python

base = 'cm'
factors = {
    'cm'        : 1.0,
    'mm'        : 10.0,
    'm'         : 0.01,
    'km'        : 1.0e-5,
    'in'        : 1.0 / 2.54,
    'ft'        : 1.0 / 2.54 / 12.0,
    'yd'        : 1.0 / 2.54 / 12.0 / 3.0,
    'mile'      : 1.0 / 2.54 / 12.0 / 5280,
    'lightyear' : 1.0 / 2.54 / 12.0 / 5280 / 5.87849981e12,
}

# Convert 25.4 mm to inches
val = 25.4
units_from = 'mm'
units_to = 'in'

base_value = val / factors[units_from]
ret = base_value * factors[units_to  ]
print ret, '==', 1.0, '->', ret == 1.0

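First, let me say that I am fairly sure I know what is going on here. I have seen it before in C, just never in Python, but since Python is implemented in C, we are seeing it here too.

I know that floating point numbers can change value going from a CPU register to cache and back. I also know that comparing two variables that should be equal returns false if one of them was paged out to memory while the other stayed resident in a register.

Questions

What is the best way to avoid problems like this, in Python or in general?
Am I doing something completely wrong?

Side note

This is obviously a stripped-down example. What I am trying to do is create classes for length, volume, etc. that can be compared against other objects of the same class but with different units.

Rhetorical questions

If this is a potentially dangerous problem, since it makes programs behave unpredictably, should compilers warn or raise an error when they detect that you are checking floats for equality?
Should compilers support an option to replace all float equality checks with a "close enough" function?
Do compilers already do this and I just can't find the information?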


8 Answers

The difference is that if you replace factors[units_to ] with 1.0 / 2.54, you are actually doing:

(base_value * 1.0) / 2.54

whereas with the dictionary you are doing:

base_value * (1.0 / 2.54)

The order of rounding matters. This is easier to see if you do:

>>> print (((25.4 / 10.0) * 1.0) / 2.54).__repr__()
1.0
>>> print ((25.4 / 10.0) * (1.0 / 2.54)).__repr__()
0.99999999999999989

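Note that there is no non-deterministic or undefined behavior here. There is a standard, IEEE-754, which implementations are required to follow (which is not to claim that they always do).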
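Because the result is fully determined by the arithmetic, the two groupings can be reproduced and inspected bit for bit. A minimal sketch, assuming Python 3 (where print is a function and repr shows the shortest digits that round-trip); the variable names are only illustrative:

```python
# Same operands, different grouping: rounding happens at different points.
divide_last = ((25.4 / 10.0) * 1.0) / 2.54        # divides by 2.54 directly
reciprocal_first = (25.4 / 10.0) * (1.0 / 2.54)   # multiplies by the already-rounded reciprocal

print(repr(divide_last), divide_last == 1.0)
print(repr(reciprocal_first), reciprocal_first == 1.0)

# float.hex() prints the exact stored value, so the last-bit difference is visible.
print(divide_last.hex())
print(reciprocal_first.hex())
```

Running this on any IEEE-754 platform should give the same two values every time, which is the point: the mismatch is a property of the expression, not of registers or caches.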

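I don't think there should be an automatic "close enough" replacement. That is often an effective way to deal with the problem, but whether and how to use it should be up to the programmer.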
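One way to keep that decision in the programmer's hands is to make the tolerance an explicit part of the class being compared. A minimal sketch, assuming Python 3.5+ for math.isclose; the Length class, the UNITS_PER_CM table and the REL_TOL value are hypothetical names chosen for illustration, not anything from the question's real code:

```python
import math

# Hypothetical table: how many of each unit make up one centimetre.
UNITS_PER_CM = {'cm': 1.0, 'mm': 10.0, 'in': 1.0 / 2.54}

class Length:
    """A length stored in centimetres and compared with a relative tolerance."""

    REL_TOL = 1e-9  # assumed tolerance; pick one that suits your domain

    def __init__(self, value, unit):
        self.cm = value / UNITS_PER_CM[unit]

    def to(self, unit):
        """Return the length expressed in the given unit."""
        return self.cm * UNITS_PER_CM[unit]

    def __eq__(self, other):
        if not isinstance(other, Length):
            return NotImplemented
        return math.isclose(self.cm, other.cm, rel_tol=self.REL_TOL)

    # Tolerant equality is not transitive, so instances should not be hashed.
    __hash__ = None

print(Length(25.4, 'mm') == Length(1.0, 'in'))   # tolerant comparison
print(Length(25.4, 'mm').to('in') == 1.0)        # raw float comparison
```

The tolerant comparison lives in one visible place, and the raw float comparison is still available when exact equality is really what is wanted.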

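Finally, there are of course options for arbitrary-precision arithmetic, including python-gmp and decimal. Think about whether you really need them, because they have a significant impact on performance.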
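For comparison, here is a sketch of exact arithmetic using only the standard library. The fractions module is swapped in for python-gmp because it needs no installation; this assumes Python 3, and the variable names are illustrative:

```python
from decimal import Decimal
from fractions import Fraction

# Fractions built from strings are exact rationals: '2.54' becomes 127/50.
val = Fraction('25.4')
mm_per_cm = Fraction('10')
in_per_cm = 1 / Fraction('2.54')   # exactly 50/127, nothing is rounded

inches = val / mm_per_cm * in_per_cm
print(inches, inches == 1)

# Decimal stays exact while every intermediate result has a finite
# decimal expansion, as is the case when dividing by 2.54 directly.
cm = Decimal('25.4') / Decimal('10')
print(cm, cm / Decimal('2.54') == 1)
```

Note that Decimal still rounds a quotient such as 1 / 2.54 to its context precision, so it moves the rounding problem rather than removing it.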

There is no problem with values moving between ordinary registers and cache. You may be thinking of the x86's 80-bit extended precision.

Answered 2025-04-16 by Python大师


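Comparing two floating point numbers (float or double, for example) is problematic. In general, instead of comparing for exact equality, you should check that they are within an error bound of each other. If they are within that bound, they are considered equal.

That is much easier said than done. The nature of floating point makes a fixed error bound useless. A small epsilon (like 2*float_epsilon) works well when values are near 0.0, but fails if the values are near 1000. An epsilon suited to values as large as 1,000,000.0 is far too lax for values near 0.0.

The best solution is to know the domain of your math and pick an appropriate error bound on a case-by-case basis.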
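To make the trade-off concrete, here is a small sketch, assuming Python 3.5+ for math.isclose. The close_abs helper and the sample values are made up for illustration:

```python
import math
import sys

eps = sys.float_info.epsilon   # spacing of doubles just above 1.0

def close_abs(a, b, tol):
    """Fixed absolute tolerance (hypothetical helper for illustration)."""
    return abs(a - b) <= tol

# Two doubles near 1,000,000 that differ only by rounding-sized noise.
big_a = 1.0e6
big_b = big_a * (1.0 + eps)

# Two tiny values that differ by a factor of two.
small_a, small_b = 1.0e-7, 2.0e-7

# A tolerance tuned for values around 1.0 is too strict near 1,000,000.
print(close_abs(big_a, big_b, 2 * eps))
# A tolerance tuned for large values is too lax near zero.
print(close_abs(small_a, small_b, 1e-3))

# A relative tolerance scales with the operands.
print(math.isclose(big_a, big_b, rel_tol=1e-9))
print(math.isclose(small_a, small_b, rel_tol=1e-9))
```

A relative tolerance still needs an absolute floor (the abs_tol argument of math.isclose) when one operand may be exactly zero, which is another place where knowing the domain matters.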

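If that is impractical, or you are being lazy, Units in the Last Place (ULPs) is a very novel and robust solution. The full details are quite involved; you can read more here.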

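The basic idea is that a floating point number has two pieces: a mantissa and an exponent. Rounding errors generally change the mantissa by only a few steps. When the value is near 1.0, those steps are exactly float_epsilon. As the magnitude grows, the steps grow with it: for a 32-bit float near 1,000,000, a single step is already 0.0625.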

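Google Test uses ULPs to compare floating point numbers. They chose a default of 4 ULPs as the distance within which two floating point numbers are considered equal. You could also use their code as a reference to build your own ULP-style float comparator.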
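The step size and the 4-ULP rule can be sketched in a few lines. This is an illustration for finite 64-bit doubles only (NaN and infinity are not handled), assuming Python 3. The function names are hypothetical and this is not Google Test's actual implementation:

```python
import struct
import sys

def ordered_bits(x):
    """Map a double to an int that sorts in the same order as the doubles."""
    bits = struct.unpack('<q', struct.pack('<d', x))[0]   # signed 64-bit view
    # Negative doubles have the sign bit set; flip them below zero so that
    # -0.0 maps to 0 and more negative values map to more negative ints.
    return bits if bits >= 0 else -(bits + 2**63)

def ulp_distance(a, b):
    """Number of representable doubles separating a from b."""
    return abs(ordered_bits(a) - ordered_bits(b))

def step_above(x):
    """Gap between a positive finite double and the next larger one."""
    bits = struct.unpack('<q', struct.pack('<d', x))[0]
    return struct.unpack('<d', struct.pack('<q', bits + 1))[0] - x

def almost_equal(a, b, max_ulps=4):
    """Treat a and b as equal when they are at most max_ulps steps apart."""
    return ulp_distance(a, b) <= max_ulps

print(step_above(1.0) == sys.float_info.epsilon)   # step near 1.0 is epsilon
print(step_above(1.0e6))                           # much larger step near 1,000,000
print(almost_equal((25.4 / 10.0) * (1.0 / 2.54), 1.0))
```

Counting steps instead of measuring an absolute difference means the same threshold works at every magnitude, which is what makes the approach robust.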

Answered 2025-04-16 by Python大师


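Thanks for your responses. Most were very good and provided good links, so I will just say that and answer my own question.

Caspin posted this link.

He also mentioned that Google Test uses ULP comparison, and when I looked at the Google code I saw that they reference the same cygnus-software link.

I wound up implementing some of the algorithms in C as a Python extension, and later found that I could do it in pure Python as well. The code is posted below.

In the end, I will probably just add ULP differences to my bag of tricks.

It was interesting to see how many floating point numbers lie between what should be two equal numbers that never left memory. One of the articles, or the Google code I read, said that 4 was a good number... but here I was able to hit 10.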

>>> f1 = 25.4
>>> f2 = f1
>>>
>>> for i in xrange(1, 11):
...     f2 /= 10.0          # To cm
...     f2 *= (1.0 / 2.54)  # To in
...     f2 *= 25.4          # Back to mm
...     print 'after %2d loops there are %2d doubles between them' % (i, dulpdiff(f1, f2))
...
after  1 loops there are  1 doubles between them
after  2 loops there are  2 doubles between them
after  3 loops there are  3 doubles between them
after  4 loops there are  4 doubles between them
after  5 loops there are  6 doubles between them
after  6 loops there are  7 doubles between them
after  7 loops there are  8 doubles between them
after  8 loops there are 10 doubles between them
after  9 loops there are 10 doubles between them
after 10 loops there are 10 doubles between them

Also interesting is how many floating point numbers lie between two equal numbers when one of them is written out as a string and read back in.

>>> # 0 degrees Fahrenheit is -32 / 1.8 degrees Celsius
... f = -32 / 1.8
>>> s = str(f)
>>> s
'-17.7777777778'
>>> # Floats between them...
... fulpdiff(f, float(s))
0
>>> # Doubles between them...
... dulpdiff(f, float(s))
6255L
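The gap in that session comes from Python 2's str(), which keeps only 12 significant digits. Here is a sketch of the same round trip, assuming Python 3, where '%.12g' reproduces the old str() formatting and repr() emits enough digits to recover the exact double:

```python
f = -32 / 1.8

short = '%.12g' % f   # 12 significant digits, like str() in the session above
full = repr(f)        # shortest string that reads back as the same double

print(short, float(short) == f)
print(full, float(full) == f)
```

So when a value has to survive a trip through text, formatting it with repr (or an explicit 17 significant digits) avoids the loss entirely.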

import struct
from functools import partial

# (c) 2010 Eric L. Frederich
#
# Python implementation of algorithms detailed here...
# From http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

def c_mem_cast(x, f=None, t=None):
    '''
    Do a c-style memory cast

    In Python...

    x = 12.34
    y = c_mem_cast(x, 'd', 'q')

    ... should be equivalent to the following in c...

    double    x = 12.34;
    long long y = *(long long*)&x;
    '''
    return struct.unpack(t, struct.pack(f, x))[0]

# Use 'q' (long long, 8 bytes) for doubles: a native 'l' is only 4 bytes on
# some platforms (e.g. 64-bit Windows) and cannot hold a double's bits.
dbl_to_lng = partial(c_mem_cast, f='d', t='q')
lng_to_dbl = partial(c_mem_cast, f='q', t='d')
flt_to_int = partial(c_mem_cast, f='f', t='i')
int_to_flt = partial(c_mem_cast, f='i', t='f')

def ulp_diff_maker(converter, negative_zero):
    '''
    Getting the ULP difference of floats and doubles is similar.
    The only difference is the offset and the converter.
    '''
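    def the_diff(a, b):

        # Map a's bits to an integer ordered like the float values themselves.
        # converter() returns a signed int, so negative floats arrive as
        # negative ints; adding negative_zero strips the sign bit, and the
        # negation places them below zero (so -0.0 and 0.0 both map to 0).
        ai = converter(a)
        if ai < 0:
            ai = -(ai + negative_zero)

        # Same mapping for b
        bi = converter(b)
        if bi < 0:
            bi = -(bi + negative_zero)

        return abs(ai - bi)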

    return the_diff

# Double ULP difference
dulpdiff = ulp_diff_maker(dbl_to_lng, 0x8000000000000000)
# Float ULP difference
fulpdiff = ulp_diff_maker(flt_to_int, 0x80000000        )

# Default to double ULP difference
ulpdiff = dulpdiff
ulpdiff.__doc__ = '''
Get the number of doubles between two doubles.
'''

Answered 2025-04-16 by Python大师

