使用SQLAlchemy和MySQL进行浮点数取整

1 投票
1 回答
8732 浏览
提问于 2025-04-18 02:23

我的问题和这个没有答案的问题有点相似:SQLAlchemy 提交时浮点数被四舍五入

我有一个数据的文本文件,内容是这样的:

#file camera date mjd focus error
ibcy02blq UVIS1 08/03/09   55046.196630   0.57857   0.55440
ibcy02bnq UVIS1 08/03/09   55046.198330  -0.15000   0.42111
ibcy03j8q UVIS1 08/11/09   55054.041650  -0.37143   0.40802
ibcy03jaq UVIS1 08/11/09   55054.043350  -0.91857   0.51859
ibcy04m4q UVIS1 08/18/09   55061.154900  -0.32333   0.52327
ibcy04m6q UVIS1 08/18/09   55061.156600  -0.24867   0.66651
ibcy05b7q UVIS1 09/05/09   55079.912670   0.64900   0.58423
ibcy05b9q UVIS1 09/05/09   55079.914370   0.82000   0.50202
ibcy06meq UVIS1 10/02/09   55106.909840  -0.09667   0.24016

但是当我把它读入到我的 MySQL 数据库时,它变成了这样:

+------+-----------+--------+------------+---------+----------+
| id   | filename  | camera | date       | mjd     | focus    |
+------+-----------+--------+------------+---------+----------+
| 1026 | ibcy02blq | UVIS1  | 2009-08-03 | 55046.2 |  0.57857 |
| 1027 | ibcy02bnq | UVIS1  | 2009-08-03 | 55046.2 |    -0.15 |
| 1028 | ibcy03j8q | UVIS1  | 2009-08-11 |   55054 | -0.37143 |
| 1029 | ibcy03jaq | UVIS1  | 2009-08-11 |   55054 | -0.91857 |
| 1030 | ibcy04m4q | UVIS1  | 2009-08-18 | 55061.2 | -0.32333 |
| 1031 | ibcy04m6q | UVIS1  | 2009-08-18 | 55061.2 | -0.24867 |
| 1032 | ibcy05b7q | UVIS1  | 2009-09-05 | 55079.9 |    0.649 |
| 1033 | ibcy05b9q | UVIS1  | 2009-09-05 | 55079.9 |     0.82 |
| 1034 | ibcy06meq | UVIS1  | 2009-10-02 | 55106.9 | -0.09667 |
| 1035 | ibcy06mgq | UVIS1  | 2009-10-02 | 55106.9 |  -0.1425 |
+------+-----------+--------+------------+---------+----------+

这里的 mjd 列被截断了,我不太明白为什么。我知道像 1/3 这样的浮点数会有精度误差,但这看起来更像是某种四舍五入的操作。

这是我用来将数据导入数据库的代码:

def make_focus_table_main():
    """The main controller for the make_focus_table 
    module."""
    logging.info('Process Starting')
    filename_list = glob.glob('/grp/hst/OTA/focus/source/FocusModel/UVIS*FocusHistory.txt')
    logging.info('Found {} files'.format(len(filename_list)))
    for filename in filename_list:
        logging.info('Reading data from {}'.format(filename))
        output_list = []
        with open(filename, 'r') as f:
            data = f.readlines()
            for line in data[1:]:
                line = line.split()
                output_dict = {}
                output_dict['filename'] = line[0]
                output_dict['camera'] = line[1]
                output_dict['date'] = datetime.strptime(line[2], '%m/%d/%y')
                output_dict['mjd'] = float(line[3])
                output_dict['focus'] = float(line[4])
                output_list.append(output_dict)
        logging.info('Beginning bulk insert of records.')
        engine.execute(Focus.__table__.insert(), output_list)
        logging.info('Database insert complete.')
    logging.info('Process Complete')

我使用了 pdb 来检查在传递给数据库之前,值并没有被截断(也就是说,Python/SQLAlchemy 并没有进行四舍五入)。我可以在 SQLAlchemy 发出的 INSERT 命令中验证这一点:

2014-04-11 13:08:20,522 INFO sqlalchemy.engine.base.Engine INSERT INTO focus (filename, camera, date, mjd, focus) VALUES (%s, %s, %s, %s, %s)
2014-04-11 13:08:20,602 INFO sqlalchemy.engine.base.Engine (
    ('ibcy02blq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19663, 1.05778), 
    ('ibcy02bnq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19833, 1.32333), 
    ('ibcy03j8q', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04165, 1.57333), 
    ('ibcy03jaq', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04335, 0.54333), 
    ('ibcy04m4q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1549, -1.152), 
    ('ibcy04m6q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1566, -1.20733), 
    ('ibcy05b7q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91267, 2.35905), 
    ('ibcy05b9q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91437, 1.84524)  
    ... displaying 10 of 1025 total bound parameter sets ...  
    ('ichl05qwq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05103, -2.98), 
    ('ichl05qxq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05177, -3.07))
2014-04-11 13:08:20,959 INFO sqlalchemy.engine.base.Engine COMMIT

这是我在 SQLAlchemy 类中定义的列:

class Focus(Base):
    """ORM for the table storing the focus measurement information."""
    __tablename__ = 'focus'
    id = Column(Integer(), primary_key=True)
    filename = Column(String(17), index=True, nullable=False)
    camera = Column(String(5), index=True, nullable=False)
    date = Column(Date(), index=True, nullable=False)
    mjd = Column(Float(precision=20, scale=10), index=True, nullable=False)
    focus = Column(Float(15), nullable=False)
    __table_args__ = (UniqueConstraint('filename', 'camera', 
                      name='focus_uniqueness_constraint'),)

这是我在创建表时,SQLAlchemy 记录的 SQL 语句,设置了 echo=True

CREATE TABLE focus (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    filename VARCHAR(17) NOT NULL, 
    camera VARCHAR(5) NOT NULL, 
    date DATE NOT NULL, 
    mjd FLOAT(20) NOT NULL, 
    focus FLOAT(15) NOT NULL, 
    PRIMARY KEY (id), 
    CONSTRAINT focus_uniqueness_constraint UNIQUE (filename, camera)
)

到目前为止,一切正常。但这是我在 MySQL 中看到的,使用 SHOW CREATE TABLE focus;

CREATE TABLE `focus` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(17) NOT NULL,
  `camera` varchar(5) NOT NULL,
  `date` date NOT NULL,
  `mjd` float NOT NULL,
  `focus` float NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `focus_uniqueness_constraint` (`filename`,`camera`),
  KEY `ix_focus_filename` (`filename`),
  KEY `ix_focus_mjd` (`mjd`),
  KEY `ix_focus_date` (`date`),
  KEY `ix_focus_camera` (`camera`)
) ENGINE=InnoDB AUTO_INCREMENT=1193 DEFAULT CHARSET=latin1

不知怎么的,FLOAT 的定义变了!这是某种 MySQL 配置设置吗?我现在只是运行在本地主机上,但如果这是一个配置设置,我担心如果继续使用浮点数,代码在生产服务器上的可移植性。我可以像在其他 StackOverflow 问题中看到的那样,切换到十进制列类型,因为我需要精确值,但我想了解这里发生了什么。


更新:为了进一步说明两位炼金术士的回答,这里是它如何改变我的查询:

> SELECT ROUND(mjd,10) FROM focus LIMIT 10;
+------------------+
| ROUND(mjd,10)    |
+------------------+
| 55046.1953125000 |
| 55046.1992187500 |
| 55054.0429687500 |
| 55054.0429687500 |
| 55061.1562500000 |
| 55061.1562500000 |
| 55079.9140625000 |
| 55079.9140625000 |
| 55106.9101562500 |
| 55106.9101562500 |
+------------------+
10 rows in set (0.00 sec)

注意所有的小数精度依然存在。我之前不知道 SELECT 会四舍五入值,但如果你考虑浮点数表示的工作原理,这似乎是有道理的。它使用为那个数字分配的完整字节,显示多少小数是任意的,最多可以到浮点数的完整长度:https://stackoverflow.com/a/20482699/1216837

指定精度似乎只影响它是存储为双精度还是单精度:http://dev.mysql.com/doc/refman/5.0/en/floating-point-types.html

但有趣/烦人的是,当我从 SQLAlchemy 层发出 SELECT 时,我也得担心同样的问题:

query = psf_session.query(Focus).first()
print query.filename, query.mjd, query.focus

结果是 bcy02blq 55046.2 1.05778,所以这些值仍然被四舍五入。再说一次,这很有道理,因为 SQLAlchemy 反正只是发出 SQL 命令。总的来说,这让我更想切换到 DECIMAL 列类型:http://dev.mysql.com/doc/refman/5.0/en/fixed-point-types.html

1 个回答

2

看起来你所有的数值都被打印成了六位数字(除了有几个地方省略了.0)。虽然我找不到相关的文档,但我猜这可能是MySQL在执行SELECT语句时显示float值的默认行为。

根据你提供的CREATE TABLE语句,内部表示是正确的,所以你只需要在你的语句中加上类似ROUND(mjd, 3)的东西,第一个参数是你想要四舍五入的字段,最后一个参数是你想要保留的小数位数(这个位数可以比现在显示的更长)。

撰写回答