Selecting rows between two dates when the time is stored in a separate field

Asked 2025-04-17 15:44 by 数据工程师

I have date and time fields in PostgreSQL. I'm reading the data in Python and need to filter it by certain dates and times.

Roughly, the steps are:

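Select all records from table x whose date is later than a specific date.
Of those records, keep only the ones whose time is later than the time given for that date.
At the same time, the second date (date2) must be earlier than another specific date (monthdayyear2), and its time (time2) must be earlier than the time given for that date.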

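I know I could do this in Python, for example by iterating over the results row by row. But is there a better option than this brute-force approach? If possible, I'd like to avoid running multiple queries or pulling lots of extra rows through fetchall().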

Tags: data-processing, postgresql, database-query, query-optimization, conditional-query, record-filtering, date-filtering, time-fields

1 Answer

If I understand your design correctly, this is really a schema design problem. You should be using:

CREATE TABLE sometable (
    timestamp1 timestamp with time zone,
    timestamp2 timestamp with time zone
);

instead of:

CREATE TABLE sometable (
    date1 date,
    time1 time,
    date2 date,
    time2 time
);

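Use timestamp with time zone if you want timestamps automatically converted to UTC on the way in and back to the client's TimeZone setting on the way out. Use timestamp without time zone if you want to store the raw timestamp with no time zone conversion.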
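As a rough Python analogy (an illustration only, not how PostgreSQL works internally): a timestamp with time zone behaves like an aware datetime normalized to UTC, while a timestamp without time zone behaves like a naive datetime whose wall-clock value is kept as-is. The fixed +08:00 offset and the helper names below are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed +08:00 offset, used only as an example client time zone.
UTC_PLUS_8 = timezone(timedelta(hours=8))

def store_with_time_zone(value: datetime) -> datetime:
    """Mimic timestamptz: normalize an aware datetime to UTC."""
    if value.tzinfo is None:
        raise ValueError("expected an aware datetime")
    return value.astimezone(timezone.utc)

def store_without_time_zone(value: datetime) -> datetime:
    """Mimic timestamp: keep the wall-clock value and drop any offset."""
    return value.replace(tzinfo=None)

client_value = datetime(2012, 1, 1, 11, 15, tzinfo=UTC_PLUS_8)

stored_tz = store_with_time_zone(client_value)
stored_naive = store_without_time_zone(client_value)

print(stored_tz.isoformat())     # 2012-01-01T03:15:00+00:00
print(stored_naive.isoformat())  # 2012-01-01T11:15:00
# Rendering the UTC-normalized value in the client's zone recovers 11:15.
print(stored_tz.astimezone(UTC_PLUS_8).isoformat())  # 2012-01-01T11:15:00+08:00
```

The naive value no longer knows it was meant as +08:00, which is exactly the information a timestamp without time zone column discards.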

If an inclusive test is acceptable, you can write:

SELECT ...
FROM sometable 
WHERE '2012-01-01 11:15 +0800' BETWEEN timestamp1 AND timestamp2;

If you can't change your schema, the best you can do is:

SELECT ...
FROM sometable
WHERE '2012-01-01 11:15 +0800' BETWEEN (date1 + time1) AND (date2 + time2);
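The same principle applies if you filter on the Python side: combine each date/time pair into one datetime before comparing, rather than comparing dates and times separately. This is a minimal sketch that assumes rows are (date1, time1, date2, time2) tuples like those a DB-API fetchall() would return. The row comes from the sample data later in this answer. The probe point and function names are hypothetical.

```python
from datetime import date, datetime, time

def combine(d: date, t: time) -> datetime:
    """Python counterpart of PostgreSQL's date + time expression."""
    return datetime.combine(d, t)

def contains_inclusive(row, point: datetime) -> bool:
    """Like: point BETWEEN (date1 + time1) AND (date2 + time2)."""
    date1, time1, date2, time2 = row
    return combine(date1, time1) <= point <= combine(date2, time2)

def contains_by_separate_fields(row, point: datetime) -> bool:
    """Naive per-field test, shown only to illustrate why it is wrong."""
    date1, time1, date2, time2 = row
    return (date1 <= point.date() <= date2
            and time1 <= point.time() <= time2)

# Sample row: 2012-01-01 11:15 to 2012-02-03 04:00.
row = (date(2012, 1, 1), time(11, 15), date(2012, 2, 3), time(4, 0))
point = datetime(2012, 1, 15, 9, 0)  # hypothetical probe value

print(contains_inclusive(row, point))           # True
print(contains_by_separate_fields(row, point))  # False: 09:00 is not between 11:15 and 04:00
```

The per-field version rejects a point that clearly lies inside the interval, because the time bounds only make sense together with their own dates. Doing the combined comparison in the WHERE clause, as above, also avoids fetching rows you would discard anyway.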

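This can produce surprises when clients are in multiple time zones. date1 + time1 yields a timestamp without time zone, and when the literal '2012-01-01 11:15 +0800' is compared against it, PostgreSQL silently ignores the +0800 offset. You may want to use the AT TIME ZONE operator.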
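To see why the interpretation matters, here is a loose Python analogue of applying AT TIME ZONE to a timestamp without time zone: treat the naive wall-clock value as local time in a chosen zone. The helper name is hypothetical, and it uses fixed offsets for simplicity, whereas named zones in PostgreSQL also account for daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

def at_time_zone(naive: datetime, zone: timezone) -> datetime:
    """Rough analogue of `timestamp AT TIME ZONE zone`: interpret the naive
    wall-clock value as local time in `zone` and return an aware instant."""
    if naive.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return naive.replace(tzinfo=zone)

stored = datetime(2012, 1, 1, 11, 15)  # what date1 + time1 would yield
plus8 = timezone(timedelta(hours=8))

as_utc = at_time_zone(stored, timezone.utc)
as_plus8 = at_time_zone(stored, plus8)

# The same stored value denotes two instants eight hours apart.
print(as_utc - as_plus8)  # 8:00:00
```

Which interpretation is correct depends on which time zone your application wrote the date and time columns in, and that information is not stored anywhere in the columns themselves.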

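If you need an exclusive test on one side or the other, you can't use BETWEEN, because it is an a <= x <= b operator. Write the comparisons out instead. This example is exclusive on both sides; change > to >= or < to <= on whichever side should stay inclusive: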

SELECT ...
FROM sometable
WHERE '2012-01-01 11:15 +0800' > (date1 + time1)
  AND '2012-01-01 11:15 +0800' < (date2 + time2);
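To make the boundary behaviour concrete, here is a small Python sketch of three comparison styles: inclusive BETWEEN, the fully exclusive rewrite above, and a half-open variant. The endpoints are the sample row's combined values, and the function names are made up for illustration.

```python
from datetime import datetime

def between_inclusive(x, a, b) -> bool:
    """SQL: x BETWEEN a AND b, i.e. a <= x <= b."""
    return a <= x <= b

def between_exclusive(x, a, b) -> bool:
    """SQL: x > a AND x < b (exclusive on both sides)."""
    return a < x < b

def between_half_open(x, a, b) -> bool:
    """SQL: x >= a AND x < b (inclusive start, exclusive end)."""
    return a <= x < b

start = datetime(2012, 1, 1, 11, 15)  # date1 + time1 from the sample row
end = datetime(2012, 2, 3, 4, 0)      # date2 + time2 from the sample row

for x in (start, end):
    print(x, between_inclusive(x, start, end),
          between_exclusive(x, start, end),
          between_half_open(x, start, end))
```

Only values that land exactly on an endpoint are treated differently. Anything strictly inside the interval passes all three tests.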

Automating the schema change

Automating the schema change is possible.

You'll need to query INFORMATION_SCHEMA, or pg_catalog.pg_class and pg_catalog.pg_attribute, to find tables that have pairs of date and time columns. Then generate a set of ALTER TABLE commands to unify them.

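What counts as a "pair" is quite application-specific. If you've used a consistent naming convention, it should be easy to match them with the LIKE or ~ operators and/or regexp_matches. What you want to produce is a set of (tablename, datecolumnname, timecolumnname) tuples.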
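This is a minimal Python sketch of the name-matching idea. It assumes you have already read (table, column, type) rows out of the catalog; the column list here simply mirrors the sample tables used in this answer. The rule is the same one the SQL query uses: strip "date" from date column names and "time" from time column names, then pair columns in the same table whose remainders agree. The function name is hypothetical.

```python
from collections import defaultdict

def pair_by_name(columns):
    """Return sorted (tablename, datecolumnname, timecolumnname) tuples.

    `columns` is an iterable of (table, column, type_name) triples where
    type_name is 'date' or 'time'. A date column and a time column pair up
    when they are in the same table and their names match after removing
    'date' from the former and 'time' from the latter.
    """
    dates = defaultdict(list)
    times = defaultdict(list)
    for table, column, type_name in columns:
        if type_name == "date":
            dates[(table, column.replace("date", ""))].append(column)
        elif type_name == "time":
            times[(table, column.replace("time", ""))].append(column)
    pairs = []
    for key, date_cols in dates.items():
        for date_col in date_cols:
            for time_col in times.get(key, []):
                pairs.append((key[0], date_col, time_col))
    return sorted(pairs)

columns = [
    ("sometable", "date1", "date"), ("sometable", "time1", "time"),
    ("sometable", "date2", "date"), ("sometable", "time2", "time"),
    ("othertable", "somedate", "date"), ("othertable", "sometime", "time"),
]
print(pair_by_name(columns))
```

Like the SQL replace() call, this is crude: it removes every occurrence of the word, so check the resulting tuples by eye before acting on them.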

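Once you have those, you can generate the following ALTER TABLE statements for each (tablename, datecolumnname, timecolumnname) tuple. Run them in a transaction for safety, and test them before using them on any data that matters. Items in [brackets] are substitutions: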

BEGIN;
ALTER TABLE [tablename] ADD COLUMN [timestampcolumnname] TIMESTAMP WITH TIME ZONE;
--
-- WARNING: This part can lose data; if one of the columns is null and the other one isn't
-- the result is null. You should've had a CHECK constraint preventing that, but probably
-- didn't. You might need to special case that; the `coalesce` and `nullif` functions and
-- the `CASE` clause might be useful if so.
--
UPDATE [tablename] SET [timestampcolumnname] = ([datecolumnname] + [timecolumnname]);
ALTER TABLE [tablename] DROP COLUMN [datecolumnname];
ALTER TABLE [tablename] DROP COLUMN [timecolumnname];
-- Finally, if the originals were NOT NULL:
ALTER TABLE [tablename] ALTER COLUMN [timestampcolumnname] SET NOT NULL;

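Then check the results and COMMIT if you're happy. Be aware that from the first ALTER TABLE onward the table holds an exclusive lock (ACCESS EXCLUSIVE), so nothing else can use the table until you COMMIT or ROLLBACK.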

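If you're on a reasonably modern PostgreSQL, you can generate the SQL with the format function. On older versions, you can use string concatenation (||) together with the quote_ident function for table and column names. For example: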

Given this sample data:

CREATE TABLE sometable(date1 date not null, time1 time not null, date2 date not null, time2 time not null);
INSERT INTO sometable(date1,time1,date2,time2) VALUES
('2012-01-01','11:15','2012-02-03','04:00');

CREATE TABLE othertable(somedate date, sometime time);
INSERT INTO othertable(somedate, sometime) VALUES
(NULL, NULL),
(NULL, '11:15'),
('2012-03-08',NULL),
('2014-09-18','23:12');

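Here's a query that generates the input data set. Note that it relies on a naming convention: matching column pairs always share a common name once any occurrence of the word date or time is removed. You could instead pair columns by adjacency, by testing c1.attnum + 1 = c2.attnum.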
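The adjacency alternative can be sketched in Python as well. This assumes you have (table, attnum, column, type) rows from pg_attribute; the attnum values below are the positions the sample CREATE TABLE statements would assign. The function name is hypothetical. One caveat: a dropped column keeps its attnum, so a column dropped between a date and its time breaks the pairing.

```python
def pair_by_adjacency(columns):
    """Pair a date column with a time column that immediately follows it.

    `columns` is an iterable of (table, attnum, column, type_name) tuples,
    where attnum is the column's position (as in pg_attribute.attnum).
    This mirrors the SQL test c1.attnum + 1 = c2.attnum.
    """
    by_position = {(table, attnum): (column, type_name)
                   for table, attnum, column, type_name in columns}
    pairs = []
    for (table, attnum), (column, type_name) in by_position.items():
        if type_name != "date":
            continue
        following = by_position.get((table, attnum + 1))
        if following is not None and following[1] == "time":
            pairs.append((table, column, following[0]))
    return sorted(pairs)

columns = [
    ("sometable", 1, "date1", "date"), ("sometable", 2, "time1", "time"),
    ("sometable", 3, "date2", "date"), ("sometable", 4, "time2", "time"),
    ("othertable", 1, "somedate", "date"), ("othertable", 2, "sometime", "time"),
]
print(pair_by_adjacency(columns))
```

Adjacency needs no naming convention, but it does depend on the columns having been created next to each other.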

BEGIN;

WITH 
-- Create set of each date/time column along with its table name, oids, and not null flag
cols AS (
    select attrelid, relname, attname, typname, atttypid, attnotnull 
    from pg_attribute 
    inner join pg_class on pg_attribute.attrelid = pg_class.oid 
    inner join pg_type on pg_attribute.atttypid = pg_type.oid 
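    where NOT attisdropped AND atttypid IN ('date'::regtype, 'time'::regtype)
      -- Only ordinary tables (ALTER TABLE ... ADD COLUMN fails on views), skipping system schemas
      AND relkind = 'r'
      AND relnamespace NOT IN ('pg_catalog'::regnamespace, 'information_schema'::regnamespace)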
),
-- Self join the time and date column set, filtering the left side for only dates and
-- the right side for only times, producing two distinct sets. Then filter for entries
-- where the names are the same after replacing any appearance of the word `date` or
-- `time`.
tableinfo (tablename, datecolumnname, timecolumnname, nonnull, hastimezone) AS (
    SELECT 
        c1.relname, c1.attname, c2.attname, 
        c1.attnotnull AND c2.attnotnull AS nonnull, 
        -- Hard-coded for now; see the note on WITH/WITHOUT TIME ZONE below
        't'::boolean AS hastimezone
    FROM cols c1 
    INNER JOIN cols c2 ON (
        c1.atttypid = 'date'::regtype 
        AND c2.atttypid = 'time'::regtype 
        AND c1.attrelid = c2.attrelid
        -- Match column pairs; I used name matching, you might use adjacency:
        AND replace(c1.attname,'date','') = replace(c2.attname,'time','')
    )
)
-- Finally, format the results into a series of ALTER TABLE statements.
SELECT format($$
    ALTER TABLE %1$I ADD COLUMN %4$I TIMESTAMP %5$s;
    UPDATE %1$I SET %4$I = (%2$I + %3$I);
    ALTER TABLE %1$I DROP COLUMN %2$I;
    ALTER TABLE %1$I DROP COLUMN %3$I;
$$ || 
    -- Append a clause to make the column NOT NULL now that it's populated, only
    -- if both the original date and time columns were NOT NULL:
    CASE 
       WHEN nonnull
       THEN '    ALTER TABLE %1$I ALTER COLUMN %4$I SET NOT NULL;'
       ELSE ''
    END,

    -- Now the format arguments
    tablename,           -- 1
    datecolumnname,      -- 2
    timecolumnname,      -- 3
    -- You'd use a better column name generator than this simple example:
    datecolumnname||'_'||timecolumnname,  -- 4
    CASE 
       WHEN hastimezone THEN 'WITH TIME ZONE' 
       ELSE 'WITHOUT TIME ZONE' 
    END                  -- 5
)
FROM tableinfo;

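You can read the results and send them as SQL commands from a second session. If you want something fancier, you can write a fairly simple PL/pgSQL function that loops over the results and EXECUTEs each one. The query's output looks something like: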

    ALTER TABLE sometable ADD COLUMN date1_time1 TIMESTAMP WITH TIME ZONE;
    UPDATE sometable SET date1_time1 = (date1 + time1);
    ALTER TABLE sometable DROP COLUMN date1;
    ALTER TABLE sometable DROP COLUMN time1;
    ALTER TABLE sometable ALTER COLUMN date1_time1 SET NOT NULL;

    ALTER TABLE sometable ADD COLUMN date2_time2 TIMESTAMP WITH TIME ZONE;
    UPDATE sometable SET date2_time2 = (date2 + time2);
    ALTER TABLE sometable DROP COLUMN date2;
    ALTER TABLE sometable DROP COLUMN time2;
    ALTER TABLE sometable ALTER COLUMN date2_time2 SET NOT NULL;

    ALTER TABLE othertable ADD COLUMN somedate_sometime TIMESTAMP WITH TIME ZONE;
    UPDATE othertable SET somedate_sometime = (somedate + sometime);
    ALTER TABLE othertable DROP COLUMN somedate;
    ALTER TABLE othertable DROP COLUMN sometime;

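I don't know of a useful way to decide, column by column, whether you want WITH TIME ZONE or WITHOUT TIME ZONE. Most likely you'll end up hard-coding it, in which case you can simply remove the hastimezone column from the query. I left it in case your application has a good way to work this out.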
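If you would rather build the statements in a client program than in SQL, here is a minimal Python sketch of the same template, with the time zone choice as an explicit parameter. The function names are hypothetical. quote_identifier is a simplified stand-in for format's %I: it always double-quotes and doubles embedded quotes, whereas %I only adds quotes when they are needed. For a lowercase name the quoted and unquoted forms refer to the same column.

```python
def quote_identifier(name: str) -> str:
    """Always double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'

def migration_sql(table, date_col, time_col, not_null, with_time_zone=True):
    """Build the ALTER/UPDATE statements for one (table, date, time) tuple."""
    t = quote_identifier(table)
    d = quote_identifier(date_col)
    tm = quote_identifier(time_col)
    new_col = quote_identifier(f"{date_col}_{time_col}")
    tz = "WITH TIME ZONE" if with_time_zone else "WITHOUT TIME ZONE"
    statements = [
        f"ALTER TABLE {t} ADD COLUMN {new_col} TIMESTAMP {tz};",
        f"UPDATE {t} SET {new_col} = ({d} + {tm});",
        f"ALTER TABLE {t} DROP COLUMN {d};",
        f"ALTER TABLE {t} DROP COLUMN {tm};",
    ]
    # Only re-impose NOT NULL when both source columns were NOT NULL.
    if not_null:
        statements.append(f"ALTER TABLE {t} ALTER COLUMN {new_col} SET NOT NULL;")
    return statements

for line in migration_sql("othertable", "somedate", "sometime", not_null=False):
    print(line)
```

The generated statements should still be run inside a single transaction and checked before COMMIT, as described above.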

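If the time can be null while the date can't, or vice versa, you'll need to wrap the date and time in an expression that decides what to return when one of them is null. The nullif and coalesce functions are useful for this, as is CASE. Remember that adding a null to a non-null value yields null, so you may not need to do anything special.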
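Here is a Python sketch of one possible null-handling policy, using the othertable sample rows. The policy is an assumption chosen for illustration, not a recommendation: a missing date gives no timestamp, which is what date + NULL gives in SQL anyway, and a missing time defaults to midnight. In SQL that corresponds to something like date1 + coalesce(time1, '00:00'::time). The function name is hypothetical.

```python
from datetime import date, datetime, time
from typing import Optional

def combine_nullable(d: Optional[date], t: Optional[time]) -> Optional[datetime]:
    """Combine a date and a time where either may be None (SQL NULL).

    Assumed policy: no date means no timestamp; a missing time defaults to
    midnight, like date1 + coalesce(time1, '00:00'::time) in SQL.
    """
    if d is None:
        return None
    return datetime.combine(d, t if t is not None else time(0, 0))

# The othertable sample rows: (somedate, sometime).
rows = [
    (None, None),
    (None, time(11, 15)),
    (date(2012, 3, 8), None),
    (date(2014, 9, 18), time(23, 12)),
]
print([combine_nullable(d, t) for d, t in rows])
```

With the plain date + time expression in the migration template, the third row would become NULL instead of midnight. Whichever behaviour you want has to be decided per application.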

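If you use schemas, you may need to refine the query further, using %I substitutions for schema-name prefixes to avoid ambiguity. If you don't use schemas (if you don't know what a schema is, you don't), it doesn't matter.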

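Once you've done this, consider adding a CHECK constraint that ensures the first timestamp is less than or equal to the second (for example date1_time1 <= date2_time2), where that makes sense in your application. Also look at exclusion constraints in the documentation.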
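To illustrate what those two kinds of constraint would reject, here is a Python sketch that checks the same conditions on data you already hold. The CHECK condition is start <= end. For the exclusion constraint, the sketch assumes something like EXCLUDE USING gist (tstzrange(date1_time1, date2_time2) WITH &&), which forbids overlapping intervals. Range constructors default to half-open [start, end) bounds, so intervals that merely touch do not overlap. The first interval is the sample row; the other two and the function names are hypothetical.

```python
from datetime import datetime
from itertools import combinations

def violates_check(start: datetime, end: datetime) -> bool:
    """What CHECK (date1_time1 <= date2_time2) would reject."""
    return start > end

def overlaps(a, b) -> bool:
    """Overlap test for half-open [start, end) intervals.

    Assumes non-empty intervals (start < end); PostgreSQL treats an empty
    range as overlapping nothing, which this simple test does not model.
    """
    return a[0] < b[1] and b[0] < a[1]

def find_overlaps(intervals):
    """Index pairs of intervals that an && exclusion constraint would forbid."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(intervals), 2)
            if overlaps(a, b)]

intervals = [
    (datetime(2012, 1, 1, 11, 15), datetime(2012, 2, 3, 4, 0)),  # sample row
    (datetime(2012, 2, 3, 4, 0), datetime(2012, 2, 10, 0, 0)),   # touches, no overlap
    (datetime(2012, 1, 20, 0, 0), datetime(2012, 1, 25, 0, 0)),  # inside the first
]
print([violates_check(s, e) for s, e in intervals])  # [False, False, False]
print(find_overlaps(intervals))  # [(0, 2)]
```

Whether overlapping intervals should be forbidden at all depends on what the rows represent, so treat the exclusion constraint as an option rather than a default.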

Answered 2025-04-17 by Python大师
