Missing a value when retrieving a Pandas DataFrame from an HDFStore using select with terms on a DateTimeIndex

Posted 2024-06-02 06:28:42


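I am trying to use Pandas to retrieve stored data from an HDFStore, using select with terms. A simple select() without terms returns all the data. However, when I try to filter the data on the DateTimeIndex, everything except the last row is returned.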

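I suspect something fishy in how the timestamps are stored internally and in their precision, but I cannot see why it does not work or what I can do about it. Any suggestions would help, as I am new to this.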

I created a small "unit test" to investigate ...

import os
import tempfile
import uuid
import pandas as pd
import numpy as np
import time
import unittest
import sys


class PandasTestCase(unittest.TestCase):
    def setUp(self):
        print "Pandas version: {0}".format(pd.version.version)
        print "Python version: {0}".format(sys.version)
        self._filename = os.path.join(tempfile.gettempdir(), '{0}.{1}'.format(str(uuid.uuid4()), 'h5'))
        self._store = pd.HDFStore(self._filename)

    def tearDown(self):
        self._store.close()
        if os.path.isfile(self._filename):
            os.remove(self._filename)

    def test_filtering(self):
        t_start = time.time() * 1e+9
        t_end = t_start + 1e+9 # 1 second later, i.e. 10^9 ns
        sample_count = 1000

        timestamps = np.linspace(t_start, t_end, num=sample_count).tolist()
        data = {'channel_a': range(sample_count)}

        time_index = pd.to_datetime(timestamps, utc=True, unit='ns')
        df = pd.DataFrame(data, index=time_index, dtype=long)

        key = 'test'
        self._store.append(key, df)

        retrieved_df = self._store.select(key)
        retrieved_timestamps = np.array(retrieved_df.index.values, dtype=np.uint64).tolist()
        print "Retrieved {0} timestamps, w/o filter.".format(len(retrieved_timestamps))

        self.assertItemsEqual(retrieved_timestamps, timestamps)

        stored_time_index = self._store[key].index

        # Create a filter based on first and last values of index, i.e. from <= index <= to.
        from_filter = pd.Term('index>={0}'.format(pd.to_datetime(stored_time_index[0], utc=True, unit='ns')))
        to_filter = pd.Term('index<={0}'.format(pd.to_datetime(stored_time_index[-1], utc=True, unit='ns')))

        retrieved_df_interval = self._store.select(key, [from_filter, to_filter])
        retrieved_timestamps_interval = np.array(retrieved_df_interval.index.values, dtype=np.uint64).tolist()
        print "Retrieved {0} timestamps, using filter".format(len(retrieved_timestamps_interval))

        self.assertItemsEqual(retrieved_timestamps_interval, timestamps)


if __name__ == '__main__':
    unittest.main()

... whose output is as follows:

[output not preserved]

Update: after changing the creation of the Terms to use the alternative constructor, everything works. Like this:

    # Create a filter based on first and last values of index, i.e. from <= index <= to.
    #from_filter = pd.Term('index>={0}'.format(pd.to_datetime(stored_time_index[0], utc=True, unit='ns')))
    from_filter = pd.Term('index','>=', stored_time_index[0])
    #to_filter = pd.Term('index<={0}'.format(pd.to_datetime(stored_time_index[-1], utc=True, unit='ns')))
    to_filter = pd.Term('index','<=', stored_time_index[-1])

1 answer
User
#1 · Posted 2024-06-02 06:28:42

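The string formatting of a Timestamp defaults to 6 decimal places (which is what your formatting in the Term produces).

ns is 9 places, so use the alternative form of the Term constructor: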

Term("index","<=",stamp)

Here is an example:

[example not preserved]
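The precision loss described in this answer can be sketched without pandas or HDFStore. The snippet below is only an illustration under stated assumptions: timestamps are modeled as plain integer nanoseconds, `truncate_to_microseconds` is a hypothetical helper standing in for a 6-decimal string format, and the five-element `index` is made-up data. It is not how pandas evaluates a Term internally.

```python
# Model nanosecond timestamps as plain integers (nanoseconds since the epoch).
NS_PER_US = 1000


def truncate_to_microseconds(ns):
    """Drop the last three decimal digits, as a 6-decimal string format would."""
    return (ns // NS_PER_US) * NS_PER_US


# Hypothetical index: five timestamps, none of which is a whole microsecond.
base = 1_000_000_000_000_000_789
index = [base + i * 250_000_123 for i in range(5)]

# Bounds at full 9-digit precision (comparable to passing the Timestamp itself).
lower_exact, upper_exact = index[0], index[-1]

# Bounds after a round trip through a 6-decimal string (modeled by truncation).
lower_truncated = truncate_to_microseconds(index[0])
upper_truncated = truncate_to_microseconds(index[-1])

rows_exact = [t for t in index if lower_exact <= t <= upper_exact]
rows_truncated = [t for t in index if lower_truncated <= t <= upper_truncated]

print(len(rows_exact))      # every row matches
print(len(rows_truncated))  # the last row falls just above the truncated bound
```

Truncation moves both bounds downward, so the lower bound still admits the first row while the upper bound ends up slightly below the last one. That matches the symptom in the question, where only the final row goes missing.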

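Note that in 0.13 (this example uses master) this will be even easier, and you can include it directly, e.g. 'index<=index[-1]' (the index on the right-hand side of the expression is actually the local variable index).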
