Grouping items by time-series intervals in Python
I have some data that looks like this:
[[datetime1, label1],
[datetime2, label2],
[datetime3, label3]]
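The labels are strings. I have a binning parameter (delta), which is a time interval (datetime.timedelta).
What I want to do:
- Create a set of time bins that are spaced delta apart. In other words, for the bins below, bin2 - bin1 = bin3 - bin2 = delta.
- Put the labels into the corresponding bins.
So I would end up with something like this: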
[[datetimebin1, [label1, label2]],
[datetimebin2, []],
[datetimebin3, []],
[datetimebin4, [label3]]]
pandas was recommended to me, but I haven't found what I'm looking for in it yet. Any help is much appreciated!
2 Answers
3
I think @DrV's answer is correct, but I've put together an example to show how something similar can be done with Pandas:
import numpy
import pandas
import datetime
import time
# Binning delta
delta = datetime.timedelta(hours=1)
# Sample data
sample = [
['2014-08-09 16:30:00', 'label1'],
['2014-08-09 15:30:00', 'label2'],
['2014-08-09 14:30:00', 'label3'],
['2014-08-09 14:00:00', 'label4']
]
# Create dataframe and append UNIX timestamp column
df = pandas.DataFrame(sample)
df.columns = ['Datetime', 'Label']
df['Datetime'] = pandas.to_datetime(df['Datetime'])
df['UnixStamp'] = df['Datetime'].apply(lambda d: time.mktime(d.timetuple()))
df = df.set_index('Datetime')
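# Calculate bin edges (in seconds since the epoch), spaced delta apart.
# total_seconds() is used because delta.seconds drops the days component.
step = delta.total_seconds()
bins = numpy.arange(df['UnixStamp'].min(), df['UnixStamp'].max() + step, step)
# Map a timestamp to the nearest bin edge (ties go to the earlier edge)
def bin_from_tstamp(tstamp):
    diffs = [abs(tstamp - b) for b in bins]
    return bins[diffs.index(min(diffs))]
# Group rows by datetime bin
grouped = df.groupby(df['UnixStamp'].map(
    lambda t: datetime.datetime.fromtimestamp(bin_from_tstamp(t))
))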
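At this point, grouped contains the dataset grouped by time bin.
Below is the result of printing grouped.groups (the keys are the time bins, the values are the timestamps that fell into each bin):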
{
numpy.datetime64('2014-08-09T18:00:00.000000000+0200'): [
Timestamp('2014-08-09 16:30:00')
],
numpy.datetime64('2014-08-09T17:00:00.000000000+0200'): [
Timestamp('2014-08-09 15:30:00')
],
numpy.datetime64('2014-08-09T16:00:00.000000000+0200'): [
Timestamp('2014-08-09 14:30:00'),
Timestamp('2014-08-09 14:00:00')
]
}
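For comparison, here is a more compact sketch that skips the UNIX-timestamp column and floors each timestamp to its bin with plain timedelta arithmetic. It is only one possible approach, not the method used above: the function name bin_labels is made up for illustration, bins are assumed to start at the earliest timestamp, and the column names 'Datetime' and 'Label' are the ones from the example.

```python
import datetime
import pandas as pd

def bin_labels(df, delta):
    """Return [[bin_start, [labels]], ...] with bins spaced delta apart.

    Assumes df has a datetime64 'Datetime' column and a 'Label' column.
    (bin_labels is a hypothetical helper, not part of pandas.)
    """
    step = pd.Timedelta(delta)
    start = df['Datetime'].min()
    # Floor each timestamp to the start of its bin, counted from the earliest one
    bin_start = start + ((df['Datetime'] - start) // step) * step
    # Collect the labels of every non-empty bin
    by_bin = df['Label'].groupby(bin_start).apply(list).to_dict()
    # Enumerate every bin edge so that empty bins show up as []
    edges = pd.date_range(start, df['Datetime'].max(), freq=step)
    return [[edge.to_pydatetime(), by_bin.get(edge, [])] for edge in edges]

sample = [
    ['2014-08-09 16:30:00', 'label1'],
    ['2014-08-09 15:30:00', 'label2'],
    ['2014-08-09 14:30:00', 'label3'],
    ['2014-08-09 14:00:00', 'label4'],
]
df = pd.DataFrame(sample, columns=['Datetime', 'Label'])
df['Datetime'] = pd.to_datetime(df['Datetime'])
result = bin_labels(df, datetime.timedelta(hours=1))
```

The result is a list of [bin_start, labels] pairs in the shape the question asks for, and the input rows do not need to be sorted.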
2
Something like this should work:
# data: a list of [datetime, label] pairs, assumed sorted by datetime
# delta: bin width (datetime.timedelta)
# res: output list of [binstart, [labels]] lists, as in the question
res = []
# start of the first bin:
binstart = data[0][0]
res.append([binstart, []])
# iterate through the data items
for d in data:
    # if the data item belongs to the current bin, append it to that bin
    if d[0] < binstart + delta:
        res[-1][1].append(d[1])
        continue
    # otherwise, create new empty bins until this data item fits into a bin
    binstart += delta
    while d[0] >= binstart + delta:
        res.append([binstart, []])
        binstart += delta
    # create a bin with the data item
    res.append([binstart, [d[1]]])
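If the data is not guaranteed to be sorted, a variant that computes each item's bin index directly may be easier to reason about. The sketch below is an alternative to the loop above, not a rewrite of it: the function name bin_by_interval and the sample values are made up for illustration, and it relies on Python 3, where dividing one timedelta by another with // yields an int.

```python
import datetime

def bin_by_interval(data, delta):
    """Group labels into consecutive bins of width delta.

    data: list of [datetime, label] pairs, in any order.
    Bins start at the earliest timestamp; empty bins are kept.
    (bin_by_interval is a hypothetical helper name.)
    """
    if not data:
        return []
    start = min(ts for ts, _ in data)
    end = max(ts for ts, _ in data)
    # Number of bins needed to cover [start, end]
    n_bins = (end - start) // delta + 1
    res = [[start + i * delta, []] for i in range(n_bins)]
    for ts, label in data:
        # timedelta // timedelta gives the integer bin index
        res[(ts - start) // delta][1].append(label)
    return res

# Illustrative data with a two-hour gap between label2 and label3
data = [
    [datetime.datetime(2014, 8, 9, 14, 0), 'label1'],
    [datetime.datetime(2014, 8, 9, 14, 30), 'label2'],
    [datetime.datetime(2014, 8, 9, 17, 10), 'label3'],
]
res = bin_by_interval(data, datetime.timedelta(hours=1))
```

With this data the first bin holds label1 and label2, the next two bins are empty, and the fourth bin holds label3, which is the layout shown in the question.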