Can cut be used on a collection of datetimes?

3 votes
2 answers
2841 views
Asked 2025-04-17 21:32

Is it possible to use pandas.cut to split datetime timestamps into bins?

The following code:

import pandas as pd
import StringIO  # Python 2 module; on Python 3 use io.StringIO instead

contenttext = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272"""

content = StringIO.StringIO(contenttext)
df = pd.read_csv(content, header=0)
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')

pd.cut(df['Time'], 5)

raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f5387a84c335> in <module>()
     16 df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d %H:%M:%S:%f')
     17 
---> 18 pd.cut(df['Time'], 5)

/home/???????/sites/varsite/venv/local/lib/python2.7/site-packages/pandas/tools/tile.pyc in cut(x, bins, right, labels, retbins, precision, include_lowest)
     80         else:
     81             rng = (nanops.nanmin(x), nanops.nanmax(x))
---> 82         mn, mx = [mi + 0.0 for mi in rng]
     83 
     84         if mn == mx:  # adjust end points before binning

TypeError: unsupported operand type(s) for +: 'Timestamp' and 'float'

2 Answers

2

Here is a workaround I found. You may need to adjust the code slightly to get the precision you need. I use dates as the example below:

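import pandas as pd

# map dates to timedelta, measured from midnight today
# (a pandas Timestamp subtracts cleanly from a datetime64 column)
today = pd.Timestamp.today().normalize()

# each element of df.Time - today is a timedelta;
# use .dt.total_seconds() instead of .dt.days if you need more precision
df['days'] = (df.Time - today).dt.days

pd.cut(df.days, bins=5)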

In effect, you convert the datetime or date into a numeric distance measure and then cut or group on that number.
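As a rough end-to-end illustration of that idea, here is a minimal sketch using the data from the question. It measures distance in seconds from the earliest timestamp rather than in days from today, because all the sample rows fall on the same day. The column names `elapsed_s` and `bin` and the variable `edge_times` are made up for this example.

```python
import io
import pandas as pd

csv_text = """Time,Bid
2014-03-05 21:56:05:924300,1.37275
2014-03-05 21:56:05:924351,1.37272
2014-03-05 21:56:06:421906,1.37275
2014-03-05 21:56:06:421950,1.37272
2014-03-05 21:56:06:920539,1.37275
2014-03-05 21:56:06:920580,1.37272
2014-03-05 21:56:09:071981,1.37275
2014-03-05 21:56:09:072019,1.37272"""

df = pd.read_csv(io.StringIO(csv_text))
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S:%f")

# Numeric distance: seconds elapsed since the earliest timestamp.
start = df["Time"].min()
df["elapsed_s"] = (df["Time"] - start).dt.total_seconds()

# Cut the numeric column into 5 equal-width bins.
# labels=False returns integer bin codes; retbins=True also returns the bin edges.
df["bin"], edges = pd.cut(df["elapsed_s"], bins=5, labels=False, retbins=True)

# Map the numeric edges back to timestamps so the bins are readable.
edge_times = start + pd.to_timedelta(edges, unit="s")

print(df[["Time", "bin"]])
print(edge_times)
```

Keeping the numeric edges around (via `retbins=True`) is what makes it possible to translate the bins back into timestamps afterwards.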

3

This is an old question, but for future visitors, I think there is a cleaner way to compute time differences as numbers so that cut can be used on them:

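import pandas as pd
import datetime as dt

# Get Days Since Date
# (wrap the date in a Timestamp so it can be subtracted from a datetime column)
today = pd.Timestamp(dt.date.today())
df['days ago'] = (today - df['Time']).dt.days

# Get Seconds Since Datetime
# (.dt.seconds is only the seconds component of each timedelta,
#  so use .dt.total_seconds() for the full difference)
now = dt.datetime.now()
df['seconds ago'] = (now - df['Time']).dt.total_seconds()

# Minutes Since Datetime
# (no dt.minutes attribute, so we use total seconds / 60)
df['minutes ago'] = (now - df['Time']).dt.total_seconds() / 60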

All of these columns now hold numeric values, so we can use pd.cut() on them.
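To make that last step concrete, here is a small sketch of cutting one of those numeric columns into labelled bins. The sample timestamps, the fixed reference time `now`, the bin edges, and the `age` column are all hypothetical, chosen only so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2014-03-05 21:56:05",
        "2014-03-05 21:56:06",
        "2014-03-05 21:56:09",
        "2014-03-05 21:57:30",
    ])
})

# Fixed reference time instead of datetime.now(), so the output does not change between runs.
now = pd.Timestamp("2014-03-05 21:58:00")

# Full time difference in seconds, as floats.
df["seconds ago"] = (now - df["Time"]).dt.total_seconds()

# Explicit bin edges: (0, 60] and (60, 120] seconds, with a label for each bin.
df["age"] = pd.cut(
    df["seconds ago"],
    bins=[0, 60, 120],
    labels=["under 1 min", "1-2 min"],
)

print(df)
```

Passing explicit edges instead of a bin count is a design choice here: it keeps the bin boundaries meaningful (whole minutes) rather than dependent on the range of the data.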
