Counting the number of user sessions, defined as intervals

SessionID, UserID, Logon_time, Logoff_time
Adx1YiRyvOFApQiniyPWYPo,AbO6vW58ta1Bgrqs.RA0uHg,2016-01-05 07:46:56.180,2016-01-05 08:04:36.057
AfjMzw8In8RDqK6jIfItZPs,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 14:48:47.183,2016-01-04 14:53:30.210
AYIdSJYsRw5PptkFfEOXPa0,AX3Xy8dRDBRAlhyy3YaWw6U,2016-01-04 11:06:37.040,2016-01-04 16:34:38.770
Ac.WXBBSl75KqEuBmNljYPE,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 10:58:04.227,2016-01-04 11:21:10.520
AekXRDR3mBBDh49IIN2HdU8,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 10:16:08.040,2016-01-04 10:34:20.523
AVvL3VSWSq5Fr.f4733X.T4,AX3Xy8dRDBRAlhyy3YaWw6U,2016-01-04 09:19:29.773,2016-01-04 09:40:25.157
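
For readers who want to reproduce the answers, here is a minimal sketch of loading the sample rows into a DataFrame. The variable names `raw` and `df` are assumptions for illustration, and the spaces in the header row are dropped so the column names come out clean.

```python
import io

import pandas as pd

# The six sample rows from the question, embedded as CSV text.
raw = """SessionID,UserID,Logon_time,Logoff_time
Adx1YiRyvOFApQiniyPWYPo,AbO6vW58ta1Bgrqs.RA0uHg,2016-01-05 07:46:56.180,2016-01-05 08:04:36.057
AfjMzw8In8RDqK6jIfItZPs,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 14:48:47.183,2016-01-04 14:53:30.210
AYIdSJYsRw5PptkFfEOXPa0,AX3Xy8dRDBRAlhyy3YaWw6U,2016-01-04 11:06:37.040,2016-01-04 16:34:38.770
Ac.WXBBSl75KqEuBmNljYPE,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 10:58:04.227,2016-01-04 11:21:10.520
AekXRDR3mBBDh49IIN2HdU8,Ae8qOxLzozJHrC2pr2dOw88,2016-01-04 10:16:08.040,2016-01-04 10:34:20.523
AVvL3VSWSq5Fr.f4733X.T4,AX3Xy8dRDBRAlhyy3YaWw6U,2016-01-04 09:19:29.773,2016-01-04 09:40:25.157
"""

# Parse both time columns as datetimes so they can be used with date_range later.
df = pd.read_csv(io.StringIO(raw), parse_dates=["Logon_time", "Logoff_time"])
print(df.dtypes)
```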

2 answers

User

#1 · edited 2024-05-28 18:17:31

IIUC, we can do this:

import pandas as pd

# One row per minute of each session, then count rows per minute ('min' replaces the deprecated 'T' alias).
df.apply(lambda x: pd.Series(1, index=pd.date_range(x.Logon_time, x.Logoff_time, freq='min')), axis=1)\
  .stack().reset_index(level=0, drop=True).resample('min').count()

Output (head): [missing from the source]

To inspect all the data visually with pandas:

# Same pipeline as above, followed by a line plot of sessions per minute.
df.apply(lambda x: pd.Series(1, index=pd.date_range(x.Logon_time, x.Logoff_time, freq='min')), axis=1)\
  .stack().reset_index(level=0, drop=True).resample('min').count().plot()
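
The one-liner is dense, so here is a rough step-by-step sketch of the same idea on two hypothetical overlapping sessions (made-up timestamps, not the question's data). It uses a list comprehension and `pd.concat` instead of `apply`/`stack`, which is a substitution for readability, and the names `per_session` and `active` are assumptions.

```python
import pandas as pd

# Two hypothetical sessions that overlap between 09:01 and 09:02.
df = pd.DataFrame({
    "Logon_time": pd.to_datetime(["2016-01-04 09:00:10", "2016-01-04 09:01:00"]),
    "Logoff_time": pd.to_datetime(["2016-01-04 09:02:30", "2016-01-04 09:03:00"]),
})

# Step 1: one timestamp per minute of each session, each carrying the value 1.
per_session = [
    pd.Series(1, index=pd.date_range(row.Logon_time, row.Logoff_time, freq="min"))
    for row in df.itertuples(index=False)
]

# Step 2: pool all timestamps, then count how many fall into each minute bin.
active = pd.concat(per_session).sort_index().resample("min").count()
print(active)
```

Note that `date_range` steps from the exact logon time, so a session's final partial minute can be missed when the logon does not fall on a minute boundary; flooring the start and ceiling the end avoids that.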

User

#2 · edited 2024-05-28 18:17:31

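In the end I used a solution slightly different from Scott's answer, but his approach was key: the number of observations (records) is relatively small, whereas the number of time elements between the first and last observation (e.g. seconds, depending on the required resolution) is far larger.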

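However, I first collect all the generated date ranges (Series) in a list and concatenate them in a separate second step, which is faster than modifying the original DataFrame with apply().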

import pandas as pd

# Expand each session into one timestamp per unit of the chosen resolution (here: minutes).
# Each session yields its own Series; they are collected in a list and concatenated
# in one go, which is more efficient than growing a DataFrame row by row.
sessions = []

for key, cols in df_sessions.iterrows():
    sess = pd.Series(data=pd.date_range(start=cols['logon'].floor('min'),
                                        end=cols['logoff'].ceil('min'),
                                        freq='min'),
                     name='sess_dt')
    sessions.append(sess)

# Concatenate all Series objects and convert to a DataFrame.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
df_sessions_2 = pd.concat(sessions, ignore_index=True).to_frame(name='ref_dt')

# Add a counter which we can use to aggregate
df_sessions_2['sess_cnt'] = 1

# Aggregate according to the datetime
df_sessions_2 = df_sessions_2.groupby('ref_dt').sum()
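
As a sanity check, the collect-then-concatenate idea can be tried on three of the question's sessions that overlap. This is only a sketch: the `logon`/`logoff` column names follow the code above, while `counts` and the use of `value_counts` in place of the counter column plus `groupby` are assumptions made to keep it short.

```python
import pandas as pd

# Three of the question's sessions; the long one overlaps the other two.
df_sessions = pd.DataFrame({
    "logon": pd.to_datetime([
        "2016-01-04 14:48:47.183",
        "2016-01-04 11:06:37.040",
        "2016-01-04 10:58:04.227",
    ]),
    "logoff": pd.to_datetime([
        "2016-01-04 14:53:30.210",
        "2016-01-04 16:34:38.770",
        "2016-01-04 11:21:10.520",
    ]),
})

# One Series of minute timestamps per session, collected in a list first.
sessions = [
    pd.Series(pd.date_range(row.logon.floor("min"), row.logoff.ceil("min"), freq="min"))
    for row in df_sessions.itertuples(index=False)
]

# A single concat, then count how often each minute occurs across sessions.
counts = pd.concat(sessions, ignore_index=True).value_counts().sort_index()
print(counts.max())
```

Unlike `resample`, this keeps only minutes in which at least one session was active, so idle minutes do not appear as zeros.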

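Plotting takes only one additional statement, a call to plot() on the aggregated DataFrame; the statement itself is missing from the source.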
