How to split a pandas DataFrame at specific indices where a condition is met

Posted 2024-04-27 05:07:11


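I am trying to import a chat log from a CSV file, split it by date first, and then split each day's chat into several chunks wherever a condition between two consecutive rows is met.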

Then I want to put all of this into a dictionary where the key is the date and the value is the list of that date's chunks.

This is what I have done so far.

import pandas as pd
from datetime import datetime

# import csv of chatlog
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")

df = pd.DataFrame(ktlk_csv)

# Date column is str type. Change it into timestamp so I can later calculate diff between two rows. 
df["Date"] = pd.to_datetime(df["Date"])

# criteria to separate chunks. 
chunk_tolerance = 900 # chat stopped more than 900 seconds
chunk_min = 5 # chat less than 5 lines is not a chunk. 

# First split the entire chat by day and put it in a list. 
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)

df["time_diff"] = df["Date"].diff()
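One thing worth checking in the code above: `lambda x: df["Date"][x].day` returns only the day of the month, so in a log spanning several months every row from the 5th of any month ends up in the same group. A minimal sketch on made-up data comparing that with grouping by the calendar date through `Series.dt.date`:

```python
import pandas as pd

# Made-up rows: two on Sep 5, one on Oct 5, one on Oct 6
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-09-05 19:25:46", "2017-09-05 19:25:47",
        "2017-10-05 08:00:00", "2017-10-06 09:30:00",
    ]),
    "Message": ["a", "b", "c", "d"],
})

# Day of month: Sep 5 and Oct 5 land in the same group
by_day_of_month = {int(k): len(g) for k, g in df.groupby(df["Date"].dt.day)}
print(by_day_of_month)  # {5: 3, 6: 1}

# Calendar date (datetime.date keys): each day gets its own group
by_date = {str(k): len(g) for k, g in df.groupby(df["Date"].dt.date)}
print(by_date)  # {'2017-09-05': 2, '2017-10-05': 1, '2017-10-06': 1}
```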

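My rough plan was: for each day, walk through consecutive rows and start a new chunk whenever time_diff is larger than chunk_tolerance, drop any chunk with fewer than chunk_min lines, and collect that day's remaining chunks in a list.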

So the result would look like:

> chatChunks_byDate = { "Dec 12": [list of chunks on that day], "Dec 14": [list of chunks on that day] ....}
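A minimal sketch of the splitting described in the question, assuming the Date column has already been parsed with pd.to_datetime. The key idea is that `(gap > tolerance).cumsum()` gives every row a running chunk number that increases at each long silence. The helper name `chunk_chat_log` and the sample data are hypothetical. Keys here are `datetime.date` objects rather than strings like "Dec 12", because in a multi-year log every Dec 12 would otherwise share one key.

```python
import pandas as pd


def chunk_chat_log(df, tolerance_s=900, min_lines=5):
    """Hypothetical helper returning {date: [chunk DataFrames]}.

    A new chunk starts after more than tolerance_s seconds of silence;
    chunks with fewer than min_lines rows are dropped, and so are days
    left with no chunk at all.
    """
    df = df.sort_values("Date")
    result = {}
    # Group by calendar date (datetime.date), not by day of the month
    for day, day_df in df.groupby(df["Date"].dt.date):
        gap = day_df["Date"].diff()  # NaT on the day's first row
        # NaT > Timedelta is False, so the first row stays in chunk 0;
        # every long gap bumps the running chunk number by one
        chunk_id = (gap > pd.Timedelta(seconds=tolerance_s)).cumsum()
        chunks = [c for _, c in day_df.groupby(chunk_id) if len(c) >= min_lines]
        if chunks:
            result[day] = chunks
    return result


# Made-up log: 6 lines, a 25-minute silence, 3 lines, then 5 lines the next day
times = (
    [f"2017-09-05 19:{m:02d}" for m in range(6)]
    + [f"2017-09-05 19:{m:02d}" for m in range(30, 33)]
    + [f"2017-09-06 10:{m:02d}" for m in range(5)]
)
log = pd.DataFrame({"Date": pd.to_datetime(times), "Message": ["..."] * len(times)})

chatChunks_byDate = chunk_chat_log(log)
print({str(day): [len(c) for c in chunks] for day, chunks in chatChunks_byDate.items()})
# {'2017-09-05': [6], '2017-09-06': [5]}
```

Skipping days without a long-enough chunk is a design choice; store an empty list instead if every date should appear. For display labels like "Dec 12", day.strftime("%b %d") works, but it drops the year.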

I tried printing some of the objects above to figure this out:

print(df.columns)

> Index(['Date', 'User', 'Message', 'time_diff'], dtype='object')

I can see that the "time_diff" column was created successfully.
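Note that time_diff holds Timedelta values while chunk_tolerance is a plain number of seconds, so one side has to be converted before the two can be compared. A small sketch of two equivalent options; the first three timestamps are taken from the printout further down, the fourth is made up:

```python
import pandas as pd

chunk_tolerance = 900  # seconds, as in the question

df = pd.DataFrame({"Date": pd.to_datetime([
    "2017-09-05 19:25:46", "2017-09-05 19:25:47",
    "2017-09-05 19:29:16", "2017-09-05 20:00:00",  # last row is made up
])})
df["time_diff"] = df["Date"].diff()

# Option 1: convert the Timedelta to seconds (NaT becomes NaN, and NaN > 900 is False)
long_gap_a = df["time_diff"].dt.total_seconds() > chunk_tolerance
# Option 2: convert the threshold to a Timedelta (NaT comparisons are False)
long_gap_b = df["time_diff"] > pd.Timedelta(seconds=chunk_tolerance)

print(long_gap_a.tolist())  # [False, False, False, True]
```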

print(type(df_byDate[0]))

> <class 'tuple'>

But why is it a tuple? I expected it to be a DataFrame.
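Iterating over a pandas GroupBy object yields (group key, sub-DataFrame) pairs, so appending each item as-is stores tuples. A minimal sketch on made-up data, where `df["Date"].dt.day` plays the same role as the lambda in the question, showing how unpacking the pair in the loop keeps only the DataFrames:

```python
import pandas as pd

# Made-up three-row log
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-09-05 19:25:46", "2017-09-05 19:25:47",
                            "2017-09-06 08:00:00"]),
    "User": ["A", "B", "A"],
})

# Iterating a GroupBy yields (key, DataFrame) pairs
pairs = list(df.groupby(df["Date"].dt.day))
print(type(pairs[0]))  # <class 'tuple'>

# Unpacking the pair in the for statement keeps only the DataFrames
df_byDate = []
for key, group in df.groupby(df["Date"].dt.day):
    df_byDate.append(group)
print(type(df_byDate[0]))  # <class 'pandas.core.frame.DataFrame'>
```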

print(df_byDate[0])

>> 
(5,                    Date User                                   Message
0   2017-09-05 19:25:46  권문광                      권문광 invited 전은영 and.
1   2017-09-05 19:25:47  권문광                                   졸사찍자 졸사
2   2017-09-05 19:29:16  전은영                            ㅌㅌㅌㅌㅌㅌㅋㅋㅋ
.
.
.

What is the 5 at index [0] of the tuple? Index [1] seems to be the DataFrame I'm looking for, but what is the value at [0]?
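The 5 is the group key: the value the grouping function returned for the rows in that group. Because `lambda x: df["Date"][x].day` returns the day of the month, 5 stands for "the 5th", which matches the 2017-09-05 rows in the printout. A small sketch on made-up data, also showing `dict(iter(grouped))` as one way to keep each key paired with its DataFrame:

```python
import pandas as pd

# Made-up log spanning two days
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-09-05 19:25:46", "2017-09-05 19:25:47",
                            "2017-09-06 08:00:00"]),
    "Message": ["invite", "hi", "bye"],
})

# Same grouping as the question: the lambda receives index labels,
# and the value it returns (the day of the month) becomes the key
grouped = df.groupby(lambda x: df["Date"][x].day)
key, frame = next(iter(grouped))
print(key)         # 5
print(len(frame))  # 2

# A dict maps each key straight to its DataFrame
by_key = dict(iter(grouped))
print(sorted(int(k) for k in by_key))  # [5, 6]
```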

A lot of things are confusing me here.

