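I'm trying to import a CSV file of a chat log, split it by date first, and then split each day's chat into multiple chunks whenever a condition between two consecutive rows is met.
Then I want to put all of this into a dictionary where the key is the date and the value is the list of chunks for that date.
This is what I have so far.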
import pandas as pd
from datetime import datetime
# import csv of chatlog
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")
df = pd.DataFrame(ktlk_csv)
# Date column is str type. Change it into timestamp so I can later calculate diff between two rows.
df["Date"] = pd.to_datetime(df["Date"])
# criteria to separate chunks.
chunk_tolerance = 900 # chat stopped more than 900 seconds
chunk_min = 5 # chat less than 5 lines is not a chunk.
# First split the entire chat by day and put it in a list.
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)
df["time_diff"] = df["Date"].diff()
I'm thinking of something like this (pseudocode):
^{pr2}$
so that the result looks like:
> chatChunks_byDate = { "Dec 12": [list of chunks on that day], "Dec 14": [list of chunks on that day] ....}
I tried printing some of the objects above to figure this out, but:
print(df.columns)
> Index(['Date', 'User', 'Message', 'time_diff'], dtype='object')
I can see that the "time_diff" column was created successfully.
print(type(df_byDate[0]))
> <class 'tuple'>
But why is it a tuple? I expected it to be a DataFrame.
print(df_byDate[0])
>>
(5, Date User Message
0 2017-09-05 19:25:46 권문광 권문광 invited 전은영 and.
1 2017-09-05 19:25:47 권문광 졸사찍자 졸사
2 2017-09-05 19:29:16 전은영 ㅌㅌㅌㅌㅌㅌㅋㅋㅋ
.
.
.
What is the 5 at index [0] of the tuple? Index [1] seems to be the DataFrame I'm looking for, but what is the value at [0]?
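For what it's worth, iterating over a pandas groupby yields (key, sub-DataFrame) pairs, which would explain the tuple. The key is whatever the grouping function returned, and here that is the day of the month, so the 5 is most likely just "the 5th". A minimal sketch on hypothetical sample data (the rows, the user names "a" and "b", and the variable names below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data standing in for the real chat log.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-09-05 19:25:46",
        "2017-09-05 19:25:47",
        "2017-09-06 08:00:00",
    ]),
    "User": ["a", "b", "a"],
    "Message": ["hi", "hello", "morning"],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs, so unpack both.
by_day_of_month = [(key, group) for key, group in df.groupby(df["Date"].dt.day)]
first_key, first_group = by_day_of_month[0]

# Grouping by the full calendar date avoids merging e.g. Sep 5 with Oct 5,
# which grouping by day of month alone would do.
by_date = {key: group for key, group in df.groupby(df["Date"].dt.date)}
```

Here first_key is the day of the month and first_group is the DataFrame for that group. Note that grouping on .day alone lumps together the same day number from different months, which is probably not what a 5-year log needs.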
A lot of this confuses me.
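As a rough sketch of the structure described at the top, one possible approach is to group by calendar date, start a new chunk wherever the gap to the previous message exceeds the tolerance, and drop chunks that are too short. The function name chunk_chat and its parameter names are hypothetical; the defaults mirror chunk_tolerance and chunk_min from the code above. The keys here are datetime.date objects rather than strings like "Dec 12", and the time difference is computed within each day, not across the whole log.

```python
import pandas as pd

def chunk_chat(df, tolerance_s=900, min_rows=5):
    """Return {date: [chunk DataFrames]}, splitting on gaps longer than tolerance_s."""
    df = df.sort_values("Date").reset_index(drop=True)
    chunks_by_date = {}
    for day, day_df in df.groupby(df["Date"].dt.date):
        # Gap to the previous message within the same day (NaN for the first row).
        gap = day_df["Date"].diff().dt.total_seconds()
        # A new chunk starts wherever the gap exceeds the tolerance;
        # the running count of such breaks serves as a chunk id.
        chunk_id = (gap > tolerance_s).cumsum()
        # Keep only chunks with at least min_rows messages.
        chunks = [c for _, c in day_df.groupby(chunk_id) if len(c) >= min_rows]
        chunks_by_date[day] = chunks
    return chunks_by_date
```

Calling chunk_chat(df) on the DataFrame loaded above would return a dictionary with one entry per calendar date; a date whose chunks are all shorter than min_rows maps to an empty list.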