Consider a typical piece of real-time chat data such as the following:
Peter (08:16):
Hi
What's up?
;-D
Anji Juo (09:13):
Hey, I'm using WhatsApp!
Peter (11:17):
Could you please tell me where is the feedback?
Anji Juo (19:13):
I don't know where it is.
Anji Juo (19:14):
Do you by any chance know where I can catch a taxi ?
🙏🙏🙏
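To convert this raw text file into a data frame, I need to write some regular expressions that identify the column names and then extract the corresponding values.
See https://regex101.com/r/X3ubqF/1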
Index(time)  Name      Message
08:16        Peter     Hi
                       What's up?
                       ;-D
09:13        Anji Juo  Hey, I'm using WhatsApp!
11:17        Peter     Could you please tell me where is the feedback?
19:13        Anji Juo  I don't know where it is.
19:14        Anji Juo  Do you by any chance know where I can catch a taxi ?
                       🙏🙏🙏
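The regex
r"(?P<Name>.*?)\s*\((?P<Index>(?:\d|[01]\d|2[0-3]):[0-5]\d)\)"
extracts the values for the time and name columns perfectly, but I don't know how to isolate and extract the messages that belong to a particular sender at each time index.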
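To make the starting point concrete, here is a small sketch that runs the pattern from the question over the first two entries of the sample chat. The variable names (HEADER, sample, headers) are illustrative only. It shows that the pattern finds each "Name (HH:MM)" header but leaves the lines between headers untouched, which is the gap the question is about.

```python
import re

# The header pattern from the question: a sender name followed by "(H:MM)" or "(HH:MM)".
HEADER = re.compile(r"(?P<Name>.*?)\s*\((?P<Index>(?:\d|[01]\d|2[0-3]):[0-5]\d)\)")

# First two entries of the sample chat, used here as illustrative input.
sample = "Peter (08:16):\nHi\nWhat's up?\n;-D\nAnji Juo (09:13):\nHey, I'm using WhatsApp!\n"

# Collect (time, name) pairs; the message lines are not captured by this pattern.
headers = [(m.group("Index"), m.group("Name")) for m in HEADER.finditer(sample)]
print(headers)  # [('08:16', 'Peter'), ('09:13', 'Anji Juo')]
```

Because the pattern is not anchored to a whole line, it would also match a time in parentheses that appears inside a message, so anchoring it may be worth considering.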
Use
See regex proof
Python code:
Result:
Explanation
You can use the re module to parse the string (regex101) and print the result.
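The code from the answers is not reproduced here, so the following is only a minimal sketch of one way to do this with the re module, not the original answer's method. It assumes every header occupies a whole line of the form "Name (HH:MM):" and that all lines up to the next header belong to that message. The names HEADER_LINE and parse_chat are hypothetical, and the time alternation is taken from the question's pattern.

```python
import re

chat = """Peter (08:16):
Hi
What's up?
;-D
Anji Juo (09:13):
Hey, I'm using WhatsApp!
Peter (11:17):
Could you please tell me where is the feedback?
Anji Juo (19:13):
I don't know where it is.
Anji Juo (19:14):
Do you by any chance know where I can catch a taxi ?
🙏🙏🙏
"""

# Assumption: a header is a full line "Name (HH:MM):", so the pattern is anchored.
HEADER_LINE = re.compile(
    r"^(?P<Name>.+?)\s*\((?P<Index>(?:\d|[01]\d|2[0-3]):[0-5]\d)\):\s*$"
)


def parse_chat(text):
    """Return one dict per header line, with the lines that follow joined as its message."""
    rows = []
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            # A new header starts a new row with an empty message buffer.
            rows.append({"Index": match.group("Index"),
                         "Name": match.group("Name"),
                         "Message": []})
        elif rows and line.strip():
            # Any other non-blank line belongs to the most recent header.
            rows[-1]["Message"].append(line)
    for row in rows:
        row["Message"] = "\n".join(row["Message"])
    return rows


rows = parse_chat(chat)
for row in rows:
    print(row)
```

If the data frame in question is a pandas one (an assumption, since the question does not name the library), the resulting list of dicts could be passed to pandas.DataFrame(rows) and indexed on the "Index" column with set_index("Index").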