I have several txt files that were extracted from a NoSQL database. A sample of one of these semi-structured files looks like this:
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.1, rUnknown, Wed Dec 13 22:58:54 UTC 2017
scan 'abcd.polardim', {TIMERANGE => [0, 1583020799000]}
ROW COLUMN+CELL
abcd.polardim|1511175034223 column=i:SJ - #3, timestamp=1511175034224, value=9
abcd.polardim|1511175034223 column=i:SJ - #4, timestamp=1511175034224, value=1
abcd.polardim|1511175034223 column=i:SJ Best, timestamp=1511175034224, value=15
abcd.polardim|1511175034223 column=i:TestMoment, timestamp=1511175034224, value=jan-17
row|1518803776714 column=i:Emulate, timestamp=1518803776720, value=fa283e60-db7e-4888-80f8-2688b36c1234
row|1518803776714 column=i:CSF - #1, timestamp=1518803776720, value=0
row|1518803776714 column=i:CSF - #2, timestamp=1518803776720, value=0
row|1518803776714 column=i:CSF - #3, timestamp=1518803776720, value=0
row|1518803776714 column=i:CSF - #4, timestamp=1518803776720, value=0
row|1518803776714 column=i:CSF Best, timestamp=1518803776720, value=0
row|1518803776714 column=i:Categ, timestamp=1518803776720, value=M
row|1518803776714 column=i:Cy, timestamp=1518803776720, value=192
row|1518803776714 column=i:Comments, timestamp=1518803776720, value=0
row|1518803776714 column=i:Date, timestamp=1518803776720, value=17-2-2009
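I want to load this into a DataFrame, with the text after each = loaded as the value of the corresponding field.
The desired output looks like this: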
column      timestamp      value
SJ - #3     1511175034224  9
SJ - #4     1511175034224  1
SJ Best     1511175034224  15
TestMoment  1511175034224  jan-17
Emulate     1518803776720  fa283e60-db7e-4888-80f8-2688b36c1234
CSF - #1    1518803776720  0
How can I do this in Python?
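You can use the re module for this. re.finditer returns an iterator that yields match objects for all non-overlapping matches of the RE pattern in the string.
Another option is to load the source lines into a DataFrame first and then use the extract method (Series.str.extract in pandas) to split each line into columns.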
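The re.finditer approach could look roughly like the sketch below. It is not a definitive implementation: the pattern, the names CELL_RE and parse_scan, and the shortened SAMPLE text are assumptions made for illustration, and the pattern assumes every cell line has the form "column=<family>:<qualifier>, timestamp=<digits>, value=<text>" as in the sample above. Shell banner lines and the scan command contain no "column=" and are therefore skipped.

```python
import re

# Shortened copy of the sample file from the question (illustrative only).
SAMPLE = """\
HBase Shell; enter 'help<RETURN>' for list of supported commands.
scan 'abcd.polardim', {TIMERANGE => [0, 1583020799000]}
ROW COLUMN+CELL
abcd.polardim|1511175034223 column=i:SJ - #3, timestamp=1511175034224, value=9
abcd.polardim|1511175034223 column=i:TestMoment, timestamp=1511175034224, value=jan-17
row|1518803776714 column=i:Emulate, timestamp=1518803776720, value=fa283e60-db7e-4888-80f8-2688b36c1234
"""

# Assumed line layout: column=<family>:<qualifier>, timestamp=<digits>, value=<rest of line>
# "." does not match a newline, so the value group stops at the end of each line.
CELL_RE = re.compile(
    r"column=[^:\s]+:(?P<column>.*?), timestamp=(?P<timestamp>\d+), value=(?P<value>.*)"
)


def parse_scan(text):
    """Return one dict per HBase cell line found in text."""
    rows = []
    for match in CELL_RE.finditer(text):
        rows.append(
            {
                "column": match.group("column"),
                "timestamp": int(match.group("timestamp")),
                "value": match.group("value").strip(),
            }
        )
    return rows


rows = parse_scan(SAMPLE)
for row in rows:
    print(row["column"], row["timestamp"], row["value"])
```

The resulting list of dicts can be passed to pandas.DataFrame(rows) to get the column, timestamp and value fields as DataFrame columns. Values are kept as strings here because the sample mixes numbers, dates and identifiers.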