Pandas DataFrame has the wrong number of rows

Posted 2024-05-08 22:05:14


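I have a fairly large JSON file of log data that I am trying to convert to XLS or CSV. Something in the process picks up only the first 1000 rows, and I can't tell what is causing it.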

import json
import pprint
import pandas as pd

# Load the whole JSON document, then flatten the records under "Result"
with open('GetLog.json', 'r') as f:
    payload = json.load(f)
df = pd.json_normalize(payload, 'Result')

pprint.pprint(df)
# Leaving the ExcelWriter context saves and closes the workbook
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Log Output')

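Below is a slightly cleaned-up extract of the JSON. Suffice it to say that I am only interested in Result, since the Messages payload is usually empty.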

{"Log":{"Messages":[]},"Result":[{"logdate":"/Date(1468270785461)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.461"},{"logdate":"/Date(1468270785430)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.430"},{"logdate":"/Date(1468270785398)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.398"},{"logdate":"/Date(1468270785367)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.367"},{"logdate":"/Date(1468270785336)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.336"},{"logdate":"/Date(1468270785227)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.227"},{"logdate":"/Date(1468270785196)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.196"},{"logdate":"/Date(1468270785164)/","message":"ErrorText","logtype":0,"module":"WatchFolder","logdateStr":"2016/07/12 06:59:45.164"}],"success":true,"TotalCount":5648}
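One quick check on an extract like this is to compare the number of entries in Result with the TotalCount field. The sketch below assumes that TotalCount reports the total number of log entries on the server, which the question does not confirm; `count_rows` is a hypothetical helper and `sample` is an abbreviated stand-in for the real file.

```python
import json

def count_rows(payload):
    """Return (rows present in Result, rows reported by TotalCount)."""
    return len(payload.get("Result", [])), payload.get("TotalCount")

# Abbreviated payload with the same top-level keys as the extract above
sample = json.loads(
    '{"Log": {"Messages": []}, "Result": [{"logtype": 0}, {"logtype": 0}], '
    '"success": true, "TotalCount": 5648}'
)

present, reported = count_rows(sample)
if reported is not None and present < reported:
    # Assumption: TotalCount is the server-side total, so a shorter
    # Result means the file itself is truncated or paged.
    print(f"Result holds {present} of {reported} reported rows")
```

If the two numbers differ on the real file, the missing rows were probably never in the JSON to begin with, and pandas is not the place to look.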

Trying to load the dicts directly into pandas fails with the error: "ValueError: Mixing dicts with non-Series may lead to ambiguous ordering."
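That error can be reproduced with a small payload shaped like the extract: the top-level object mixes a dict (Log), a list (Result), and scalars, which the DataFrame constructor refuses to align. The following is a minimal sketch with made-up records, not the asker's actual data; passing only the Result list sidesteps the problem.

```python
import pandas as pd

# Made-up payload with the same top-level shape as the extract
payload = {
    "Log": {"Messages": []},
    "Result": [
        {"message": "ErrorText", "logtype": 0, "module": "WatchFolder"},
        {"message": "ErrorText", "logtype": 0, "module": "WatchFolder"},
    ],
    "success": True,
    "TotalCount": 5648,
}

# Passing the whole payload mixes a dict with a plain list
try:
    pd.DataFrame(payload)
    mixed_ok = True
except ValueError:
    mixed_ok = False

# A list of flat dicts maps cleanly to one row per record
df = pd.DataFrame(payload["Result"])
```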

Ultimately this will be a script that I just point at a web service on a remote system, pulling an hour's worth of logs once or twice a day.


1 Answer
User
#1 · Posted 2024-05-08 22:05:14

I ended up using ijson to load the JSON file, loading only the Result values I wanted. Sample code:

import ijson
import pandas as pd

# ijson works best with a file opened in binary mode
with open('GetLog.json', 'rb') as f:
    # The 'Result' prefix yields the Result array; build one frame per yielded value
    frames = [pd.DataFrame(item) for item in ijson.items(f, 'Result')]

# DataFrame.append returned a new frame rather than modifying df in place
# (and was removed in pandas 2.0), so concatenate the pieces instead
df = pd.concat(frames, ignore_index=True)

# Leaving the ExcelWriter context saves and closes the workbook
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Log Output')

The live version fetches the JSON from the server and uses a few parameters to specify the date range.
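The answer does not show the live version, so the following is only a rough sketch of how a date range might be passed as query parameters. The base URL, the parameter names `start` and `end`, and the helper `build_log_url` are all hypothetical; the real service may expect something quite different.

```python
from urllib.parse import urlencode

def build_log_url(base_url, start, end):
    """Append a date range to base_url as query parameters."""
    # Assumption: the service accepts 'start' and 'end' parameters
    query = urlencode({"start": start, "end": end})
    return f"{base_url}?{query}"

# Placeholder endpoint; substitute the real web service address
url = build_log_url(
    "http://example.invalid/GetLog",
    "2016-07-12T06:00:00",
    "2016-07-12T07:00:00",
)
# The response could then be read with urllib.request.urlopen(url)
# and parsed with json.load before building the DataFrame.
```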
