How to read a large .jl file in Python

Posted 2024-05-23 22:31:04



I'm trying to read the following dataset and convert it into a pandas DataFrame:
https://www.kaggle.com/marlesson/meli-data-challenge-2020

It is a file in which every line has the following format:

{'event_info': '...', 'event_timestamp': '...', 'event_type': '...'}
{'event_info': '...', 'event_timestamp': '...', 'event_type': '...'}
{'event_info': '...', 'event_timestamp': '...', 'event_type': '...'}
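This layout is JSON Lines: each line is a complete JSON object, so every line can be parsed on its own. A minimal sketch with made-up values follows. One caveat: strict JSON requires double-quoted strings, so a line written with single quotes like the sample above would be rejected by json.loads (the sample was probably just displayed as Python dicts; that is an assumption, not something the question states).

```python
import json

# Hypothetical JSON Lines content; the values are made up for illustration.
lines = [
    '{"event_info": "a", "event_timestamp": "t1", "event_type": "view"}',
    '{"event_info": "b", "event_timestamp": "t2", "event_type": "search"}',
]

# Each line is a self-contained JSON object, so lines parse independently.
records = [json.loads(line) for line in lines]
print(records[0]["event_type"])

# Strict JSON only accepts double quotes; a single-quoted line raises an error.
try:
    json.loads("{'event_type': 'view'}")
except json.JSONDecodeError as exc:
    print("not valid JSON:", exc.msg)
```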

I have been trying the following approach, but it takes far too long (60+ minutes):

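%%time
# %%time is a cell magic, so it must be the first line of the notebook cell.
import fileinput
import json

import pandas as pd

df = pd.DataFrame()
with fileinput.input(files='/kaggle/input/meli-data-challenge-2020/train_dataset.jl') as file:
    for line in file:
        conv = json.loads(line)  # parse one JSON object per line
        # Note: DataFrame.append was removed in pandas 2.0 (use pd.concat there).
        df = df.append(conv, ignore_index=True)
df.head()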

This code reads the file line by line as strings, parses each line as JSON, and appends the result to the DataFrame.
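For comparison, here is a minimal sketch that keeps the same line-by-line parsing but collects the parsed dicts in a list and builds the DataFrame only once. The helper name `read_jsonl_to_frame` and the sample rows are hypothetical. The difference matters because each append returns a new DataFrame that copies every row accumulated so far, so appending row by row does quadratic work over the whole file.

```python
import json
import os
import tempfile

import pandas as pd


def read_jsonl_to_frame(path):
    """Hypothetical helper: parse a JSON Lines file, building the frame once."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # A single constructor call instead of one append per row.
    return pd.DataFrame(records)


# Demo on a small temporary file with made-up rows.
with tempfile.NamedTemporaryFile("w", suffix=".jl", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"event_type": "view", "event_info": 1}\n')
    tmp.write('{"event_type": "search", "event_info": 2}\n')
    path = tmp.name

df = read_jsonl_to_frame(path)
os.remove(path)
print(df)
```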

Is there a faster way to convert the dataset into a DataFrame?


1 Answer


Answer #1 · Posted 2024-05-23 22:31:04

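The file I was trying to read is a JSON file containing multiple objects, one per line (JSON Lines). pandas' read_json() supports this kind of data through its lines parameter: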

%%time
import pandas as pd

# lines=True parses each line as a separate JSON object (one row per line)
df = pd.read_json('/kaggle/input/meli-data-challenge-2020/item_data.jl', lines=True)

Output: CPU times: user 14.1 s, sys: 3.31 s, total: 17.4 s
Wall time: 18.6 s
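A self-contained sketch of the same `lines=True` call on a few made-up rows held in memory; `io.StringIO` stands in for the .jl file. For files too large to load at once, `read_json` also accepts `chunksize` together with `lines=True` and then returns a reader that yields smaller DataFrames. The running count below is only an illustration of processing chunk by chunk, not part of the original answer.

```python
import io

import pandas as pd

# Made-up JSON Lines content standing in for a .jl file.
jsonl_text = (
    '{"event_type": "view", "event_info": 1}\n'
    '{"event_type": "search", "event_info": 2}\n'
    '{"event_type": "view", "event_info": 3}\n'
)

# lines=True: each line becomes one row of the DataFrame.
df = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df.shape)

# chunksize (only valid with lines=True) yields DataFrames of at most 2 rows,
# so memory use is bounded by the chunk size rather than the file size.
views = 0
with pd.read_json(io.StringIO(jsonl_text), lines=True, chunksize=2) as reader:
    for chunk in reader:
        views += (chunk["event_type"] == "view").sum()
print(views)
```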
