Loading a CSV file from S3 into Pandas using chunksize

1 answer

User

#1 · Posted 2024-04-26 23:10:58

The documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html clearly states:

filepath_or_buffer : str, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.
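
To make the quoted passage concrete, here is a minimal sketch of the three input forms it lists: a string path, an os.PathLike object, and a file-like object. The temporary file, the sample data, and the variable names are made up for illustration; an s3:// URL would go in the same argument position as the string path.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for this illustration.
csv_text = "id,value\n1,10\n2,20\n3,30\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.csv"
    path.write_text(csv_text)

    from_str = pd.read_csv(str(path))                  # plain string path
    from_path = pd.read_csv(path)                      # os.PathLike object
    from_buffer = pd.read_csv(io.StringIO(csv_text))   # file-like object with read()

# All three calls parse the same content into equal DataFrames.
same = from_str.equals(from_path) and from_path.equals(from_buffer)
print(same)
```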

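When you read in chunks, read_csv returns an iterator object instead of a single DataFrame, and you need to iterate over it. For example: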

import pandas as pd

for df in pd.read_csv('s3://<<bucket-name>>/<<filename>>', chunksize=100000):
    ...  # process each DataFrame chunk here
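
As a rough sketch of what processing each chunk might look like, the snippet below keeps a running total across chunks so that the whole file never has to sit in memory at once. The helper name sum_column_in_chunks, the column names, and the sample data are hypothetical. An in-memory buffer stands in for the S3 URL so the example runs without credentials; with a real bucket you would pass the s3:// string instead (pandas reads such URLs through the optional s3fs package).

```python
import io

import pandas as pd


def sum_column_in_chunks(source, column, chunksize):
    """Sum one column and count rows while holding only one chunk in memory."""
    total = 0
    rows = 0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk[column].sum()
        rows += len(chunk)
    return total, rows


# Stand-in for 's3://<<bucket-name>>/<<filename>>': ten rows of made-up data.
sample = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10)))
total, rows = sum_column_in_chunks(sample, "value", chunksize=4)
print(total, rows)
```

The same pattern works for any aggregation that can be combined chunk by chunk, such as counts, sums, or minima and maxima.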

And if you suspect the problem is that the chunksize is too large, you can try reading just one small chunk, for example:

for df in pd.read_csv('s3://<<bucket-name>>/<<filename>>', chunksize=1000):
    print(df.head())
    break
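
If the goal is only to peek at the first rows, two other ways of doing the same thing are worth knowing. This is a sketch using an in-memory buffer with made-up data in place of the S3 URL: calling next() on the returned reader fetches a single chunk without a loop, and the nrows parameter reads only the first rows without creating an iterator at all.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the file on S3.
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

# Option 1: pull a single chunk from the reader, equivalent to loop plus break.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=3)
first_chunk = next(reader)
reader.close()  # release the underlying handle once done

# Option 2: read only the first rows directly, no iterator involved.
preview = pd.read_csv(io.StringIO(csv_text), nrows=3)

print(first_chunk.equals(preview))
```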
