Downloading a large CSV file from a URL line by line in Python, for only the first 10 entries

Posted 2024-04-26 23:10:28


I have a large CSV file from a client, shared for download via a URL. I want to download it line by line (as bytes), limited to just the first 10 entries.

I have the code below, which downloads the file, but I only want the first 10 entries here, not the complete file.

#!/usr/bin/env python
import requests
from contextlib import closing
import csv

url = "https://example.com.au/catalog/food-catalog.csv"

with closing(requests.get(url, stream=True)) as r:
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)

I don't quite understand contextlib and how it works with Python's with statement.

Can anyone help me with this? It would be much appreciated, and thanks in advance.


Tags: file, csv, in, import, url, for, with, line
3 Answers

contextlib isn't the problem here; the generator is. When your with block ends, the connection is closed, fairly straightforwardly.

The part that actually does the downloading is for row in reader:, since reader wraps f, which is a lazy generator. Each pass through the loop effectively reads one more line from the stream, possibly with some internal buffering by Python.
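For intuition, contextlib.closing(obj) simply guarantees that obj.close() is called when the with block exits. A rough sketch of what the question's code amounts to, written out with try/finally (using the same example URL):

import requests

url = "https://example.com.au/catalog/food-catalog.csv"

r = requests.get(url, stream=True)
try:
    for line in r.iter_lines():
        ...  # with stream=True, lines are fetched lazily as the loop advances
finally:
    r.close()  # this is what closing() arranges: the connection is always released

In recent versions of requests, Response is itself a context manager, so with requests.get(url, stream=True) as r: works without contextlib at all.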

The key is to stop the loop after 10 rows. There are a couple of simple ways to do that:

for count, row in enumerate(reader, start=1):
    print(row)

    if count == 10:
        break

Or, using itertools.islice:

import itertools

for row in itertools.islice(reader, 10):
    print(row)

pandas is another way to do it:

import pandas as pd

# Create a DataFrame from the original CSV, using "," as the separator,
# limiting the read to the first 10 rows, and decoding it as UTF-8
your_csv = pd.read_csv("https://example.com.au/catalog/food-catalog.csv",
                       sep=',', nrows=10, encoding='utf-8')

# You can now print it:
print(your_csv)

# And even save it (filePath being wherever you want the copy written):
your_csv.to_csv(filePath, sep=',', encoding='utf-8')
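Note that nrows=10 limits what pandas parses to the first 10 data rows (the header line is read separately); depending on how the server streams the response, somewhat more of the file may still be transferred over the network than is actually parsed.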

You can generalize this idea by making a generator that yields the next n rows on each call. The grouper recipe from the itertools documentation is useful for things like this.

import requests
import itertools
import csv
import contextlib

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

def stream_csv_download(chunk_size):
    url = 'https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2017-financial-year-provisional/Download-data/annual-enterprise-survey-2017-financial-year-provisional-csv.csv'
    with contextlib.closing(requests.get(url, stream=True)) as stream:
        # iter_lines() yields one line (as bytes) at a time; note that its optional
        # argument is a byte count, not a row count, so chunk_size is not passed here
        lines = (line.decode('utf-8') for line in stream.iter_lines())
        reader = csv.reader(lines, delimiter=',', quotechar='"')
        chunker = grouper(reader, chunk_size, None)
        for chunk in chunker:
            # drop the fillvalue padding that zip_longest adds to the last group
            yield [row for row in chunk if row is not None]

csv_file = stream_csv_download(10)

This will certainly buffer some of the data, since the calls return quickly, but I don't think it downloads the whole file. I'd have to test it on a large file to be sure.
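For example, to pull just the first batch of 10 parsed rows out of the csv_file generator above:

# csv_file was created with stream_csv_download(10), so each next() call
# yields a list of up to 10 parsed CSV rows
first_batch = next(csv_file)
for row in first_batch:
    print(row)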
