Downloading a large CSV file from a URL line by line in Python, for only the first 10 entries

Posted 2024-04-26 23:10:28


I have a large CSV file from a client, shared for download via a URL. I want to download it line by line (as bytes), limited to just the first 10 entries.

I have the code below, which downloads the file, but I only want the first 10 entries here, not the complete file.

#!/usr/bin/env python
import requests
from contextlib import closing
import csv

url = "https://example.com.au/catalog/food-catalog.csv"

with closing(requests.get(url, stream=True)) as r:
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)

I don't quite understand contextlib and how it works with Python's with statement.

Can anyone help me with this? It would be much appreciated, and thanks in advance.


Tags: file, csv, in, import, url, for, with, line
3 Answers

contextlib isn't the problem here; the generator is. When your with block ends, the connection is closed, fairly straightforwardly.

The part that actually does the downloading is for row in reader:, since reader wraps f, which is a lazy generator. Each pass through the loop effectively reads one more line from the stream, possibly with some internal buffering by Python.
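For intuition, contextlib.closing(obj) simply guarantees that obj.close() is called when the with block exits. A rough sketch of what the question's code amounts to, written out with try/finally (using the same example URL):

import requests

url = "https://example.com.au/catalog/food-catalog.csv"

r = requests.get(url, stream=True)
try:
    for line in r.iter_lines():
        ...  # with stream=True, lines are fetched lazily as the loop advances
finally:
    r.close()  # this is what closing() arranges: the connection is always released

In recent versions of requests, Response is itself a context manager, so with requests.get(url, stream=True) as r: works without contextlib at all.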

The key is to stop the loop after 10 rows. There are a couple of simple ways to do that:

for count, row in enumerate(reader, start=1):
    print(row)

    if count == 10:
        break

Or, using itertools.islice:

import itertools

for row in itertools.islice(reader, 10):
    print(row)

pandas is another way to do it:

import pandas as pd

# Create a DataFrame from the original CSV, using "," as the separator,
# limiting the read to the first 10 rows, and decoding it as UTF-8
your_csv = pd.read_csv("https://example.com.au/catalog/food-catalog.csv",
                       sep=',', nrows=10, encoding='utf-8')

# You can now print it:
print(your_csv)

# And even save it (filePath being wherever you want the copy written):
your_csv.to_csv(filePath, sep=',', encoding='utf-8')
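Note that nrows=10 limits what pandas parses to the first 10 data rows (the header line is read separately); depending on how the server streams the response, somewhat more of the file may still be transferred over the network than is actually parsed.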

You can generalize this idea by making a generator that yields the next n rows on each call. The grouper recipe from the itertools documentation is useful for things like this.

import requests
import itertools
import csv
import contextlib

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

def stream_csv_download(chunk_size):
    url = 'https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2017-financial-year-provisional/Download-data/annual-enterprise-survey-2017-financial-year-provisional-csv.csv'
    with contextlib.closing(requests.get(url, stream=True)) as stream:
        # iter_lines() yields one line (as bytes) at a time; note that its optional
        # argument is a byte count, not a row count, so chunk_size is not passed here
        lines = (line.decode('utf-8') for line in stream.iter_lines())
        reader = csv.reader(lines, delimiter=',', quotechar='"')
        chunker = grouper(reader, chunk_size, None)
        for chunk in chunker:
            # drop the fillvalue padding that zip_longest adds to the last group
            yield [row for row in chunk if row is not None]

csv_file = stream_csv_download(10)

This will certainly buffer some of the data, since the calls return quickly, but I don't think it downloads the whole file. I'd have to test it on a large file to be sure.
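For example, to pull just the first batch of 10 parsed rows out of the csv_file generator above:

# csv_file was created with stream_csv_download(10), so each next() call
# yields a list of up to 10 parsed CSV rows
first_batch = next(csv_file)
for row in first_batch:
    print(row)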
