The script below works great for me. Basically it finds all the data files I'm interested in on a given website, checks whether they're already on my computer (and skips them if they are), and finally downloads them with cURL.
The problem is that sometimes there are 400+ very large files and I can't download them all at once. I'll press Ctrl-C, but it seems to cancel the current cURL download rather than the script itself, so I end up having to cancel the downloads one by one. Is there a way around this? Perhaps some key command I can use to stop execution once the current download finishes?
#!/usr/bin/python
import os
import urllib2
import re

savedir = "/Users/someguy/Documents/Research/VLF_Hissler/Data/"

# connect to the URL and read its HTML
website = urllib2.urlopen("http://somewebsite")
html = website.read()

# use re.findall to collect the names of all the data files
filenames = re.findall(r'SP.*?\.mat', html)

# Check which files are already downloaded and remove them from the
# download queue.
count = 0
countpass = 0
for files in os.listdir(savedir):
    if files.endswith(".mat"):
        try:
            filenames.remove(files)
            count += 1
        except ValueError:
            countpass += 1
print "counted number of removes", count
print "counted number of failed removes", countpass
print "number of files after removal:", len(filenames)

# turn the file names into an array of full download links
links = len(filenames) * [0]
for j in range(len(filenames)):
    links[j] = 'http://somewebsite.edu/public_web_junk/southpole/2014/' + filenames[j]

# download each file (note: curl saves into the current working
# directory here, not into savedir)
for i in range(len(links)):
    os.system("curl -o " + filenames[i] + " " + links[i])
print "links downloaded:", len(links)
You can also use curl to check the file size before downloading:
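A minimal sketch of that check (the remote_size helper and the use of curl -sI to send a HEAD request are assumptions for illustration, not from the original post):

import subprocess

def remote_size(url):
    # hypothetical helper: "curl -sI" asks the server for the response
    # headers only; Content-Length carries the file size in bytes
    headers = subprocess.check_output(["curl", "-sI", url])
    for line in headers.splitlines():
        if line.lower().startswith("content-length:"):
            return int(line.split(":", 1)[1].strip())
    return None  # the server did not report a size

Comparing remote_size(links[j]) against os.path.getsize(savedir + filenames[j]) would also let the script detect partially downloaded files, which the name-based check in the script cannot.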