Script downloads data files, but I can't stop the script


Code description

The script below works fine. It basically finds all the data files I'm interested in on a given website, checks whether they are already on my computer (skipping them if so), and finally downloads them to my machine with cURL.

Problem

The problem I run into is that there are sometimes 400+ very large files and I can't download them all in one sitting. I press Ctrl-C, but that seems to cancel only the current cURL download rather than the script itself, so I end up cancelling every download one by one. Is there a way around this? Perhaps some key command that stops execution once the current download finishes? (One possible approach is sketched just after the script below.)

#!/usr/bin/python
import os
import urllib2
import re
import timeit

filenames = []
savedir = "/Users/someguy/Documents/Research/VLF_Hissler/Data/"

#connect to a URL
website = urllib2.urlopen("http://somewebsite")

#read html code
html = website.read()

#use re.findall to get all the data files
filenames = re.findall(r'SP.*?\.mat', html)

# The following chunk of code checks whether the files have already been
# downloaded and removes them from the download queue if they have.
count = 0
countpass = 0
for files in os.listdir(savedir):
   if files.endswith(".mat"):
      try:
         filenames.remove(files)
         count += 1
      except ValueError:
         countpass += 1

print "counted number of removes", count
print "counted number of failed removes", countpass
print "number files less removed:", len(filenames)

#saves the file names into an array of html link
links = len(filenames) * [0]

for j in range(len(filenames)):
   links[j] = 'http://somewebsite.edu/public_web_junk/southpole/2014/'+filenames[j]

for i in range(len(links)):
   os.system("curl -o " + filenames[i] + " " + links[i])

print "links downloaded:",len(links)

1 Answer

You can check the file size with curl before downloading:

import subprocess, sys

def get_file_size(url):
    """
    Gets the file size of a URL using curl.

    @param url: The URL to obtain information about.

    @return: The file size, as an integer, in bytes.
    """

    file_size = 0

    # Ask curl for the response headers only and parse Content-Length
    # (compared case-insensitively, since servers vary in header casing)
    p = subprocess.Popen(('curl', '-sI', url), stdout=subprocess.PIPE)
    for s in p.stdout.readlines():
        if s.lower().startswith('content-length'):
            file_size = int(s.strip().split()[-1])
    return file_size

# Your configuration parameters
url      = ... # URL that you want to download
max_size = ... # Max file size in bytes

# Now you can do a simple check to see if the file size is too big
if get_file_size(url) > max_size:
    sys.exit()

# Or you could do something more advanced
size = get_file_size(url)
if size > max_size:
    s = raw_input('File is {0} bytes. Do you wish to download? '
        '(yes, no) '.format(size))
    if s.lower() == 'yes':
        pass  # Add download code here....
    else:
        sys.exit()
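
Wired into the download loop from the question, that check might look like the following (again just a sketch: the 100 MB cap is an assumed value, and the filenames and links lists come from the question's script):

import subprocess

max_size = 100 * 1024 * 1024  # assumed cap of 100 MB; adjust to taste

# Skip any file whose reported Content-Length exceeds the cap and
# download the rest with curl, as in the original script.
for i in range(len(links)):
    size = get_file_size(links[i])
    if size > max_size:
        print "skipping", filenames[i], "({0} bytes)".format(size)
        continue
    subprocess.call(['curl', '-o', filenames[i], links[i]])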
