What is the Pythonic way to download an HDF5 file from an HTTP server?

0 votes

1 answer

2502 views

Asked 2025-04-17 16:20

I am trying to download an HDF5 file from an HTTP server. I can do it with Python's subprocess module and wget, but that feels a bit hacky.

    # wget solution
    import subprocess
    url = 'http://url/to/file.h5'
    # subprocess is a module, not a callable; check_call raises if wget fails
    subprocess.check_call(['wget', '--proxy=off', url])

I can also download images with urllib and the requests module, like this:

    # requests solution
    import requests
    url2 = 'http://url/to/image.png'
    r = requests.get(url2)
    with open('image.png', 'wb') as img:
        img.write(r.content)

    # urllib solution (Python 2; in Python 3 use urllib.request.urlretrieve)
    import urllib
    urllib.urlretrieve(url2, 'outfile.png')

However, when I download the HDF5 file this way and run the shell command 'file' on it, I get:

    $ file test.h5
    test.h5: HTML document, ASCII text, with very long lines
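To catch this situation from Python instead of with the shell 'file' command, one option is to look for the 8-byte HDF5 format signature (\x89HDF\r\n\x1a\n). The HDF5 file format specification places it at offset 0 or, when a user block is present, at 512, 1024, 2048 and so on. This is a minimal sketch; the helper name looks_like_hdf5 is hypothetical.

```python
# HDF5 format signature from the HDF5 file format specification
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature appears at a valid superblock offset.

    Hypothetical helper: checks offset 0, then 512, 1024, 2048, ...
    until the end of the file is reached.
    """
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            chunk = f.read(len(HDF5_SIGNATURE))
            if len(chunk) < len(HDF5_SIGNATURE):
                return False  # ran past the end of the file without a match
            if chunk == HDF5_SIGNATURE:
                return True
            # Candidate offsets: 0, then 512 doubling each time
            offset = 512 if offset == 0 else offset * 2
```

An HTML page returned by an intercepting proxy, like the one 'file' reported above, would make this return False, so a download script can fail loudly instead of saving a bogus .h5 file.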

Here are the response headers from requests.get() (not sure whether this helps):

    {'accept-ranges': 'bytes',
    'content-length': '413399',
    'date': 'Tue, 19 Feb 2013 08:51:06 GMT',
    'etag': 'W/"413399-1361177055000"',
    'last-modified': 'Mon, 18 Feb 2013 08:44:15 GMT',
    'server': 'Apache-Coyote/1.1'}

Should I use wget through subprocess, or is there a more Pythonic solution?

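Solution: the problem was that I had not disabled the proxy before trying to download the file, so the transfer was intercepted. This code fixed it: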

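    import urllib2  # Python 2; merged into urllib.request in Python 3

    # An empty dict tells ProxyHandler to use no proxy at all, instead of
    # picking up proxy settings from environment variables such as http_proxy
    proxy_handler = urllib2.ProxyHandler({})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)  # also makes urllib2.urlopen() skip the proxy

    url = 'http://url/to/file.h5'

    req = urllib2.Request(url)
    r = opener.open(req)
    result = r.read()

    # Binary mode so the HDF5 bytes are written unchanged
    with open('my_file.h5', 'wb') as f:
        f.write(result)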
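On Python 3, where urllib2 became urllib.request, the same no-proxy fix might look like the sketch below. The function name download_without_proxy is hypothetical; unlike the code above, it streams the body to disk with shutil.copyfileobj instead of loading the whole file into memory with read().

```python
import shutil
import urllib.request


def download_without_proxy(url, dest_path):
    """Download url to dest_path, ignoring any configured HTTP proxy.

    Hypothetical helper: a Python 3 translation of the urllib2 fix above.
    """
    # Empty dict = no proxies, overriding http_proxy/https_proxy env vars
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response, open(dest_path, "wb") as out:
        # Copy the response body to disk in chunks
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, download_without_proxy('http://url/to/file.h5', 'my_file.h5') would save the file without routing the request through the proxy.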

subprocess http wget urllib file transfer requests hdf5 data-download

1 Answer

Try using geturl() on the response returned by urllib.urlopen to get the real URL (after following redirects), then pass that URL to urlretrieve.
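A minimal Python 3 sketch of this suggestion follows; the helper name resolve_and_retrieve is hypothetical. urlopen follows HTTP redirects automatically, so geturl() on its response reports the final URL. Note that this issues two requests, and on its own it does not bypass a proxy.

```python
import urllib.request


def resolve_and_retrieve(url, dest_path):
    """Resolve redirects for url, then download the final URL to dest_path.

    Hypothetical helper illustrating the geturl() + urlretrieve approach.
    """
    # urlopen follows redirects; geturl() returns the URL it ended up at
    with urllib.request.urlopen(url) as response:
        final_url = response.geturl()
    # Second request: download from the resolved URL
    urllib.request.urlretrieve(final_url, dest_path)
    return final_url
```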

Answered 2025-04-17 by Python大师
