How to get the real file URL in Python 2.7?

When I request the download link, requests reports no redirect:

In [51]: response = requests.get('http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
    ...: if response.history:
    ...:     print "Request was redirected"
    ...:     for resp in response.history:
    ...:         print resp.status_code, resp.url
    ...:     print "Final destination:"
    ...:     print response.status_code, response.url
    ...: else:
    ...:     print "Request was not redirected"
    ...:
Request was not redirected

2 Answers

User

Answer 1 · edited 2024-06-02 04:24:46

You can use BeautifulSoup to read the meta refresh tag in the head of the HTML page and get the redirect URL:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> a = requests.get("http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip")
>>> soup = BeautifulSoup(a.text, 'html.parser')
>>> soup.find_all('meta', attrs={'http-equiv': lambda x: x is not None and x.lower() == 'refresh'})[0]['content'].split('URL=')[1]
'/de/download/GTFS_VBB_Nov2015_Dez2016.zip'
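
If BeautifulSoup is not installed, the same lookup can be sketched with only the standard library. This is a hedged variant and not the answer's code. The helper find_meta_refresh_url and the sample HTML are hypothetical. The url= part is matched case-insensitively, and urljoin turns the relative path into an absolute URL based on the page that was requested. The imports fall back to the Python 2.7 module names.

```python
import re

try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class _MetaRefreshParser(HTMLParser):
    """Remembers the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.content is not None:
            return
        attrs = dict(attrs)
        # http-equiv may be missing (e.g. <meta charset="utf-8">) or use any letter case
        if (attrs.get('http-equiv') or '').lower() == 'refresh':
            self.content = attrs.get('content') or ''


def find_meta_refresh_url(html, page_url):
    """Return the absolute target of a meta refresh tag, or None if there is none."""
    parser = _MetaRefreshParser()
    parser.feed(html)
    if parser.content is None:
        return None
    # content looks like "0; URL=/some/path"; accept url= in any case, optionally quoted
    match = re.search(r'url\s*=\s*[\'"]?([^\'";]+)', parser.content, re.IGNORECASE)
    if match is None:
        return None
    return urljoin(page_url, match.group(1).strip())


# Made-up page shaped like the one the answer parsed
html = ('<html><head><meta charset="utf-8">'
        '<meta http-equiv="Refresh" content="0; URL=/de/download/GTFS_VBB_Nov2015_Dez2016.zip">'
        '</head></html>')
print(find_meta_refresh_url(html, 'http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip'))
```

The parser ignores meta tags without http-equiv. This avoids calling .lower() on None.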

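The extracted path /de/download/GTFS_VBB_Nov2015_Dez2016.zip is relative to the domain of the original URL, so the new URL is http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip. Downloading that URL gives me the ZIP file: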


$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     5554  2015-11-20 15:17   agency.txt
  2151517  2015-11-20 15:17   calendar_dates.txt
    71731  2015-11-20 15:17   calendar.txt
    65424  2015-11-20 15:17   routes.txt
   816498  2015-11-20 15:17   stops.txt
196020096  2015-11-20 15:17   stop_times.txt
   365499  2015-11-20 15:17   transfers.txt
 11765292  2015-11-20 15:17   trips.txt
      113  2015-11-20 15:17   logging
---------                     -------
211261724                     9 files
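
You can run the same check as unzip -l from Python with the standard zipfile module. It confirms that the saved file is the archive and not the HTML redirect page. This is a sketch. The helper list_zip is hypothetical, and the demo writes throwaway files to a temporary directory instead of using the real download.

```python
import os
import tempfile
import zipfile


def list_zip(path):
    """Return (name, uncompressed size) pairs, or None if path is not a ZIP archive."""
    # A saved HTML redirect page (the trap in the question) fails this check
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Demo with throwaway files instead of the real GTFS download
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, 'good.zip')
with zipfile.ZipFile(good, 'w') as archive:
    archive.writestr('agency.txt', 'agency_id,agency_name\n')
bad = os.path.join(tmp_dir, 'bad.zip')
with open(bad, 'w') as f:
    f.write('<html><head><meta http-equiv="refresh" content="0; URL=/x"></head></html>')

print(list_zip(good))  # [('agency.txt', 22)]
print(list_zip(bad))   # None
```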

This second request does go through a real HTTP redirect, which returns a 301 status:

>>> a.history
[<Response [301]>]
>>> a
<Response [200]>
>>> a.history[0]
<Response [301]>
>>> a.history[0].url
'http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip'
>>> a.url
'http://images.vbb.de/assets/ftp/file/286316.zip'
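
The session above can be condensed into a small helper that lists every hop requests followed. This is a sketch, and redirect_chain is a hypothetical name. It only sees the HTTP 3xx redirects that requests follows itself. An HTML meta refresh or JavaScript redirect arrives as an ordinary 200 page, which is why the check in the question printed "Request was not redirected". The demo uses a namedtuple stand-in so it runs without network access.

```python
from collections import namedtuple


def redirect_chain(response):
    """List (status_code, url) for every hop requests recorded, ending with the final response.

    Accepts anything with .status_code, .url and .history attributes, such as requests.Response.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Offline stand-in for requests.Response, mirroring the session above
FakeResponse = namedtuple('FakeResponse', 'status_code url history')
hop = FakeResponse(301, 'http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip', [])
final = FakeResponse(200, 'http://images.vbb.de/assets/ftp/file/286316.zip', [hop])

for status, url in redirect_chain(final):
    print('%d %s' % (status, url))
```

With a real response, redirect_chain(a) gives the same pairs.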

User

Answer 2 · edited 2024-06-02 04:24:46

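You first need to perform the redirect manually by parsing the new window.location.href out of the HTML that comes back from the first request. Requesting that URL then produces a 301 response whose Location header contains the name of the target file: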

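import re

import requests

base_url = 'http://www.vbb.de'
response = requests.get(base_url + '/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
# The first page is an HTML stub that redirects via JavaScript: window.location.href = "..."
manual_redirect = base_url + re.findall(r'window\.location\.href\s*=\s*"(.*?)"', response.text)[0]
# Requesting that URL triggers a real HTTP 301, which requests follows automatically
response = requests.get(manual_redirect, stream=True)
response.raise_for_status()
# The Location header of the 301 response points at the actual file
target_filename = response.history[0].headers['Location'].split('/')[-1]

print("Downloading: '{}'".format(target_filename))
with open(target_filename, 'wb') as f_zip:
    # Stream the body in 1 KiB chunks instead of loading it all into memory
    for chunk in response.iter_content(chunk_size=1024):
        f_zip.write(chunk)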
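
The two string manipulations in this answer can be tested offline. This is a sketch. The helper names find_js_redirect and filename_from_location are hypothetical, the sample HTML is made up, and so is the ?download=1 query string. Taking the last path segment through urlsplit keeps a query string out of the filename, which split('/')[-1] would not. urljoin also handles relative targets without hard-coding base_url.

```python
import posixpath
import re

try:  # Python 3
    from urllib.parse import urljoin, urlsplit
except ImportError:  # Python 2.7
    from urlparse import urljoin, urlsplit

# Matches e.g.  window.location.href = "/de/download/file.zip";
JS_REDIRECT = re.compile(r'window\.location\.href\s*=\s*["\']([^"\']+)["\']')


def find_js_redirect(html, page_url):
    """Return the absolute URL assigned to window.location.href, or None."""
    match = JS_REDIRECT.search(html)
    return urljoin(page_url, match.group(1)) if match else None


def filename_from_location(location):
    """Take the last path segment of a Location header value, ignoring any query string."""
    return posixpath.basename(urlsplit(location).path)


page = '<script>window.location.href = "/de/download/GTFS_VBB_Nov2015_Dez2016.zip";</script>'
print(find_js_redirect(page, 'http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip'))
print(filename_from_location('http://images.vbb.de/assets/ftp/file/286316.zip?download=1'))
```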

This prints the name of the file being downloaded and produces a 29464299-byte zip file.
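
To catch a truncated download, the chunk loop can count the bytes it writes and compare them with the Content-Length header, if the server sends one. This is a sketch, and save_chunks is a hypothetical helper. The check only makes sense when no Content-Encoding such as gzip is applied, because iter_content yields decoded bytes while Content-Length counts encoded ones. In the answer's code it would be called as save_chunks(response.iter_content(chunk_size=1024), f_zip, response.headers.get('Content-Length')).

```python
import io


def save_chunks(chunks, f, expected_size=None):
    """Write byte chunks to the binary file object f and return how many bytes were written.

    expected_size (an int or numeric string, e.g. a Content-Length value) is optional;
    if given and the count differs, IOError is raised because the download was likely cut off.
    """
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)
            written += len(chunk)
    if expected_size is not None and written != int(expected_size):
        raise IOError('expected %s bytes, got %d' % (expected_size, written))
    return written


# Offline demo: an in-memory buffer stands in for the open ZIP file
buf = io.BytesIO()
print(save_chunks([b'PK', b'', b'\x03\x04'], buf, expected_size='4'))  # 4
```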
