How can I unshorten a URL using Python?

12 votes

5 answers

9315 views

Asked 2025-04-17 00:08

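I'm having trouble with my earlier solution (using the unshort.me API), because I mainly want to unshorten YouTube links. Since unshort.me is so widely used, almost 90% of the results come back with a CAPTCHA, which I can't solve.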

So far, this is all I have been able to use:

import urllib2  # Python 2 module; Python 3 moved this to urllib.request

def unshorten_url(url):
    resolvedURL = urllib2.urlopen(url)
    print resolvedURL.url

    #t = Test()
    #c = pycurl.Curl()
    #c.setopt(c.URL, 'http://api.unshort.me/?r=%s&t=xml' % (url))
    #c.setopt(c.WRITEFUNCTION, t.body_callback)
    #c.perform()
    #c.close()
    #dom = xml.dom.minidom.parseString(t.contents)
    #resolvedURL = dom.getElementsByTagName("resolvedURL")[0].firstChild.nodeValue
    return resolvedURL.url

Note: the commented-out code is what I tried while using the unshort.me service, which kept returning CAPTCHA links.

Does anyone know a more efficient way to do this without using open (since it wastes bandwidth)?

Tags: api, network requests, link resolution, youtube, captcha, short links, url unshortening

5 Answers

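You do have to open the link, otherwise you have no way of knowing which URL it redirects to. As Greg put it:

A short link is a key into somebody else's database; you can't expand the link without querying the database.

Now, on to your question.

Does anyone know a more efficient way to do this without using open (since it wastes bandwidth)?

A more efficient approach is to avoid closing the connection: keep it open in the background with HTTP's Connection: keep-alive.

After a small test, unshorten.me seems to take the HEAD method into account and redirects to itself: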

> telnet unshorten.me 80
Trying 64.202.189.170...
Connected to unshorten.me.
Escape character is '^]'.
HEAD http://unshort.me/index.php?r=http%3A%2F%2Fbit.ly%2FcXEInp HTTP/1.1
Host: unshorten.me

HTTP/1.1 301 Moved Permanently
Date: Mon, 22 Aug 2011 20:42:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: http://resolves.me/index.php?r=http%3A%2F%2Fbit.ly%2FcXEInp
Cache-Control: private
Content-Length: 0

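So if you use the HEAD method instead of GET against this service, you get a redirect back and have to ask again, which means you actually end up doing the same work twice.

Instead, you should keep the connection alive. That saves only a little bandwidth, but it definitely saves the latency of establishing a new connection for every request. Establishing a TCP/IP connection is expensive.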
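As a rough illustration of that saving, here is a minimal Python 3 sketch that sends several HEAD requests over one persistent connection using the standard library's http.client. The function name resolve_many, the timeout value, and the assumption that all the short URLs live on the same plain-http host are mine, not part of the original answer.

```python
import http.client
from urllib.parse import urlsplit


def resolve_many(short_urls, timeout=10):
    """Send one HEAD request per URL over a single kept-alive connection.

    Assumption: every URL in short_urls is on the same http:// host.
    Returns a dict mapping each short URL to the Location header the
    server sent back (None if there was no redirect).
    """
    if not short_urls:
        return {}
    host = urlsplit(short_urls[0]).netloc
    conn = http.client.HTTPConnection(host, timeout=timeout)
    results = {}
    try:
        for url in short_urls:
            parts = urlsplit(url)
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn.request("HEAD", target)
            response = conn.getresponse()
            response.read()  # finish this response so the connection can be reused
            results[url] = response.getheader("Location")
    finally:
        conn.close()
    return results
```

This only reads the first hop for each URL. If the server closes the connection anyway, http.client reconnects on the next request, so the loop still works, just without the latency benefit.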

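You should keep as many kept-alive connections to the unshorten service as the number of concurrent connections your own service receives.

You can manage these connections in a connection pool. That is the closest you can get, short of tuning your kernel's TCP/IP stack.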
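One way to get such a pool without writing it by hand is a requests.Session with an HTTPAdapter mounted on it, which keeps idle connections around for reuse. This is only a sketch: the helper names make_pooled_session and unshorten_url and the pool_size value are illustrative assumptions, and pool_size should be matched to your own concurrency.

```python
import requests
from requests.adapters import HTTPAdapter


def make_pooled_session(pool_size=10):
    """Build a Session that keeps up to pool_size connections alive per host."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def unshorten_url(session, url, timeout=10):
    """Follow redirects with HEAD requests and return the final URL."""
    response = session.head(url, allow_redirects=True, timeout=timeout)
    return response.url
```

Create the session once at startup and pass it to every call; building a new session per request would discard the pooled connections and lose the benefit.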

Answered 2025-04-17 by Python大师


A one-line function using the requests library, and yes, it handles chains of redirects (recursion).

import requests

def unshorten_url(url):
    # requests.head does not follow redirects unless allow_redirects=True
    return requests.head(url, allow_redirects=True).url
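If you also want to see the intermediate hops, the same call exposes them through response.history. The sketch below is a variation built on assumptions, not the answer's original code: the name redirect_chain, the timeout, the optional session argument, and the GET fallback for servers that answer HEAD with 405 or 501 are all my additions.

```python
import requests


def redirect_chain(url, timeout=10, session=None):
    """Return every URL visited while resolving url, with the final URL last."""
    http = session if session is not None else requests
    response = http.head(url, allow_redirects=True, timeout=timeout)
    if response.status_code in (405, 501):
        # Assumption: the server rejects HEAD, so retry with GET.
        # stream=True defers downloading the final response body.
        response = http.get(url, allow_redirects=True, timeout=timeout, stream=True)
        response.close()
    return [hop.url for hop in response.history] + [response.url]
```

The last element of the returned list is what the one-liner above would give you; the earlier elements are the short links passed through on the way.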

Answered 2025-04-17 by Python大师


Using the highest-rated answer (not the accepted one) from this question:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead.
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path or "/"  # an empty path is not a valid request target
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # // is integer division in both Py2k and Py3k
    if response.status // 100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # recurse to process chains of short URLs
    else:
        return url
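For Python 3, a port along the lines the comment suggests might look like the following. It is a sketch rather than the original author's code: it loops instead of recursing, and the max_hops and timeout parameters, the HTTPS handling, and the urljoin call for relative Location headers are my own additions.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def unshorten_url(url, max_hops=10, timeout=10):
    """Follow HEAD redirects one hop at a time and return the last URL reached."""
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("HEAD", target)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        if status // 100 == 3 and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            return url
    return url  # gave up after max_hops redirects (hypothetical safety limit)
```

The hop limit is there because a recursive version has no defence against two short links that redirect to each other.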

Answered 2025-04-17 by Python大师

