How to fix Python requests.exceptions.InvalidURL: Invalid percent-escape sequence 'u2'?

Asked 2025-04-18 02:56

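My Python application accepts URLs that have been encoded with JavaScript's escape function, and my Python code then decodes them with urllib.unquote. This works for most URLs, but an error occurs when the file name (which is also part of the URL) contains an & character.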

The error message is: requests.exceptions.InvalidURL: Invalid percent-escape sequence 'u2'

Edit: sample code that reproduces the error

import urllib,requests
url = "https%3A//r20---sn-cvh7zn76.googlevideo.com/videoplayback%3Fipbits%3D0%26ms%3Dau%26fexp%3D931328%2C931946%2C934804%2C914004%2C931818%2C937417%2C913434%2C923328%2C936916%2C934022%2C936923%26sparams%3Dclen%2Cdur%2Cgir%2Cid%2Cip%2Cipbits%2Citag%2Clmt%2Crequiressl%2Csource%2Cupn%2Cexpire%26source%3Dyoutube%26mv%3Dm%26dur%3D278.593%26id%3Do-AOxIEhMchATdRjU99Gveow8reeBWtxFaqwpWifXC9KwS%26expire%3D1397662367%26clen%3D4425254%26sver%3D3%26signature%3D9A0CFEC5F59C2C7FC35A8CF87491F4E7F9683C59.C46B3A3602A20611C73CC4228FCB8B287034F52D%26mt%3D1397639060%26upn%3DBknKrHPqCCw%26gir%3Dyes%26itag%3D140%26key%3Dyt5%26ip%3D117.200.252.163%26lmt%3D1386126879207085%26requiressl%3Dyes%26ratebypass%3Dyes%26title%3DHum%20Tuhmaray%20hain%20%u2022%20SRK%20_%20Madhuri%20Dixit%20%u2022%20HD%201080p%20%u2022%20Hindi%20%u2022%20Bollywood%20Songs"
url = urllib.unquote_plus(url).decode('utf-8')
resp = requests.head(url, verify=False, allow_redirects=True)
print resp

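The string Hum Tuhmaray hain • SRK & Madhuri Dixit • causes the problem, and the culprit appears to be the Unicode bullet character, which shows up in the URL as %u2022. escape() writes characters outside Latin-1 in this non-standard %uXXXX form; urllib.unquote does not recognise it and leaves it untouched, so requests then reads '%u2' as a malformed percent escape.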
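To make the failure mode concrete, here is a minimal sketch, written for Python 3 (where unquote lives in urllib.parse, unlike the Python 2 code above). It shows that unquote passes %u2022 through unchanged, and it includes a hypothetical helper, unescape_js, that decodes escape()-style input on the server. The helper name and the regular expression are assumptions made for illustration, not part of urllib or requests, and it is only a fallback for cases where the JavaScript side cannot be changed.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX form emitted by JavaScript's escape().
_U_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_js(text):
    """Hypothetical helper: decode a string produced by JavaScript's escape().

    Limitation: characters outside the Basic Multilingual Plane arrive as two
    %uXXXX code units (a surrogate pair) and are not recombined here.
    """
    # Step 1: turn every %uXXXX into the character with that code point.
    text = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: escape() writes the remaining characters as %XX Latin-1 values.
    return unquote(text, encoding="latin-1")


fragment = "Hum%20Tuhmaray%20hain%20%u2022%20SRK"

# The standard decoder handles %20 but leaves %u2022 in place,
# which is the sequence requests later rejects.
plain = unquote(fragment)
print(plain)

# The helper decodes both forms.
fixed = unescape_js(fragment)
print(fixed)
```

The decoded string still contains spaces and a non-ASCII character, so it would need to be encoded in the standard way again before being used as a URL.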

1 Answer

Using encodeURIComponent() instead of escape() in the JavaScript code solved the problem.

Thanks!
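As a rough illustration of why this works: encodeURIComponent() percent-encodes the UTF-8 bytes of each character, which is the form Python's decoder expects. The Python 3 sketch below approximates that output with urllib.parse.quote; the safe set passed to quote is an assumption chosen to mirror the characters encodeURIComponent() leaves unescaped, and the title is a shortened version of the one in the question.

```python
from urllib.parse import quote, unquote

title = "Hum Tuhmaray hain \u2022 SRK"

# Approximate encodeURIComponent(): UTF-8 percent-encoding, leaving
# letters, digits and - _ . ! ~ * ' ( ) unescaped.
encoded = quote(title, safe="-_.!~*'()")
print(encoded)

# Standard percent-escapes round-trip cleanly through unquote.
decoded = unquote(encoded)
print(decoded == title)
```

Here the bullet becomes the three escapes %E2%80%A2 rather than %u2022, so every percent sign is followed by two hex digits.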
