Python:从JavaScript按钮获取下载链接
我正在尝试让我的脚本从 www.subscene.com 下载字幕。问题是网页上的下载按钮是用 Java 制作的,出于某种原因,即使我提取了网址,我也无法下载字幕。
我认为这是下载按钮的代码:
<a id="s_lc_bcr_downloadLink" class="downloadLink rating0" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))">Download English Subtitle</a><a id="s_lc_bcr_previewLink" href="javascript:togglePreview(482407, 'zip');">(See preview)</a>
所以我提取了网址,并告诉我的脚本去下载它:
urllib.urlretrieve('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx','c:\\sub.zip')
(我加上了 'http://subscene.com')
但出于某种原因,它没有下载到正确的文件。我该怎么办呢?
编辑:
非常感谢!不幸的是,我还是无法让它工作 :( 它显示了以下内容:
from selenium import webdriver
browser = webdriver.Firefox()
browser.execute_script('WebForm_DoPostBackWithOptions(newWebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 385, in execute_script{'script': script, 'args':converted_args})['value']
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\webdriver.py", line 153, in execute
self.error_handler.check_response(response)
File "C:\Users\User\AppData\Roaming\Python\Python27\site-packages\selenium\webdriver\remote\errorhandler.py", line 126, in check_response
raise exception_class(message, screen, stacktrace)
WebDriverException: Message: ''
1 个回答
4
正如约翰所说,这不是一个文件,而是一些JavaScript代码。所以,不用通过urllib.urlretrieve来获取那个文件,你可以执行这些JavaScript代码,它们会下载文件。这个过程可以通过selenium模块来实现。
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://subscene.com/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407.aspx')
browser.execute_script('WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("s$lc$bcr$downloadLink", "", true, "", "/english/How-I-Met-Your-Mother-Seventh-Season/subtitle-482407-dlpath-90698/zip.zipx", false, true))')
raw_input()
我通过Firebug得到了这段JavaScript代码。