Downloading files with Selenium

3 votes
1 answer
2747 views
Asked 2025-04-18 08:23

I am trying to write a Selenium program that automatically downloads and uploads some files.

To be clear, I am not doing this for testing; I want to automate some tasks.

These are the preferences I set for Firefox:

from selenium import webdriver

profile = webdriver.FirefoxProfile()  # assumed: `profile` is a FirefoxProfile
profile.set_preference('browser.download.folderList', 2)  # 2 = use the custom location in browser.download.dir
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/home/jj/web')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/json, text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream')
profile.set_preference('browser.helperApps.alwaysAsk.force', False)

However, I still see the download dialog.

1 Answer

5

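Selenium's Firefox driver opens the Firefox GUI. When you start a download, Firefox pops up a window asking whether you want to open the file or save it. As far as I know, this is browser behavior that cannot be turned off through Firefox settings or the profile. To avoid the Firefox download dialog, I used a combination of Mechanize and Selenium: I use Selenium to obtain the download link, then pass that link to Mechanize to perform the actual download. Mechanize has no GUI, so it never shows any user interface windows.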

The code below is written in Python and is part of a class that performs the download.

  # These imports are required
  from selenium import webdriver
  from selenium.webdriver.common.by import By
  import mechanize
  import time


  # Start the Firefox browser using Selenium
  self.driver = webdriver.Firefox()

  # Load the download page using its URL.
  self.driver.get(self.dnldPageWithKey)
  time.sleep(3)

  # Find the download link and read its URL (the link is not clicked).
  # find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...).
  elem = self.driver.find_element(By.ID, "regular")
  dnldlink = elem.get_attribute("href")
  logfile.write("Download Link is: " + dnldlink + "\n")
  pos = dnldlink.rfind("/")
  dnldFilename = dnldlink[pos+1:]
  dnldFilename = "/home/<mydir>/Downloads/" + dnldFilename
  logfile.write("Download filename is: " + dnldFilename + "\n")

  #### Now Using Mechanize ####
  # Above, Selenium retrieved the download link. Because Firefox under
  # Selenium presents a download dialog that requires user input,
  # Mechanize is used to perform the download.

  # Setup the mechanize browser. The browser does not get displayed.
  # It is managed behind the scenes.
  br = mechanize.Browser()

  # Open the login page, the download requires a login
  resp = br.open(webpage.loginPage)

  # Select the form to use on this page. There is only one, it is the
  # login form.
  br.select_form(nr=0)

  # Fill in the login form fields and submit the form. 
  br.form['login_username'] = theUsername
  br.form['login_password'] = thePassword
  br.submit()

  # The page returned after the submit is a transition page with a link
  # to the welcome page. In a user interactive session the browser would
  # automatically switch us to the welcome page.
  # The first link on the transition page will take us to the welcome page.
  # This step may not be necessary, but it puts us where we should be after
  # logging in.
  br.follow_link(nr=0)

  # Now download the file
  br.retrieve(dnldlink, dnldFilename)

  # After the download, close the Mechanize browser; we are done.
  br.close()
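
The filename step above takes everything after the last "/" in the link, which would also pick up a query string or fragment if the link has one. A minimal sketch of a more defensive version, using only the standard library; the helper names `filename_from_url` and `local_path_for` and the fallback name `download.bin` are hypothetical and not part of the original code:

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of a URL, ignoring query and fragment.

    `default` is a hypothetical fallback used when the path ends in "/".
    """
    path = urlsplit(url).path            # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))  # decode %20 etc., keep last segment
    return name or default


def local_path_for(url, download_dir):
    """Join the download directory with the filename derived from the URL."""
    return os.path.join(download_dir, filename_from_url(url))
```

With this, `dnldFilename` could be built as `local_path_for(dnldlink, download_dir)`, where `download_dir` is whatever directory you are saving to.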

This approach worked for me, and I hope it helps you too. If there is a simpler solution, I would like to hear about it.
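
One possible variation, untested against the site above and assuming it authenticates purely through cookies: instead of logging in a second time in the non-GUI client, copy the cookies from the Selenium session into a standard-library cookie jar and download with `urllib`. The helper names `cookiejar_from_selenium` and `opener_with_cookies` are hypothetical; the input is the list of dicts that `driver.get_cookies()` returns.

```python
import http.cookiejar
import urllib.request


def cookiejar_from_selenium(selenium_cookies):
    """Build a CookieJar from cookie dicts shaped like driver.get_cookies() output."""
    jar = http.cookiejar.CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")  # absent for session cookies
        jar.set_cookie(http.cookiejar.Cookie(
            version=0,
            name=c["name"],
            value=c["value"],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"),
            path_specified=True,
            secure=bool(c.get("secure", False)),
            expires=expiry,
            discard=expiry is None,
            comment=None,
            comment_url=None,
            rest={},
        ))
    return jar


def opener_with_cookies(selenium_cookies):
    """Return a urllib opener that sends the copied cookies with each request."""
    jar = cookiejar_from_selenium(selenium_cookies)
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


# Hypothetical usage with the names from the code above:
#   opener = opener_with_cookies(self.driver.get_cookies())
#   with opener.open(dnldlink) as resp, open(dnldFilename, "wb") as out:
#       out.write(resp.read())
```

Whether this is enough depends on the site; if the download also checks headers such as the user agent or referer, those would have to be set on the opener as well.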
