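I've written a script in Python with Selenium to download a few document files (ending in .doc) from a web page. I don't want to use the requests or urllib module to download them because the site I'm actually working with doesn't expose a real URL for each file; the links are JavaScript-encrypted. In this script I've picked a site that mimics the same behaviour.
What my script does at the moment: it downloads the files by clicking their links, but leaves them in the main folder instead of their own subfolders (this is what I need rectified).
How can I modify my script to download the files by clicking their links and put each downloaded file in its respective folder?
This is my attempt so far: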
import os
import time
from selenium import webdriver

link = 'https://www.online-convert.com/file-format/doc'
dirf = os.path.expanduser('~')
desk_location = dirf + r'\Desktop\file_folder'
if not os.path.exists(desk_location):
    os.mkdir(desk_location)

def download_files():
    driver.get(link)
    for item in driver.find_elements_by_css_selector("a[href$='.doc']")[:2]:
        filename = item.get_attribute("href").split("/")[-1]
        # create a new folder named after the file to store the downloaded file in
        folder_name = item.get_attribute("href").split("/")[-1].split(".")[0]
        # set the location of the folder to be created
        new_location = os.path.join(desk_location, folder_name)
        if not os.path.exists(new_location):
            os.mkdir(new_location)
        # set the location the downloaded file is expected to end up in
        file_location = os.path.join(new_location, filename)
        item.click()

        # wait up to time_to_wait seconds for the file to appear
        time_to_wait = 10
        time_counter = 0
        try:
            while not os.path.exists(file_location):
                time.sleep(1)
                time_counter += 1
                if time_counter > time_to_wait:
                    break
        except Exception:
            pass

if __name__ == '__main__':
    chromeOptions = webdriver.ChromeOptions()
    prefs = {'download.default_directory': desk_location,
             'profile.default_content_setting_values.automatic_downloads': 1
             }
    chromeOptions.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chromeOptions)
    download_files()
The image below shows how the downloaded files are currently stored (the files are outside of their respective folders):
Use this code when declaring the driver object (this is for Java, but Python has a similar way to do it). It makes every file download to the specified location.
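The Java snippet this answer refers to is not shown, so here is a rough Python sketch of the same idea, assuming a Selenium version that accepts the `options=` keyword. The helper names `build_download_prefs` and `make_driver` are made up for illustration. Note that the download directory is fixed when the driver is created, so this alone does not sort files into per-file folders.

```python
import os


def build_download_prefs(download_dir):
    """Return Chrome preferences that send downloads to download_dir."""
    return {
        # Folder Chrome saves downloads into (Chrome expects an absolute path).
        "download.default_directory": os.path.abspath(download_dir),
        # Do not ask where to save each file.
        "download.prompt_for_download": False,
    }


def make_driver(download_dir):
    """Start Chrome with the download preferences applied."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

Calling `make_driver(desk_location)` would then give a driver whose downloads all land in `desk_location`.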
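Use the pathlib library in Python 3, or its backport for Python 2, to handle paths. It provides an object-oriented way of working with files and directories. It also has the PurePath object, which can manipulate paths without touching the file system.
I just added a rename of the file to move it. So it works the same as what you have, but once the file has been downloaded it is moved to the correct path: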
os.rename(desk_location + '\\' + filename, file_location)
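For the pathlib suggestion made earlier, a minimal sketch of how the file name, folder name and target path could be derived with pure path objects, without touching the file system. The function name `target_path` and the example URL are hypothetical. Unlike the `split(".")[0]` in the question, `.stem` only strips the last suffix, so names containing several dots would give a different folder name.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import urlparse


def target_path(href, base_dir):
    """Return base_dir / <file stem> / <file name> for a download URL."""
    # The last component of the URL path is the file name, e.g. "example.doc".
    name = PurePosixPath(urlparse(href).path).name
    # The stem (file name without its final suffix) is used as the folder name.
    stem = PurePosixPath(name).stem
    # PureWindowsPath builds a Windows-style path on any operating system.
    return PureWindowsPath(base_dir) / stem / name
```

The result can be passed to `str()` wherever the script currently uses `file_location`.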
Full code:
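The listing itself is not included here, so what follows is only a sketch of how the question's script might look with the rename step added. It assumes a Selenium version that provides `find_elements(By.CSS_SELECTOR, ...)` and the `options=` keyword; the helpers `wait_for_file`, `move_to_subfolder` and `main` are names chosen for illustration, not taken from the original answer.

```python
import os
import time

link = 'https://www.online-convert.com/file-format/doc'
desk_location = os.path.join(os.path.expanduser('~'), 'Desktop', 'file_folder')


def wait_for_file(path, timeout=10):
    """Poll once per second until path exists; return False on timeout."""
    for _ in range(timeout):
        if os.path.exists(path):
            return True
        time.sleep(1)
    return os.path.exists(path)


def move_to_subfolder(download_dir, filename):
    """Move download_dir/filename into a subfolder named after the file."""
    folder_name = filename.split('.')[0]
    new_location = os.path.join(download_dir, folder_name)
    os.makedirs(new_location, exist_ok=True)
    file_location = os.path.join(new_location, filename)
    # Chrome saved the file in download_dir; move it to its own folder.
    os.rename(os.path.join(download_dir, filename), file_location)
    return file_location


def download_files(driver):
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(link)
    for item in driver.find_elements(By.CSS_SELECTOR, "a[href$='.doc']")[:2]:
        filename = item.get_attribute('href').split('/')[-1]
        item.click()
        # Wait for the download to show up in the main folder, then move it.
        if wait_for_file(os.path.join(desk_location, filename)):
            move_to_subfolder(desk_location, filename)


def main():
    from selenium import webdriver

    os.makedirs(desk_location, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_experimental_option('prefs', {
        'download.default_directory': desk_location,
        'profile.default_content_setting_values.automatic_downloads': 1,
    })
    driver = webdriver.Chrome(options=options)
    try:
        download_files(driver)
    finally:
        driver.quit()
```

Run it by calling `main()`. The key change from the question is that the wait loop watches the main download folder, where Chrome actually saves the file, rather than the subfolder. On Windows, `os.rename` raises an error if the destination file already exists, so a rerun may need the old copies removed first.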