Python Selenium PhantomJS proxy

Published 2024-05-28 22:56:57


Here is my code:

from selenium import webdriver

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
for s in range (len(proxylist)):
    service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
    driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
    for s in weblist:
        driver.get(s)

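The idea is that the browser first uses proxylist[0] to visit these sites. If proxylist[0] times out on weblist[2], then proxylist[1] should take over and continue with weblist[3]. I think I should use try and except, but I don't know where to put them. Any help is appreciated!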


2 answers

A try/except for a timeout looks like this:

from selenium.common.exceptions import TimeoutException

try:
    # Give up on the page load after 1 second.
    driver.set_page_load_timeout(1)
    driver.get("http://www.example.com")
except TimeoutException as ex:
    print("Exception has been thrown. " + str(ex))

Applied to your code, it would look something like this:

[code block missing from the source]

Be careful, though: if it fails to load a URL, it changes the proxy and moves on to the next URL (as you asked), but it does not retry the URL that failed.
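The behavior described above (switch proxy on a timeout, then carry on with the next URL) could be sketched roughly as follows. This is only an illustration, not the answerer's original code: `visit_with_failover`, `make_driver` and `timeout_error` are hypothetical names introduced here so the loop can be read and tried without a browser, and the sketch assumes `proxies` is not empty.

```python
def visit_with_failover(urls, proxies, make_driver, timeout_error):
    """Visit each URL once; on a timeout, switch proxy and skip to the next URL.

    make_driver(proxy) is assumed to return an object with get(url) and quit().
    timeout_error is the exception class that signals a page-load timeout.
    Both parameters are hypothetical helpers for this sketch.
    """
    visited = []   # (url, proxy) pairs that loaded
    skipped = []   # URLs that timed out and were not retried
    proxy_index = 0
    driver = make_driver(proxies[proxy_index])
    for url in urls:
        try:
            driver.get(url)
            visited.append((url, proxies[proxy_index]))
        except timeout_error:
            # The failed URL is not retried; only the proxy changes.
            skipped.append(url)
            driver.quit()
            driver = None
            proxy_index += 1
            if proxy_index >= len(proxies):
                break  # out of proxies, remaining URLs are left unvisited
            driver = make_driver(proxies[proxy_index])
    if driver is not None:
        driver.quit()
    return visited, skipped
```

With Selenium, one would presumably pass a factory that builds `webdriver.PhantomJS('phantomjs.exe', service_args=['--proxy=' + proxy, '--proxy-type=socks5'])`, calls `set_page_load_timeout` on it, and returns it, together with `TimeoutException` from `selenium.common.exceptions` as `timeout_error`.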

If you want it to change the proxy and retry the same URL when a load fails, use the following:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com', 'https://www.yahoo.com', 'https://aol.com']


def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        while True:

            # Stop once every proxy has been tried (>= avoids an IndexError).
            if temp_count_proxy >= len(proxylist):
                print("Out of proxy")
                return

            # Start a browser on the current proxy if none is open.
            if driver_opened == 0:
                service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]), '--proxy-type=socks5']
                driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
                driver_opened = 1

            try:
                driver.set_page_load_timeout(2)
                driver.get(url)
                # Your code to process here

            except TimeoutException:
                # Shut this browser down and retry the same URL on the next proxy.
                driver.quit()
                driver_opened = 0
                temp_count_proxy += 1
                continue

            break

    # Shut down the last browser once every URL has been visited.
    if driver_opened:
        driver.quit()

Try this. Basically, we swap the inner and outer loops and add a try/except:

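from selenium import webdriver
from selenium.common.exceptions import TimeoutException

# weblist and proxylist are the lists from the question.
for url in weblist:
    for proxy in proxylist:
        service_args = ['--proxy=%s' % proxy, '--proxy-type=socks5']
        driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
        try:
            driver.get(url)
            # Process the page here, before the browser is shut down.
            break  # this URL loaded; move on to the next one
        except TimeoutException:
            print('timed out')
        finally:
            driver.quit()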