Handling URLs that fail to open and a BrokenPipeError when uploading to Google Drive in a Selenium Python script
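I'm writing a Python script that iterates over a list of URLs, takes a screenshot of each page, and uploads the screenshots to Google Drive using Selenium, the Google API, and GSP. The script should try to open each URL five times; if all five attempts fail, it should skip that URL with a continue statement and move on to the next one.
However, I run into a BrokenPipeError whenever the script fails after trying to open a URL. Instead of continuing, the script stops, which is not what I want. Here is the relevant part of the code: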
max_attempts = 5
for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful
    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except Exception as e:  # Catch the specific exception if possible
            print(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)  # Wait for 10 seconds before retrying
    if not successful_connection:
        print(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)
    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    os.remove(screenshot_path)
driver.quit()
And this is the error message:
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1001, in send
    self.sock.sendall(data)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1238, in sendall
    v = self.send(byte_view[count:])
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1207, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Error: Process completed with exit code 1.
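I suspect the problem is related to exception handling or resource management, but I'm not sure how to find the root cause or how to fix this BrokenPipeError. Any suggestions or insights would be much appreciated.
I also tried creating an empty PNG file and uploading that dummy file when the connection fails, but I still got the same error.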
1 Answer
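Specific exception handling: Catching a broad Exception can catch far more than connection problems. It is better to catch more specific exceptions so that each kind of error can be handled appropriately. For example, depending on your use case, you might catch TimeoutException to handle timeouts, or WebDriverException to handle general WebDriver problems. TimeoutException is a subclass of WebDriverException, so it has to be listed first.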
python
from selenium.common.exceptions import TimeoutException, WebDriverException

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
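If the flag variable feels clumsy, the retry loop can be moved into a small helper that returns True or False. The following is only a sketch: open_with_retries and its parameters (open_page, retry_on, sleep) are hypothetical names, not part of Selenium, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type


def open_with_retries(
    open_page: Callable[[str], None],
    url: str,
    max_attempts: int = 5,
    delay: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call open_page(url) until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            open_page(url)
            return True  # success, no further attempts needed
        except retry_on as exc:
            print(f"Attempt {attempt} of {max_attempts} failed: {exc}")
            if attempt < max_attempts:
                sleep(delay)  # no point in waiting after the final attempt
    return False  # every attempt failed
```

In the script above this could be called as open_with_retries(driver.get, url, retry_on=(TimeoutException, WebDriverException)), followed by continue when it returns False.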
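Logging: Consider using the logging module instead of plain print statements. It gives you better control over log levels and formats, and it lets you send log output to different destinations.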
python
import logging

logging.basicConfig(level=logging.INFO)

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
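To make the point about formats and destinations concrete, here is a minimal sketch of a logger that writes timestamped lines both to stderr and to a file. The function name build_logger, the logger name and the file name screenshots.log are placeholders chosen for illustration.

```python
import logging
import sys


def build_logger(name: str = "screenshots", log_file: str = "screenshots.log") -> logging.Logger:
    """Return a logger that writes formatted records to stderr and to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if this is called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A call such as build_logger().error("Attempt %d of %d failed", 1, 5) then ends up in both places, which can be handy on a CI runner where the console output is not kept for long.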
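WebDriver cleanup: Make sure the WebDriver is cleaned up even when an exception occurs. You can use a try...finally structure to guarantee that driver.quit() is called, so the WebDriver is shut down properly.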
python
try:
    ...  # Your existing code
finally:
    driver.quit()
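The effect of try...finally can be checked without a browser. The sketch below uses a hypothetical FakeDriver stand-in (not a Selenium class) whose get() always fails, to show that quit() still runs before the exception propagates.

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver; it only records whether quit() ran."""

    def __init__(self):
        self.quit_called = False

    def get(self, url):
        raise RuntimeError(f"cannot open {url}")

    def quit(self):
        self.quit_called = True


def run(driver, urls):
    try:
        for url in urls:
            driver.get(url)
    finally:
        driver.quit()  # runs whether the loop finished or raised
```

Note that finally does not swallow the exception: run() still raises, so the caller decides whether the script should stop or carry on.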
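These suggestions are meant to make your script more robust and easier to maintain. Depending on your use case and requirements, you may need to adjust the exception handling and logging accordingly.
See what you think of this: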
python
import os
import time
import random
from datetime import datetime
from googleapiclient.http import MediaFileUpload
from selenium.common.exceptions import TimeoutException, WebDriverException
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)

# records, driver and drive_service are assumed to be created earlier in the script
max_attempts = 5
try:
    for record in records:
        url = record['Link']
        folder_id = record['Link to folder']
        successful_connection = False  # Flag to track if connection was successful
        for attempt in range(max_attempts):
            try:
                driver.get(url)
                time.sleep(random.uniform(1, 3))
                successful_connection = True  # Set the flag to True if successful
                break  # Exit the loop if successful
            except TimeoutException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
                time.sleep(10)
            except WebDriverException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
                time.sleep(10)
            except Exception as e:  # Catch other specific exceptions if needed
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
                time.sleep(10)
        if not successful_connection:
            logging.error(f"Failed to connect to {url} after {max_attempts} attempts.")
            continue  # Skip the rest of the code in this loop iteration and move to the next record
        # If connection was successful, proceed with screenshot and upload
        current_date = datetime.now().strftime('%Y-%m-%d')
        page_width = driver.execute_script('return document.body.scrollWidth')
        page_height = driver.execute_script('return document.body.scrollHeight')
        screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
        driver.set_window_size(page_width, page_height)
        driver.save_screenshot(screenshot_path)
        # Upload to Google Drive
        file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
        media = MediaFileUpload(screenshot_path, mimetype='image/png')
        file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
        os.remove(screenshot_path)
finally:
    # Ensure proper cleanup, even if an exception escaped the loop
    try:
        driver.quit()
    except Exception as e:
        logging.error(f"Failed to quit the WebDriver: {str(e)}")
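In this modified script:
Specific exceptions such as TimeoutException and WebDriverException are caught separately so that each error can be handled more precisely.
Logging replaces the print statements for better control and flexibility.
driver.quit() is wrapped in its own try block, so a failure while closing the WebDriver is logged instead of crashing the script. To guarantee that the cleanup also runs when an exception escapes the main loop, the loop itself has to sit inside a try...finally structure.
Be sure to adapt the script further to your specific needs and runtime environment.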
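One more observation, offered tentatively: the traceback in the question passes through ssl.py, which hints that the broken pipe was raised on an HTTPS request (possibly the Drive upload) and not inside the retry loop around driver.get(url). If so, none of the except clauses above would catch it. A minimal sketch of isolating each record, so that one failure is logged and the loop moves on, could look like this; process_all and process_one are hypothetical names, and process_one stands for the screenshot-and-upload steps for a single record.

```python
import logging

logger = logging.getLogger(__name__)


def process_all(records, process_one):
    """Run process_one(record) for every record; return the records that failed."""
    failed = []
    for record in records:
        try:
            process_one(record)
        except OSError as exc:
            # BrokenPipeError and ConnectionError are subclasses of OSError
            logger.error("Skipping %s: %r", record.get("Link"), exc)
            failed.append(record)
    return failed
```

The returned list makes it possible to retry the failed records later or to report them at the end of the run. Errors reported by the Drive API itself are not OSError subclasses, so they would need their own except clause if they should be skipped as well.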