Handling URLs that fail to open and a BrokenPipeError when uploading to Google Drive in a Selenium Python script

-1 votes
1 answer
46 views
Asked 2025-04-14 17:47

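I am writing a Python script that iterates over a list of URLs, takes a screenshot of each page, and uploads the screenshots to Google Drive using Selenium, the Google API, and GSP. The script should try to open each URL up to five times; if all five attempts fail, it should skip the current URL with a continue statement and move on to the next one.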

However, I get a BrokenPipeError whenever the script fails after trying to open a URL. Instead of continuing, the script stops, which is not what I want. Here is the relevant part of the code:

max_attempts = 5

for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful

    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except Exception as e:  # Catch the specific exception if possible
            print(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)  # Wait for 10 seconds before retrying

    if not successful_connection:
        print(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)

    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    
    os.remove(screenshot_path)

driver.quit()

And here is the error message:

    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1001, in send
    self.sock.sendall(data)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1238, in sendall
    v = self.send(byte_view[count:])
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1207, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Error: Process completed with exit code 1.

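I suspect the problem is related to exception handling or resource management, but I am not sure how to find the root cause or how to fix this BrokenPipeError. Any suggestions or insight would be much appreciated.

I also tried creating an empty PNG file and uploading that dummy file when the connection fails, but I still get the same error.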

1 Answer

-2

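Specific exception handling: Catching a very broad exception can catch far more than connection problems. It is better to catch more specific exceptions so that each kind of error can be handled appropriately. For example, depending on your use case, you might catch TimeoutException to handle timeouts and WebDriverException to handle general WebDriver problems. TimeoutException is a subclass of WebDriverException, so it has to be listed first.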

python

from selenium.common.exceptions import TimeoutException, WebDriverException

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
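
The retry pattern above could also be factored into a small helper so that the retryable exception types are stated in one place. This is only a minimal sketch and not part of the original answer: `retry_call` and its parameters are hypothetical names, and the `sleep` parameter exists only so the wait can be swapped out when testing.

```python
import time


def retry_call(func, exceptions, max_attempts=5, delay=10.0, sleep=time.sleep):
    """Call func() up to max_attempts times.

    Returns (True, result) on the first success, or (False, last_error)
    once every attempt has raised one of the given exception types.
    Exceptions not listed in `exceptions` propagate immediately.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return True, func()
        except exceptions as error:
            last_error = error
            print(f"Attempt {attempt} of {max_attempts} failed: {error}")
            if attempt < max_attempts:
                sleep(delay)  # wait before the next try, but not after the last one
    return False, last_error
```

With Selenium this might be called as `ok, _ = retry_call(lambda: driver.get(url), (TimeoutException, WebDriverException))`, followed by `if not ok: continue` inside the loop over records.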

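Logging: Consider using the logging module instead of plain print statements. It gives you better control over log levels and formatting, and lets you send log output to different destinations.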

python

import logging

logging.basicConfig(level=logging.INFO)

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
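
As a sketch of the control the logging module gives over level, format, and destination, the configuration below sends timestamped messages to both the console and a file. The helper name `build_logger`, the logger name `screenshot_job`, and the file name `screenshots.log` are assumptions made for illustration.

```python
import logging


def build_logger(name="screenshot_job", log_file="screenshots.log"):
    """Return a logger that writes INFO and above to the console and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:  # avoid adding duplicate handlers on repeated calls
        return logger
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Inside the retry loop, a call such as `logger.error("Attempt %d of %d failed: %s", attempt + 1, max_attempts, e)` passes the values as arguments, so the message is only formatted if it is actually emitted.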

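Handling WebDriver cleanup: Make sure the WebDriver is cleaned up even when an exception occurs. You can use a try...finally block to guarantee that driver.quit() is called, so the WebDriver shuts down properly.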

python

try:
    ...  # Your existing code goes here
finally:
    driver.quit()  # Runs whether or not an exception occurred
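
To illustrate why try...finally matters here, the sketch below uses a stand-in object instead of a real WebDriver (`FakeDriver` and `run_job` are hypothetical names) and shows that quit() still runs when the work in the middle raises.

```python
class FakeDriver:
    """Stand-in for a WebDriver; records whether quit() was called."""

    def __init__(self):
        self.quit_called = False

    def get(self, url):
        # Simulate a page that cannot be opened.
        raise RuntimeError(f"cannot open {url}")

    def quit(self):
        self.quit_called = True


def run_job(driver, urls):
    """Visit each URL, always shutting the driver down afterwards."""
    try:
        for url in urls:
            driver.get(url)
    finally:
        driver.quit()  # runs on success and on any exception
```

Calling `run_job(FakeDriver(), ["https://example.com"])` still raises the RuntimeError, but only after quit() has been called on the driver.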

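These suggestions aim to make your script more robust and easier to maintain. Depending on your use case and requirements, you may need to adjust the exception handling and logging accordingly.

See what you think of this: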

python

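import logging
import os
import random
import time
from datetime import datetime

from googleapiclient.http import MediaFileUpload
from selenium.common.exceptions import TimeoutException, WebDriverException
# Assumes driver, records, and drive_service are already set up as in the question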

# Configure logging
logging.basicConfig(level=logging.INFO)

max_attempts = 5

for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful

    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except TimeoutException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
            time.sleep(10)
        except WebDriverException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
            time.sleep(10)
        except Exception as e:  # Fallback for any other unexpected error
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)

    if not successful_connection:
        logging.error(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record

    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)

    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()

    os.remove(screenshot_path)

# Ensure proper cleanup
try:
    driver.quit()
except Exception as e:
    logging.error(f"Failed to quit the WebDriver: {str(e)}")

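In this revised script:

Specific exceptions such as TimeoutException and WebDriverException are caught separately so that each kind of failure can be handled appropriately.

Logging replaces the print statements for better control and flexibility.

The final driver.quit() call is wrapped in try...except so that a failure during shutdown is logged instead of raised. Note that this is not a try...finally: if an uncaught exception ends the loop early, driver.quit() is never reached, so wrap the whole loop in try...finally if cleanup must always run.

Make sure to adapt the script further to your specific needs and environment.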
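
One point the script above leaves open is the upload step itself: the Drive call sits outside the retry loop, so an exception raised there, such as the BrokenPipeError named in the question's title, would still end the run. A possible approach, sketched here under assumptions, is to retry connection-level errors around the upload and skip the record if they persist. `upload_with_retry` and its parameters are hypothetical names; `do_upload` stands for any zero-argument callable that builds and executes the request.

```python
import time


def upload_with_retry(do_upload, max_attempts=3, delay=5.0, sleep=time.sleep):
    """Run do_upload(), retrying on connection-level errors.

    BrokenPipeError is a subclass of ConnectionError, so it is covered.
    Returns the upload result, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_upload()
        except ConnectionError as error:
            print(f"Upload attempt {attempt} of {max_attempts} failed: {error!r}")
            if attempt < max_attempts:
                sleep(delay)  # pause before the next attempt
    return None
```

In the loop, `do_upload` could be a small function that creates a fresh MediaFileUpload for `screenshot_path` and returns the result of `drive_service.files().create(...).execute()`; if `upload_with_retry` returns None, the script can log the failure and `continue` to the next record. Whether a retry is enough depends on why the connection is being dropped, which the truncated traceback does not show.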
