Unable to scrape a web page when using Selenium inside a class

Posted 2024-05-12 13:31:55


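I am using Selenium to scrape a web page that is generated dynamically by JavaScript. When I run the commands directly in a Python terminal (from cmd), it works fine. However, when I implement the same functionality in a class, it does not work as expected.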

My class implementation is:

    from selenium import webdriver


    class web_scraper():
        def __init__(self):
            # start chrome driver (executable_path is the Selenium 3 argument;
            # Selenium 4 passes the path through a Service object instead)
            self.driver = webdriver.Chrome(executable_path="./config/chromedriver.exe")

        # scrape web page from specified url
        def scrape_page(self, url):
            html = None
            try:
                # load the page
                self.driver.get(url)

                # read the current DOM as HTML
                html = self.driver.execute_script("return document.documentElement.innerHTML;")
            except Exception as e:
                print('[Error:] Scraping failed.')
                print(f'[Exception:] {e}')

            return html


    if __name__ == '__main__':
        url = "https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage9"
        scraper = web_scraper()
        content = scraper.scrape_page(url)
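One plausible cause (an assumption; the question does not confirm it) is timing: `scrape_page` reads `innerHTML` as soon as `driver.get()` returns, before the JavaScript application has built the page, whereas typing commands by hand in a terminal leaves the page several seconds to render. Below is a minimal sketch of the class with an explicit polling wait. The hypothetical `WaitingScraper` takes the driver as a constructor argument instead of creating it, so the logic can be exercised without a browser; `is_ready`, `timeout` and `poll` are illustrative parameters, not Selenium API.

```python
import time
from typing import Callable, Optional


class WaitingScraper:
    """Hypothetical scraper that polls until the page looks rendered.

    `driver` is any object with Selenium-style `get(url)` and
    `execute_script(script)` methods, e.g. a `webdriver.Chrome` instance.
    """

    def __init__(self, driver, timeout: float = 10.0, poll: float = 0.5):
        self.driver = driver
        self.timeout = timeout  # assumed max seconds to wait for rendering
        self.poll = poll        # assumed seconds between checks

    def _read_html(self) -> str:
        return self.driver.execute_script(
            "return document.documentElement.innerHTML;")

    def scrape_page(self, url: str,
                    is_ready: Callable[[str], bool]) -> Optional[str]:
        """Load `url`, then re-read the DOM until `is_ready(html)` is true
        or the timeout expires. Returns None if the driver raised."""
        try:
            self.driver.get(url)
            deadline = time.monotonic() + self.timeout
            while True:
                html = self._read_html()
                if is_ready(html):
                    return html
                if time.monotonic() >= deadline:
                    print('[Error:] Page did not finish rendering in time.')
                    return html  # hand back whatever has loaded so far
                time.sleep(self.poll)
        except Exception as e:
            print('[Error:] Scraping failed.')
            print(f'[Exception:] {e}')
            return None
```

With a real browser, `driver` would be something like `webdriver.Chrome(executable_path=...)`, and `is_ready` a check for some element the rendered page is expected to contain. Selenium also ships its own helper for this, `WebDriverWait` in `selenium.webdriver.support.ui`, whose `until()` method polls a condition in the same way; the hand-written loop is used here only so the sketch runs without a browser.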

The code I use in the terminal is:

    from selenium import webdriver

    driver = webdriver.Chrome(executable_path='E:/Projects/Python_Projects/WebScraping/config/chromedriver.exe')
    driver.get("https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage30")
    content = driver.execute_script("return document.documentElement.innerHTML;")
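A suspected reason the terminal version works is the natural pause between typing `driver.get(...)` and `driver.execute_script(...)`. The self-contained simulation below illustrates that race under stated assumptions: the hypothetical `FakeGwtPage` stands in for a page whose content is drawn by JavaScript a fixed time after navigation completes, and the 0.2 s render delay is arbitrary. Reading immediately after `get()` returns the bootstrap shell; reading after a short pause returns the rendered content.

```python
import time


class FakeGwtPage:
    """Hypothetical stand-in for a page whose DOM is filled in by
    JavaScript some time after navigation has finished."""

    SHELL = '<body><script src="wipp/wipp.nocache.js"></script></body>'
    RENDERED = '<body><table id="taxes"><tr><td>data</td></tr></table></body>'

    def __init__(self, render_delay: float):
        self.render_delay = render_delay
        self.loaded_at = 0.0

    def get(self, url: str) -> None:
        # Like driver.get(): returns once the initial document has loaded,
        # before the simulated JavaScript has finished rendering.
        self.loaded_at = time.monotonic()

    def execute_script(self, script: str) -> str:
        elapsed = time.monotonic() - self.loaded_at
        return self.RENDERED if elapsed >= self.render_delay else self.SHELL


page = FakeGwtPage(render_delay=0.2)

# Script-style: read straight after get(), as scrape_page() does.
page.get("https://example.invalid/")
immediate = page.execute_script("return document.documentElement.innerHTML;")

# Terminal-style: a pause between the two commands.
page.get("https://example.invalid/")
time.sleep(0.3)
delayed = page.execute_script("return document.documentElement.innerHTML;")

print(immediate == FakeGwtPage.SHELL)   # True
print(delayed == FakeGwtPage.RENDERED)  # True
```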

The output of the class implementation is:

    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <link type="text/css" rel="stylesheet" href="Wipp.css">
        <title>WIPP</title>
      <link rel="stylesheet" href="https://wipp.edmundsassoc.com/Wipp/wipp/gwt/standard/standard.css"><script src="https://wipp.edmundsassoc.com/Wipp/wipp/0D3421F8F9508D2F958C63CE2A48BAD8.cache.js"></script></head>

      <body>
        <script type="text/javascript" language="javascript" src="wipp/wipp.nocache.js"></script>
        <iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position:absolute;width:0;height:0;border:0"></iframe>

    </body>
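The output above contains only GWT bootstrap pieces (the `wipp/wipp.nocache.js` loader script and the `__gwt_historyFrame` iframe) and no visible page content, which is consistent with the HTML being read before the application drew the page. A hypothetical readiness check could parse the HTML with the standard-library `html.parser` and report whether `<body>` contains anything other than `<script>` and `<iframe>` elements. This rule is a heuristic assumed for illustration, not something the site documents.

```python
from html.parser import HTMLParser


class _BodyTagCollector(HTMLParser):
    """Records the tag names of elements that start inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.body_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.body_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def looks_rendered(html: str) -> bool:
    """Heuristic: True if <body> holds any element besides script/iframe."""
    collector = _BodyTagCollector()
    collector.feed(html)
    collector.close()
    return any(tag not in ("script", "iframe") for tag in collector.body_tags)
```

A function like this could be used as the condition when polling the driver until the page has rendered.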

The commands in the Python terminal, on the other hand, produce the expected output.

Any help with this would be appreciated. Thanks.

I am using Windows, and my Python version is 3.6.

