A toy for the Chrome DevTools Protocol. Read more: https://github.com/clericpy/ichrome
ichrome - v0.1.2
A toy for driving Chrome through the Chrome DevTools Protocol (CDP). For Python 3.6+ (who cares about Python 2.x).
Install
pip install ichrome -U
Why?
Pyppeteer/Selenium are great, but I don't need that much…
It is a way to test the CDP.
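Testing the CDP at its lowest level means exchanging JSON messages: each command carries an integer `id`, a `method` such as `Page.navigate`, and optional `params`, and Chrome's reply echoes the same `id` with either a `result` or an `error`. The minimal sketch below only builds and matches such messages (no WebSocket involved); `make_command` and `match_reply` are hypothetical helpers for illustration, not part of ichrome's API.

```python
import itertools
import json

# Monotonic message ids; a CDP reply echoes the id of the command it answers.
_ids = itertools.count(1)


def make_command(method, **params):
    """Serialize one CDP command (hypothetical helper, not ichrome's API)."""
    message = {"id": next(_ids), "method": method}
    if params:
        message["params"] = params
    return json.dumps(message)


def match_reply(raw_reply, command_id):
    """Return the 'result' of a reply if it answers command_id, else None."""
    reply = json.loads(raw_reply)
    if reply.get("id") != command_id:
        return None  # an event, or a reply to a different command
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "CDP error"))
    return reply.get("result", {})
```

Events such as `Network.responseReceived` arrive on the same socket without an `id`, which is why `match_reply` returns `None` for them instead of treating them as replies.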
Features
- Chrome process daemon
- Operations on tabs
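The "Chrome process daemon" feature, together with the `max_deaths` argument passed to `ChromeDaemon` later in this README, can be pictured as a watchdog loop: launch a process, and each time it exits, count a death and relaunch until a limit is reached. The sketch below is a hypothetical illustration using only the standard library, not ichrome's actual daemon; the `watch` function, its parameters, and the reading of `max_deaths` as "give up after this many exits" are all assumptions, and a throwaway Python child stands in for Chrome.

```python
import subprocess
import sys
import time


def watch(cmd, max_deaths=2, interval=0.1):
    """Run cmd and relaunch it each time it exits, giving up after max_deaths exits.

    Hypothetical sketch; returns the list of exit codes, one per run.
    """
    exit_codes = []
    while True:
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())  # block until this run dies
        if len(exit_codes) >= max_deaths:
            return exit_codes  # too many deaths: stop restarting
        time.sleep(interval)  # short pause before the relaunch


if __name__ == "__main__":
    # A child that exits with code 3 stands in for a crashing Chrome.
    print(watch([sys.executable, "-c", "raise SystemExit(3)"], max_deaths=2))
    # prints [3, 3]
```

A real daemon would also need to stop the child on shutdown, which is what the `with ChromeDaemon() as chromed:` context takes care of in the examples.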
Examples
Chrome daemon
    from ichrome import ChromeDaemon, Chrome


    def main():
        with ChromeDaemon() as chromed:
            # run_forever means auto-restart; called with 0 here so the script keeps going
            chromed.run_forever(0)
            chrome = Chrome()
            tab = chrome.new_tab(url="https://pypi.org")
            tab.wait_loading(3)
            tab.js('alert("test ok")')
            tab.close()


    if __name__ == "__main__":
        main()
Connect to an existing debugging port

    from ichrome import Chrome


    def main():
        chrome = Chrome(port=9222)
        print(chrome.tabs)
        # [ChromeTab("6EC65C9051697342082642D6615ECDC0", "about:blank", "about:blank", port: 9222)]
        print(chrome.tabs[0])
        # Tab(about:blank)


    if __name__ == "__main__":
        main()
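A Chrome started with `--remote-debugging-port` serves its target list as JSON at `http://host:port/json`; each entry has fields such as `id`, `type`, `title`, `url` and `webSocketDebuggerUrl`. The sketch below shows how a tab list like `chrome.tabs` could be derived from that endpoint. It is an assumption about the general mechanism, not ichrome's code: `TabInfo`, `parse_tabs` and `list_tabs` are hypothetical names.

```python
import json
from collections import namedtuple
from urllib.request import urlopen

# Minimal record for one tab (hypothetical; ichrome has its own ChromeTab class).
TabInfo = namedtuple("TabInfo", "tab_id title url ws_url")


def parse_tabs(raw_json):
    """Keep only 'page' targets from a /json listing."""
    tabs = []
    for target in json.loads(raw_json):
        if target.get("type") != "page":
            continue  # skip service workers, background pages, etc.
        tabs.append(TabInfo(
            target["id"],
            target.get("title", ""),
            target.get("url", ""),
            # absent while another DevTools client is attached to the tab
            target.get("webSocketDebuggerUrl", ""),
        ))
    return tabs


def list_tabs(host="127.0.0.1", port=9222, timeout=3):
    """Fetch and parse the tab listing from a running Chrome."""
    with urlopen("http://%s:%s/json" % (host, port), timeout=timeout) as resp:
        return parse_tabs(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call makes the filtering logic testable without a running browser.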
Advanced usage (crawling a special background request)
    """Test normal usage of ichrome.

    1. Use a `with` context to launch the ChromeDaemon daemon process.
    2. Init Chrome to connect to the Chrome background server.
    3. Tab ops:
        3.1 create a new tab
        3.2 go to a new URL with tab.set_url; loading stops on timeout
        3.3 get cookies for a URL
        3.4 inject the jQuery lib from a static URL
        3.5 auto-click OK on the alert dialog
        3.6 remove `href` from the third `a` tag, selected by CSS path
        3.7 remove `href` from all `a` tags, selected by CSS path
        3.8 use querySelectorAll to get the elements
        3.9 crawl the network response of a background ajax request
        3.10 click an element with tab.click and a CSS selector
        3.11 show the HTML source code of the tab
    """


    def example():
        import sys
        import os

        # use the local ichrome module
        sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
        os.chdir("..")  # to reuse the existing user data dir
        from ichrome import Chrome, ChromeDaemon, ichrome_logger as logger
        import re
        import json

        # Example for crawling a special background request.

        # reset the default logger level, e.g. DEBUG
        # import logging
        # logger.setLevel(logging.INFO)

        # launch the Chrome process and the daemon process; both shut down when the `with` block exits
        with ChromeDaemon(host="127.0.0.1", port=9222, max_deaths=1) as chromed:
            # create a connection to Chrome DevTools
            chrome = Chrome(host="127.0.0.1", port=9222, timeout=3, retry=1)
            # create a new tab without a URL
            tab = chrome.new_tab()
            # set the URL to bing.com; if loading takes more than 5 seconds, loading is stopped
            tab.set_url("https://www.bing.com/", referrer="https://www.github.com/", timeout=5)
            # get cookies for a URL
            logger.info(tab.get_cookies("http://cn.bing.com"))
            # test inject_js; on success, an alert shows the jQuery version (3.3.1)
            logger.info(tab.inject_js("https://cdn.staticfile.org/jquery/3.3.1/jquery.min.js"))
            logger.info(tab.js("alert('jQuery inject success:' + jQuery.fn.jquery)"))
            tab.js('alert("Check that the links above are disabled, then type `test` into the input box.")')
            # automatically accept the alert
            tab.send("Page.handleJavaScriptDialog", accept=True)
            # remove href from the `a` tag at index 3
            tab.click("#sc_hdu>li>a", index=3, action="removeAttribute('href')")
            # remove href from all matched `a` tags
            tab.querySelectorAll("#sc_hdu>li>a", index=None, action="removeAttribute('href')")
            # use querySelectorAll to get the elements
            for tag in tab.querySelectorAll("#sc_hdu>li"):
                logger.info("Tag: %s, id: %s, class: %s, text: %s" % (tag, tag.get("id"), tag.get("class"), tag.text))
            # enable the Network domain, otherwise no Network request/response events are received
            logger.info(tab.send("Network.enable"))
            # blocks here until "test" is typed into the input box:
            # the tab waits for a Network.responseReceived event that passes filter_function
            recv_string = tab.wait_event(
                "Network.responseReceived",
                filter_function=lambda r: re.search(r"&\w+=test", r or ""),
                wait_seconds=None,
            )
            # the "Network.responseReceived" event string was caught; load the JSON
            event = json.loads(recv_string)
            # get the requestId to fetch its response body
            request_id = event["params"]["requestId"]
            logger.info("requestId: %s" % request_id)
            # send the request for getResponseBody
            resp = tab.send("Network.getResponseBody", requestId=request_id, timeout=5)
            # resp now holds the response body result
            logger.info("getResponseBody success %s" % resp)
            # click the button matched by the CSS selector #sb_form_go (the submit button)
            logger.info(tab.click("#sb_form_go"))
            # show the beginning of the tab's HTML source
            logger.info(tab.html[:100])
            # block here until the Chrome browser window is closed
            chromed.run_forever()


    if __name__ == "__main__":
        example()
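The crawling steps in that example (filter the raw event text, pull out `requestId`, then read the body returned by `Network.getResponseBody`) can be isolated into pure functions. This is a hedged sketch, not ichrome code: `is_wanted`, `request_id_of` and `decode_body` are hypothetical names, and it assumes the caller has already extracted the `result` object (`{"body": ..., "base64Encoded": ...}`) from the CDP reply, since binary bodies come back base64-encoded.

```python
import base64
import json
import re

# Same pattern as the example: some query parameter whose value is "test".
QUERY_PATTERN = re.compile(r"&\w+=test")


def is_wanted(raw_event):
    """Filter in the spirit of filter_function: search the raw event text."""
    return bool(QUERY_PATTERN.search(raw_event or ""))


def request_id_of(raw_event):
    """Return the requestId of a Network.responseReceived event."""
    event = json.loads(raw_event)
    if event.get("method") != "Network.responseReceived":
        raise ValueError("not a Network.responseReceived event")
    return event["params"]["requestId"]


def decode_body(result):
    """Decode the 'result' of Network.getResponseBody into text."""
    body = result.get("body", "")
    if result.get("base64Encoded"):
        return base64.b64decode(body).decode("utf-8", errors="replace")
    return body
```

Matching on the raw string, as the original `filter_function` does, is cheap but coarse; parsing the event first and checking `params.response.url` would be stricter.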
Command-line usage
λ python3 -m ichrome -s 9222
2018-11-27 23:01:59 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
2018-11-27 23:02:00 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
λ python3 -m ichrome -p 9222 --start_url "http://bing.com" --disable_image
2018-11-27 23:03:57 INFO [ichrome] __main__.py(69): ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': 'localhost', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': True, 'start_url': 'http://bing.com', 'extra_config': '', 'max_deaths': 2, 'timeout': 2}
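The log line above shows the options `ChromeDaemon` receives from the command line. A rough sketch of how such a dict could map onto a Chrome command line follows. `build_chrome_args` is hypothetical, the flags are standard Chromium switches (`--blink-settings=imagesEnabled=false` is a commonly used way to disable images), and ichrome's real mapping may differ, for example in how it locates Chrome or handles `host` and `extra_config`, which this sketch ignores.

```python
def build_chrome_args(options, default_path="chrome"):
    """Translate a ChromeDaemon-style options dict into a Chrome argv list.

    Hypothetical sketch: keys mirror the logged cmd args; only a subset is used.
    """
    args = [
        options.get("chrome_path") or default_path,  # empty path -> fallback
        "--remote-debugging-port=%s" % options.get("port", 9222),
    ]
    if options.get("headless"):
        args.append("--headless")
    if options.get("user_agent"):
        args.append("--user-agent=%s" % options["user_agent"])
    if options.get("proxy"):
        args.append("--proxy-server=%s" % options["proxy"])
    if options.get("user_data_dir"):
        args.append("--user-data-dir=%s" % options["user_data_dir"])
    if options.get("disable_image"):
        args.append("--blink-settings=imagesEnabled=false")
    if options.get("start_url"):
        args.append(options["start_url"])  # positional URL opened at startup
    return args
```

Fed the options from the second command above, this yields `["chrome", "--remote-debugging-port=9222", "--blink-settings=imagesEnabled=false", "http://bing.com"]`, a list that could be passed to `subprocess.Popen`.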
TODO
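[ ] Concurrency support (gevent, threading).
[X] Add auto-restart on crash.
[ ] Automatically remove zombie tabs with a lifebuoy.
[ ] Add some useful examples.
[ ] Coroutine support (for asyncio).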