A toy for the Chrome Devtools Protocol. Read more: https://github.com/clericpy/ichrome

Detailed description of the ichrome Python project


ichrome - v0.1.2

A toy for driving Chrome through the Chrome Devtools Protocol (CDP). Requires Python 3.6+; Python 2.x is not supported.

Installation

pip install ichrome -U

Why?

  • Pyppeteer and Selenium are great, but I don't need that much...

  • A way to test CDP

Features

  • Chrome process daemon
  • Connect to an existing Chrome debug port
  • Operations on tabs

Examples

Chrome daemon

from ichrome import ChromeDaemon, Chrome


def main():
    with ChromeDaemon() as chromed:
        # run_forever means auto_restart
        chromed.run_forever(0)
        chrome = Chrome()
        tab = chrome.new_tab(url="https://pypi.org")
        tab.wait_loading(3)
        tab.js('alert("test ok")')
        tab.close()


if __name__ == "__main__":
    main()
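
The auto-restart behaviour mentioned in the comment above can be pictured as a small supervisor loop. The following is a minimal sketch of that idea, not ichrome's actual implementation: the `supervise` helper is hypothetical, and the `max_deaths` semantics shown here (stop after that many exits) are an assumption based only on the parameter name.

```python
import subprocess
import sys


def supervise(cmd, max_deaths=1):
    """Run cmd, restarting it each time it exits, until it has died max_deaths times.

    Returns the list of exit codes observed, one per death.
    """
    exit_codes = []
    while len(exit_codes) < max_deaths:
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())  # blocks until the child exits
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a browser process: a child that exits immediately with code 3.
    fake_browser = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(fake_browser, max_deaths=2))
```

A real daemon would typically run such a loop in a background thread or process so the caller can keep using the browser while it is being watched.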

Connect to an existing debug port

from ichrome import Chrome


def main():
    chrome = Chrome(port=9222)
    print(chrome.tabs)
    # [ChromeTab("6EC65C9051697342082642D6615ECDC0", "about:blank", "about:blank", port: 9222)]
    print(chrome.tabs[0])
    # Tab(about:blank)


if __name__ == "__main__":
    main()
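
For context, a Chrome instance started with `--remote-debugging-port` serves a JSON list of its targets over HTTP at `/json`, and the tab list printed above carries the same kind of data (id, title, url). Below is a minimal standard-library sketch of reading that list without ichrome. `parse_tabs` and `list_tabs` are hypothetical helper names, and the sample payload is made up for illustration.

```python
import json
from urllib.request import urlopen


def parse_tabs(payload):
    """Extract (id, url) pairs for page targets from a /json response body."""
    return [(t["id"], t["url"]) for t in json.loads(payload) if t.get("type") == "page"]


def list_tabs(host="127.0.0.1", port=9222, timeout=3):
    """Fetch the target list from a running Chrome debugging port."""
    with urlopen(f"http://{host}:{port}/json", timeout=timeout) as resp:
        return parse_tabs(resp.read().decode("utf-8"))


# Illustrative payload shaped like a /json response (values are made up).
sample = json.dumps([
    {"id": "A1", "type": "page", "url": "about:blank", "title": "about:blank"},
    {"id": "B2", "type": "background_page", "url": "chrome-extension://x/bg.html"},
])
print(parse_tabs(sample))
```

`list_tabs()` only works while a browser is listening on the given port; `parse_tabs` can be exercised offline, as the sample shows.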

Advanced usage (crawling a special background request)

"""Test normal usage of ichrome.

1. use `with` context for launching ChromeDaemon daemon process.
2. init Chrome for connecting with chrome background server.
3. Tab ops:
  3.1 create a new tab
  3.2 goto new url with tab.set_url, and will stop load for timeout.
  3.3 get cookies from url
  3.4 inject the jQuery lib by a static url.
  3.5 auto click ok from the alert dialog.
  3.6 remove `href` from the third `a` tag, which is selected by css path.
  3.7 remove all `href` from the `a` tag, which is selected by css path.
  3.8 use querySelectorAll to get the elements.
  3.9 Network crawling from the background ajax request.
  3.10 click some element by tab.click with css selector.
  3.11 show html source code of the tab
"""


def example():
    """Example for crawling a special background request."""
    import sys
    import os

    # use the local ichrome module
    sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
    os.chdir("..")  # to reuse the existing user data dir
    from ichrome import Chrome, ChromeDaemon, ichrome_logger as logger
    import re
    import json

    # reset the default logger level, such as DEBUG
    # import logging
    # logger.setLevel(logging.INFO)
    # launch the Chrome process and the daemon process; the `with` block shuts them down on exit.
    with ChromeDaemon(host="127.0.0.1", port=9222, max_deaths=1) as chromed:
        # create a connection to Chrome Devtools
        chrome = Chrome(host="127.0.0.1", port=9222, timeout=3, retry=1)
        # now create a new tab without a url
        tab = chrome.new_tab()
        # reset the url to bing.com; if loading takes more than 5 seconds, stop loading.
        tab.set_url(
            "https://www.bing.com/", referrer="https://www.github.com/", timeout=5
        )
        # get_cookies from url
        logger.info(tab.get_cookies("http://cn.bing.com"))
        # test inject_js; on success, the alert shows the jQuery version 3.3.1
        logger.info(
            tab.inject_js("https://cdn.staticfile.org/jquery/3.3.1/jquery.min.js")
        )
        logger.info(tab.js("alert('jQuery inject success:' + jQuery.fn.jquery)"))
        tab.js(
            'alert("Check the links above disabled, and then input `test` to the input position.")'
        )
        # automatically accept the alert dialog
        tab.send("Page.handleJavaScriptDialog", accept=True)
        # remove href from one of the `a` tags.
        tab.click("#sc_hdu>li>a", index=3, action="removeAttribute('href')")
        # remove href from all the `a` tags.
        tab.querySelectorAll(
            "#sc_hdu>li>a", index=None, action="removeAttribute('href')"
        )
        # use querySelectorAll to get the elements.
        for i in tab.querySelectorAll("#sc_hdu>li"):
            logger.info(
                "Tag: %s, id:%s, class:%s, text:%s"
                % (i, i.get("id"), i.get("class"), i.text)
            )
        # enable the Network domain; otherwise no Network request/response events are received.
        logger.info(tab.send("Network.enable"))
        # this blocks until the string "test" is typed into the input position.
        # the tab waits for a Network.responseReceived event that satisfies the given filter_function.
        recv_string = tab.wait_event(
            "Network.responseReceived",
            filter_function=lambda r: re.search(r"&\w+=test", r or ""),
            wait_seconds=None,
        )
        # the "Network.responseReceived" event string has been caught; load the json.
        recv_string = json.loads(recv_string)
        # get the requestId to fetch its response body.
        request_id = recv_string["params"]["requestId"]
        logger.info("requestId: %s" % request_id)
        # send the request for getResponseBody
        resp = tab.send("Network.getResponseBody", requestId=request_id, timeout=5)
        # now resp is the response body result.
        logger.info("getResponseBody success %s" % resp)
        # directly click the element matched by the css selector #sb_form_go, here the submit button.
        logger.info(tab.click("#sb_form_go"))
        # show some html source code of the tab
        logger.info(tab.html[:100])
        # now click the close button of the chrome browser.
        chromed.run_forever()


if __name__ == "__main__":
    example()
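
The `tab.send` and `tab.wait_event` calls above map onto the CDP wire format: a command is a JSON object with `id`, `method`, and `params`, while an event is a JSON object with `method` and `params` but no `id`. The sketch below builds such a command and filters raw event strings in the same way the `filter_function` lambda does. `build_command` and `first_match` are hypothetical helpers, not part of ichrome, and the event strings are invented for illustration.

```python
import json
import re


def build_command(msg_id, method, **params):
    """Serialize a CDP command; the browser echoes msg_id back in its reply."""
    return json.dumps({"id": msg_id, "method": method, "params": params})


def first_match(raw_events, method, filter_function):
    """Return the first decoded event of the given method whose raw string passes the filter."""
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("method") == method and filter_function(raw):
            return event
    return None


# Invented event strings standing in for what the browser would push.
events = [
    json.dumps({"method": "Network.requestWillBeSent", "params": {"requestId": "1"}}),
    json.dumps({
        "method": "Network.responseReceived",
        "params": {"requestId": "2", "response": {"url": "https://example.com/s?a=1&q=test"}},
    }),
]
hit = first_match(
    events,
    "Network.responseReceived",
    lambda r: re.search(r"&\w+=test", r or ""),
)
print(build_command(1, "Network.getResponseBody", requestId=hit["params"]["requestId"]))
```

Filtering on the raw string keeps the predicate cheap and lets one regular expression cover any field of the event, at the cost of occasionally matching text outside the URL.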

Command line usage

λ python3 -m ichrome -s 9222
2018-11-27 23:01:59 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222
2018-11-27 23:02:00 DEBUG [ichrome] base.py(329): kill chrome.exe --remote-debugging-port=9222

λ python3 -m ichrome -p 9222 --start_url "http://bing.com" --disable_image
2018-11-27 23:03:57 INFO  [ichrome] __main__.py(69): ChromeDaemon cmd args: {'daemon': True, 'block': True, 'chrome_path': '', 'host': 'localhost', 'port': 9222, 'headless': False, 'user_agent': '', 'proxy': '', 'user_data_dir': None, 'disable_image': True, 'start_url': 'http://bing.com', 'extra_config': '', 'max_deaths': 2, 'timeout': 2}
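
The logged argument dict suggests how such options could end up on a Chrome command line. The sketch below is an assumption-laden illustration: `chrome_args` is a hypothetical helper, the parameter names are borrowed from the log line above, and how ichrome itself maps each option to a Chrome switch is not shown in this document.

```python
def chrome_args(chrome_path, port=9222, headless=False, user_agent="", proxy="",
                user_data_dir=None, disable_image=False, start_url=""):
    """Translate option values into a Chrome command line (list form for subprocess)."""
    args = [chrome_path, f"--remote-debugging-port={port}"]
    if headless:
        args.append("--headless")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    if proxy:
        args.append(f"--proxy-server={proxy}")
    if user_data_dir:
        args.append(f"--user-data-dir={user_data_dir}")
    if disable_image:
        # Assumed switch for skipping image loading.
        args.append("--blink-settings=imagesEnabled=false")
    if start_url:
        args.append(start_url)
    return args


# "chrome" is a placeholder for the real executable path.
print(chrome_args("chrome", port=9222, disable_image=True, start_url="http://bing.com"))
```

The resulting list can be passed directly to `subprocess.Popen`, which avoids shell quoting problems with values such as user agent strings.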

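TODO

  • [ ] Concurrent support (gevent, threading).

  • [x] Add auto-restart on crash.

  • [ ] Auto-remove zombie tabs with a lifebuoy.

  • [ ] Add some useful examples.

  • [ ] Coroutine support (for async use).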
