Spark Monitoring
A Python library to interact with the Spark History server.
Quickstart
Basic
$ pip install spark-monitoring
import sparkmonitoring as sparkmon
monitoring = sparkmon.client('my.history.server')
print(monitoring.list_applications())
Pandas
$ pip install spark-monitoring[pandas]
import sparkmonitoring as sparkmon
import matplotlib.pyplot as plt
monitoring = sparkmon.df('my.history.server')
apps = monitoring.list_applications()
apps['function'] = apps.name.str.split('(').str.get(0)
print(apps.head().stack())
plt.figure()
apps['duration'].hist(by=apps['function'], figsize=(40, 20))
plt.show()
jobs = monitoring.list_jobs(apps.iloc[0].id)
print(jobs.head().stack())
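The pandas quickstart derives a `function` label from each application name by keeping the text before the first `(`, then plots durations per label. As a rough illustration of that grouping step without pandas or a live server, the sketch below applies the same split to a handful of made-up records. The sample names and durations are hypothetical, not output from a real history server, and the helper names are not part of spark-monitoring.

```python
from collections import defaultdict

# Hypothetical application records; a real history server returns many more fields.
sample_apps = [
    {"name": "load_orders(2020-01-01)", "duration": 1200},
    {"name": "load_orders(2020-01-02)", "duration": 1500},
    {"name": "build_report(weekly)", "duration": 300},
]


def function_label(app_name):
    """Return the text before the first '(' (the whole name if there is none)."""
    return app_name.split("(")[0]


def durations_by_function(apps):
    """Group application durations under their function label."""
    grouped = defaultdict(list)
    for app in apps:
        grouped[function_label(app["name"])].append(app["duration"])
    return dict(grouped)


if __name__ == "__main__":
    for label, durations in durations_by_function(sample_apps).items():
        print(label, durations)
```

Each list of durations corresponds to one histogram panel in the quickstart's `hist(by=...)` call.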
Reference
sparkmonitoring.client
Method that returns a client for making calls to the Spark History server.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| | | Hostname or IP pointing to the Spark History server | |
| port | | Port on which the Spark History server is exposed | |
| is_https | | Whether to use HTTPS to communicate with the Spark History server | |
| | | API version to interact with. Currently only … | |
Response
Examples
Basic endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server')
Custom endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server', port=8080, is_https=True)
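For orientation, the sketch below shows how a host name together with the `port` and `is_https` arguments could translate into a base URL for the Spark History server REST API, which Spark serves under `/api/v1`. The function `build_base_url` and its argument names `server` and `api_version` are hypothetical and not part of spark-monitoring. The default port 18080 is the Spark History server's own default, assumed here and not taken from the library.

```python
def build_base_url(server, port=18080, is_https=False, api_version=1):
    """Assemble a REST base URL (hypothetical helper, not the library's code)."""
    scheme = "https" if is_https else "http"
    return f"{scheme}://{server}:{port}/api/v{api_version}"


if __name__ == "__main__":
    # Mirrors the basic and custom endpoint examples.
    print(build_base_url("my.history.server"))
    print(build_base_url("my.history.server", port=8080, is_https=True))
```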
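sparkmonitoring.df
Method that returns a client for making calls to the Spark History server. This client returns pandas DataFrames instead of the results returned by the standard client. It is only available when the optional spark-monitoring[pandas] extra is installed.
Parameters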
| Name | Type | Description | Default |
|---|---|---|---|
| | | Hostname or IP pointing to the Spark History server | |
| port | | Port on which the Spark History server is exposed | |
| is_https | | Whether to use HTTPS to communicate with the Spark History server | |
| | | API version to interact with. Currently only … | |
Response
Examples
Basic endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server')
Custom endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server', port=8080, is_https=True)
sparkmonitoring.api.clientv1
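Client for interacting with the Spark History server.
Typically this class is not instantiated directly but obtained through sparkmonitoring.client.
Parameters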
| Name | Type | Description | Default |
|---|---|---|---|
| | | Hostname or IP pointing to the Spark History server | |
| | | Port on which the Spark History server is exposed | |
| | | Whether to use HTTPS to communicate with the Spark History server | |
| | | API version to interact with. Currently only … | |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)
list_stage_attempts(...)
get_stage_attempt(...)
get_stage_attempt_summary(...)
get_stage_attempt_tasks(...)
list_active_executors(...)
list_executor_threads(...)
list_all_executors(...)
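The method names above line up closely with the resources of the Spark monitoring REST API. The mapping below is an illustrative guess at that correspondence and is not taken from the library's source. The path templates follow Spark's monitoring documentation, while the `ENDPOINTS` dictionary, the `endpoint_path` helper, and the placeholder names such as `app_id` are hypothetical.

```python
# Assumed correspondence between client method names and REST resources
# (paths are relative to the /api/v1 base URL).
STAGE_ATTEMPT = "/applications/{app_id}/stages/{stage_id}/{attempt_id}"

ENDPOINTS = {
    "list_applications": "/applications",
    "get_application": "/applications/{app_id}",
    "list_jobs": "/applications/{app_id}/jobs",
    "get_job": "/applications/{app_id}/jobs/{job_id}",
    "list_stages": "/applications/{app_id}/stages",
    "list_stage_attempts": "/applications/{app_id}/stages/{stage_id}",
    "get_stage_attempt": STAGE_ATTEMPT,
    "get_stage_attempt_summary": STAGE_ATTEMPT + "/taskSummary",
    "get_stage_attempt_tasks": STAGE_ATTEMPT + "/taskList",
    "list_active_executors": "/applications/{app_id}/executors",
    "list_executor_threads": "/applications/{app_id}/executors/{executor_id}/threads",
    "list_all_executors": "/applications/{app_id}/allexecutors",
}


def endpoint_path(method_name, **ids):
    """Fill the path template for a method name with the given identifiers."""
    return ENDPOINTS[method_name].format(**ids)


if __name__ == "__main__":
    print(endpoint_path("get_job", app_id="app-1", job_id=3))
```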
sparkmonitoring.dataframes.pandasclient.list_applications
List of all applications.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| | | Type of applications to return | |
| | | Earliest application | |
| | | Latest application | |
| | | Number of results to return | |
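The four filters in this table resemble the query parameters that the Spark REST API accepts on its `/applications` resource (`status`, `minDate`, `maxDate`, `limit`). A minimal sketch of assembling such a query string follows. The helper name and its Python argument names are hypothetical and may not match the keyword arguments that `list_applications` actually takes.

```python
from urllib.parse import urlencode


def applications_query(status=None, min_date=None, max_date=None, limit=None):
    """Build an /applications path with only the filters that were supplied."""
    params = {
        "status": status,      # e.g. "completed" or "running"
        "minDate": min_date,   # earliest application start date
        "maxDate": max_date,   # latest application start date
        "limit": limit,        # maximum number of results
    }
    present = {key: value for key, value in params.items() if value is not None}
    query = urlencode(present)
    return "/applications" + ("?" + query if query else "")


if __name__ == "__main__":
    print(applications_query(status="completed", min_date="2015-02-10", limit=10))
```

Leaving every argument unset produces the bare `/applications` path, which corresponds to listing all applications.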
sparkmonitoring.dataframes.pandasclient
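Client for interacting with the Spark History server that returns pandas DataFrames.
Typically this class is not instantiated directly but obtained through sparkmonitoring.df.
Parameters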
| Name | Type | Description | Default |
|---|---|---|---|
| | | Hostname or IP pointing to the Spark History server | |
| | | Port on which the Spark History server is exposed | |
| | | Whether to use HTTPS to communicate with the Spark History server | |
| | | API version to interact with. Currently only … | |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)