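I have a Spark Streaming application written in Python. Spark version: 2.0. Python version: 2.6 (bundled with HDP 2.5.3.0). YARN version: 2.7.
When I run the streaming job, PySpark generates a large number of files under appcache, and they keep growing in size. Please help me solve this problem.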
[tool_vhkt@server-10-60-97-144 tool_vhkt]$ cd appcache/
[tool_vhkt@server-10-60-97-144 appcache]$ ls
application_1489545964820_0084 application_1489990352017_0010
application_1489973039223_0001 application_1489990352017_0020
application_1489973039223_0005 application_1489990352017_0021
application_1489973039223_0006 application_1489990352017_0025
[tool_vhkt@server-10-60-97-144 appcache]$ cd application_1489990352017_0025
[tool_vhkt@server-10-60-97-144 application_1489990352017_0025]$ ls
blockmgr-aeca37a6-4042-45de-8b83-e258fe6e033d
container_e32_1489990352017_0025_01_000003
filecache
spark-73a7b85b-2550-4e52-a9e6-19b56776fa1a
[tool_vhkt@server-10-60-97-144 application_1489990352017_0025]$ du -sh *
212K blockmgr-aeca37a6-4042-45de-8b83-e258fe6e033d
108K container_e32_1489990352017_0025_01_000003
4.0K filecache
674M spark-73a7b85b-2550-4e52-a9e6-19b56776fa1a
[tool_vhkt@server-10-60-97-144 application_1489990352017_0025]$ pwd
/u01/tool_vhkt/hdp/yarn/nodemanager/local/usercache/tool_vhkt/appcache/application_1489990352017_0025/spark-73a7b85b-2550-4e52-a9e6-19b56776fa1a
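To watch which entry under an application directory is growing while the job runs, the `du -sh *` check above can be repeated from a script. This is only a minimal sketch: `dir_size_bytes` and `largest_entries` are hypothetical helper names, and the sizes are apparent file sizes, so they may differ slightly from the disk usage that `du` reports.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may disappear while the streaming job is running.
                pass
    return total


def largest_entries(app_dir, top=5):
    """Return (size_in_bytes, name) pairs for entries of app_dir, largest first."""
    sizes = []
    for name in os.listdir(app_dir):
        full = os.path.join(app_dir, name)
        if os.path.isdir(full):
            size = dir_size_bytes(full)
        else:
            size = os.path.getsize(full)
        sizes.append((size, name))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_entries` on the application directory shown by `pwd` at intervals and comparing the results would show whether the `spark-*` directory is the one that keeps growing.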
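Inside the spark-73a7b85b-2550-4e52-a9e6-19b56776fa1a folder there are many files with the prefix "broadcast", but I am not using any broadcast variables. I would like to know whether PySpark automatically broadcasts Python global variables.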
[tool_vhkt@server-10-60-97-144 spark-73a7b85b-2550-4e52-a9e6-19b56776fa1a]$ ls
broadcast1058180300009743990 broadcast4622079796728137437
broadcast1110616477794136309 broadcast4702370114294913778
broadcast1202671379043757392 broadcast4807391796004598278
broadcast1276803072744575618 broadcast4850581263753605028
broadcast1308132491109188538 broadcast4851096518475947533
broadcast1391964928309668173 broadcast4878614870671882987
broadcast1393406243673927281 broadcast4894992928978580640
broadcast1436162117465199741 broadcast4952360795904798486
broadcast1504013522196126114 broadcast4953634337166513374
Try lowering the log level.
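The code that accompanied this suggestion is not preserved here, so the following is only a guess at what was meant. It assumes the intent was to call `SparkContext.setLogLevel` from the driver; `lower_log_level` is a hypothetical wrapper, and the application name in the usage comment is a placeholder.

```python
# Log levels accepted by SparkContext.setLogLevel.
VALID_LEVELS = ("ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN")


def lower_log_level(sc, level="WARN"):
    """Set the log level on an existing SparkContext (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError("unsupported log level: %s" % level)
    sc.setLogLevel(level)
    return level


# Usage inside the streaming application (requires pyspark):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="my-streaming-app")  # placeholder name
#   lower_log_level(sc, "WARN")
```

This only changes how much Spark logs. Whether it has any effect on the `broadcast*` files listed in the question is not established here.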