IBM Streams HDFS integration
Detailed description of the streamsx.hdfs Python project
Overview
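Provides functions to access files on HDFS, for example when connecting to the IBM Analytics Engine on IBM Cloud.
This package exposes the com.ibm.streamsx.hdfs toolkit as Python methods for use with the Streaming Analytics service on IBM Cloud and with IBM Streams, including IBM Cloud Pak for Data.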
Sample
A Streams application that writes a file to HDFS, scans HDFS for the created file and reads its content:
    import json

    from streamsx.topology.topology import Topology
    from streamsx.topology.context import submit
    import streamsx.hdfs as hdfs

    # credentials_analytics_engine_service: an open file object containing the
    # service credentials (JSON) of your IBM Analytics Engine instance
    credentials = json.load(credentials_analytics_engine_service)

    topo = Topology('HDFSHelloWorld')
    to_hdfs = topo.source(['Hello', 'World!'])
    to_hdfs = to_hdfs.as_string()

    # Write a stream to HDFS
    hdfs.write(to_hdfs, credentials=credentials, file='/sample/hw.txt')

    # Scan the /sample directory for files (init_delay: initial delay in seconds)
    scanned = hdfs.scan(topo, credentials=credentials, directory='/sample', init_delay=10)

    # Read the text file line by line
    r = hdfs.read(scanned, credentials=credentials)

    # Print each line (tuple)
    r.print()

    submit('STREAMING_ANALYTICS_SERVICE', topo)
    # Use for IBM Streams including IBM Cloud Pak for Data
    # submit('DISTRIBUTED', topo)
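In the sample, the argument passed to `json.load` must be an open file object, not a path. One way to obtain the `credentials` value is to read the credentials JSON of the Analytics Engine service from a local file. The following is a minimal sketch of that step; the helper name `load_credentials` and any file path you pass to it are hypothetical, and the JSON content is returned as-is without assuming any particular keys.

```python
import json


def load_credentials(path):
    """Hypothetical helper: read a service credentials JSON file and return the parsed object."""
    # json.load expects a file object, so open the file first and let the
    # context manager close it after parsing.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage, with a hypothetical file name: `credentials = load_credentials('analytics_engine_credentials.json')`, then pass `credentials` to `hdfs.write`, `hdfs.scan` and `hdfs.read` as in the sample.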