To access data stored in Amazon S3 from Spark applications, you can
use Hadoop file APIs (SparkContext.hadoopFile,
JavaHadoopRDD.saveAsHadoopFile, SparkContext.newAPIHadoopRDD, and
JavaHadoopRDD.saveAsNewAPIHadoopFile) for reading and writing RDDs,
providing URLs of the form s3a://bucket_name/path/to/file.txt.
You can read and write Spark SQL DataFrames using the Data Source API.
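A minimal sketch of the DataFrame read/write path, assuming pyspark is installed and the hadoop-aws / S3A connector and AWS credentials are configured; the bucket and key names are illustrative placeholders, not taken from the question:

```python
def s3a_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URL of the form the Hadoop S3A connector expects."""
    return f"s3a://{bucket}/{key}"


def read_write_example() -> None:
    """Read a JSON dataset from S3 and write it back as Parquet.

    Requires pyspark plus the hadoop-aws package on the classpath and
    valid AWS credentials; bucket/key names are placeholders.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3a-example").getOrCreate()
    df = spark.read.json(s3a_uri("bucket_name", "path/to/file.json"))
    df.write.mode("overwrite").parquet(s3a_uri("bucket_name", "path/to/output"))
    spark.stop()
```

The same `s3a://` URLs work with the RDD-level Hadoop APIs listed above; only the scheme and credentials setup differ from reading local files.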
# Answer 1
I suggest following the Cloudera tutorial Accessing Data Stored in Amazon S3 through Spark.
As for the file extension, there is little to solve: you simply take the extension from the file name (e.g. `file.txt`). If the files stored in the S3 bucket have had their extensions stripped, you can still determine the content type by looking at the metadata attached to each S3 object:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectHEAD.html
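A sketch combining both ideas: take the extension from the key name when one is present, and otherwise fall back to the `Content-Type` that a HEAD Object request returns (here via boto3's `head_object`). The bucket name is a placeholder, and the boto3 fallback assumes credentials are configured:

```python
import os


def extension_of(key: str) -> str:
    """Return the extension of an S3 key, e.g. 'path/to/file.txt' -> '.txt'."""
    return os.path.splitext(key)[1]


def content_type_of(bucket: str, key: str) -> str:
    """Fall back to the object's metadata when the key has no extension.

    Issues the equivalent of a HEAD Object request. Requires boto3 and
    AWS credentials; bucket/key names are placeholders.
    """
    import boto3  # only needed for the metadata fallback

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=bucket, Key=key)
    return head["ContentType"]
```

For an object uploaded with its content type set, `content_type_of` would return a value like `text/plain` even if the key itself has no extension.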