Spark: Difference between sqlContext.read.load(path="", format="") and sqlContext.read.format().load()

Posted 2024-06-16 11:10:02

Male | A programmer who likes writing Python code.

I'm trying to read a Unicode-delimited .txt file in Spark 2.2. Initially I used the read.load() form from an earlier version of Spark:

df= sqlContext.read.load(path='file:\\C:\Users\zr20684\Downloads\\SPEC_CUST_20190212230550.txt',
                         format= "com.databricks.spark.csv",
                         option= {"delimiter", "←"})

With the code above, the entire row ends up in a single column:

[Row(_c0=u'01\u2190SPEC\u2190ZS\u2190SDN\u2190Insert\u219002/12/2019\u2190\u2190\u2190\u2190HCP\u2190CUST9635663\u2190\u2190\u2190JAMES\u2190\u2190DEANGELO\u2190\u2190\u2190\u2190\u2190\u2190\u2190\u2190JAMES DEANGELO\u2190\u2190\u2190\u2190\u2190\u2190A')]

With the following updated code:

df = (sqlContext.read.format("com.databricks.spark.csv")
      .option("delimiter", "←")
      .option("encoding", "UTF-8")
      .load('file:\\C:\Users\zr20684\Downloads\\SPEC_CUST_20190212230550.txt'))

it splits the columns exactly as expected:

[Row(_c0=u'01', _c1=u'SPEC', _c2=u'ZS', _c3=u'SDN', _c4=u'Insert', _c5=u'02/12/2019', _c6=None, _c7=None, _c8=None, _c9=u'HCP', _c10=u'CUST9635663', _c11=None, _c12=None, _c13=u'JAMES', _c14=None, _c15=u'DEANGELO', _c16=None, _c17=None, _c18=None, _c19=None, _c20=None, _c21=None, _c22=None, _c23=u'JAMES DEANGELO', _c24=None, _c25=None, _c26=None, _c27=None, _c28=None, _c29=u'A')]
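The only difference between the two outputs is which delimiter the parser ended up using. As an analogy outside Spark (this uses Python's standard csv module, not Spark's CSV reader), splitting the row from the question on the default comma versus an explicit "←" reproduces both shapes: one field versus 30. The helper parse is hypothetical, written only for this illustration; note that csv keeps empty fields as empty strings, whereas Spark reported them as None.

```python
import csv
import io

# The record from the question, with fields separated by U+2190 ("←").
ROW = (
    "01\u2190SPEC\u2190ZS\u2190SDN\u2190Insert\u219002/12/2019\u2190\u2190\u2190\u2190"
    "HCP\u2190CUST9635663\u2190\u2190\u2190JAMES\u2190\u2190DEANGELO"
    "\u2190\u2190\u2190\u2190\u2190\u2190\u2190\u2190JAMES DEANGELO"
    "\u2190\u2190\u2190\u2190\u2190\u2190A"
)


def parse(text, delimiter=","):
    """Hypothetical helper: split one line into fields with csv.reader."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# Default comma delimiter: the row contains no commas, so it stays one field
# (the same shape as the single _c0 column from the first call).
print(len(parse(ROW)))             # 1

# Explicit "←" delimiter: 30 fields, matching _c0.._c29 from the second call.
fields = parse(ROW, delimiter="\u2190")
print(len(fields))                 # 30
print(fields[23], fields[29])      # JAMES DEANGELO A
```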

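If I update all of my existing code to the new form, is there any case where it will break? I'm assuming the updated form is a superset of the previous one.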


1 Answer

Anonymous user

#1 · Posted 2024-06-16 11:10:02

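The load method doesn't have an option parameter. Its signature is load(path=None, format=None, schema=None, **options), so option={...} is collected as a data-source option literally named "option", which the CSV reader doesn't recognize and silently ignores. (Note also that {"delimiter", "←"} is a set, not a dict.) The delimiter therefore stays at the default comma, and because your lines contain no commas, each whole line lands in _c0.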

Instead, each option should be passed as its own keyword argument, i.e.

sqlContext.read.load(path='...', format="csv", delimiter="←")
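To see why the first call silently does nothing, and why the two styles are otherwise interchangeable, here is a pure-Python sketch. MockReader is hypothetical: it only imitates the shape of PySpark's DataFrameReader (load(path=None, format=None, schema=None, **options), which applies the format and options before loading) and reads no data. "data.txt" is a placeholder path.

```python
class MockReader:
    """Hypothetical stand-in shaped like PySpark's DataFrameReader.

    It reads no data; load() just returns the settings it would have used.
    """

    def __init__(self):
        self.source = None
        self.opts = {}

    def format(self, source):
        self.source = source
        return self

    def option(self, key, value):
        # Option values are stored as strings, loosely like Spark does.
        self.opts[key] = str(value)
        return self

    def options(self, **options):
        for key, value in options.items():
            self.option(key, value)
        return self

    def load(self, path=None, format=None, schema=None, **options):
        # Same call shape as DataFrameReader.load: every extra keyword argument
        # becomes a data-source option. schema is accepted but unused here.
        if format is not None:
            self.format(format)
        self.options(**options)
        return {"path": path, "format": self.source, "options": dict(self.opts)}


# The question's first call: "option" is just another keyword, so it is stored
# under the key "option". Its value is also a set, not a dict.
bad = MockReader().load(path="data.txt", format="csv", option={"delimiter", "\u2190"})
print(sorted(bad["options"]))   # ['option']

# The answer's fix: pass the delimiter as its own keyword argument.
good = MockReader().load(path="data.txt", format="csv", delimiter="\u2190")

# The chained builder form from the updated code.
chained = MockReader().format("csv").option("delimiter", "\u2190").load("data.txt")
print(good == chained)          # True
```

On the follow-up question: because load(path, format, **options) just forwards to format() and options() before loading, the two styles should behave the same, so switching forms should not by itself break anything. The thing to watch for is any other call that was passing option={...}: once those options actually take effect, the resulting DataFrames may change shape, as happened here.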
