Unable to cat a DBFS file in a Databricks Community Edition cluster: FileNotFoundError: [Errno 2] No such file or directory

Posted 2024-06-16 09:22:02


I'm trying to read a Delta log file in a Databricks Community Edition cluster (Databricks Runtime 7.2):

df=spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")

with open('/user/delta_test/_delta_log/00000000000000000000.json','r')  as f:
  for l in f:
    print(l)

This fails with a file-not-found error:

FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r')  as f:
      2   for l in f:
      3     print(l)

FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'

I have already tried prefixing the path with /dbfs and with dbfs:/, with no result; I still get the same error:

with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json','r')  as f:
  for l in f:
    print(l)

However, with dbutils.fs.head I am able to read the file:

dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")

'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
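Because dbutils.fs.head already returns the commit file's contents as a string, one option is to parse that string directly. A Delta commit file stores one JSON action per line (commitInfo, protocol, metaData, add, ...), so a minimal sketch is to split on newlines and decode each line. The helper names parse_delta_commit and action_types are hypothetical, not part of any Databricks API. Note that dbutils.fs.head returns at most 65536 bytes by default, so a larger second argument may be needed for big commits; a line cut off mid-way would make json.loads raise.

```python
import json


def parse_delta_commit(text):
    """Parse the newline-delimited JSON of a Delta commit file into a list of dicts.

    Each non-empty line holds exactly one action object.
    """
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def action_types(actions):
    # Each action is a single-key object such as {"add": {...}}; return those keys.
    return [next(iter(action)) for action in actions]
```

In a notebook this would be called as `parse_delta_commit(dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json", 1024 * 1024))`.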

How can we read/cat a DBFS file in Databricks with Python's open method?

1 answer
Anonymous user
#1 · Posted 2024-06-16 09:22:02

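By default this data lives on DBFS, and your code needs to know how to access it. Python's built-in open() knows nothing about DBFS; that is why it fails.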

There is a workaround, though: DBFS is mounted on the nodes at /dbfs, so you only need to prepend that to the file name. Use /dbfs/user/delta_test/_delta_log/00000000000000000000.json instead of /user/delta_test/_delta_log/00000000000000000000.json.
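The path rewrite described above can be captured in a small helper. This is a sketch that only helps where the cluster actually exposes the /dbfs mount; the name dbfs_to_fuse is hypothetical, and it simply strips an optional dbfs: scheme and prefixes /dbfs.

```python
def dbfs_to_fuse(path: str) -> str:
    """Map a DBFS path to its local mount path under /dbfs.

    Accepts 'dbfs:/user/x' or '/user/x' (scheme-less paths are taken as DBFS paths).
    """
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        # Relative paths or other schemes (e.g. file:/...) are not DBFS paths.
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    # Collapse leading slashes so 'dbfs:///x' and '/x' both map to '/dbfs/x'.
    return "/dbfs/" + path.lstrip("/")
```

With the mount present, `open(dbfs_to_fuse("/user/delta_test/_delta_log/00000000000000000000.json"))` would then behave like opening any local file.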

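Update: on Community Edition with DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory such as /tmp or /var/tmp, and then read it from there: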

dbutils.fs.cp("/file_on_dbfs", "file:///tmp/local_file")

Note that if you don't specify a URI scheme, the path refers to DBFS by default. To refer to a local file you need the file:// prefix (see the docs).
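The copy-then-read workaround can be wrapped in one function. This is a minimal sketch: read_dbfs_text is a hypothetical name, dbutils is passed in explicitly (in a notebook it is the built-in dbutils object) so the function can also be exercised outside Databricks, and /tmp as the default target directory follows the suggestion above. The destination uses the explicit file:// scheme so that dbutils.fs.cp writes to the driver's local disk rather than back to DBFS.

```python
import os


def read_dbfs_text(dbutils, dbfs_path, local_dir="/tmp"):
    """Copy a DBFS file to the driver's local disk, then read it with open().

    dbutils:   the notebook's dbutils object (passed in so the function is testable).
    dbfs_path: a DBFS path; scheme-less paths are resolved against DBFS by dbutils.fs.
    local_dir: local directory on the driver to copy into (assumed writable).
    """
    # abspath keeps the URI well-formed: "file://" + "/tmp/x" -> "file:///tmp/x".
    local_path = os.path.join(os.path.abspath(local_dir), os.path.basename(dbfs_path))
    # Without the file:// prefix the destination would be treated as a DBFS path.
    dbutils.fs.cp(dbfs_path, "file://" + local_path)
    # The copy now lives on the driver's local filesystem, so plain open() works.
    with open(local_path, "r") as f:
        return f.read()
```

In a notebook: `print(read_dbfs_text(dbutils, "/user/delta_test/_delta_log/00000000000000000000.json"))`.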
