Script to get a file's last modified date and file name in PySpark

Published 2024-05-08 12:12:58

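I have a mount point that points to a blob storage location containing multiple files. We need to find the last modified date of each file along with its file name. I am using the script below. The files are listed as follows: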

/mnt/schema_id=na/184000-9.jsonl
/mnt/schema_id=na/185000-0.jsonl
/mnt/schema_id=na/185000-22.jsonl
/mnt/schema_id=na/185000-25.jsonl

import os
import time
# Path to the file/directory
path = "/mnt/schema_id=na"
         
ti_c = os.path.getctime(path)
ti_m = os.path.getmtime(path)
        
c_ti = time.ctime(ti_c)
m_ti = time.ctime(ti_m)
          
print(f"The file located at the path {path} was created at {c_ti} and was last modified at {m_ti}")

2 answers

If you use operating-system-level commands to get file information, you cannot reach that exact location: on Databricks it lives on the Databricks File System (DBFS).

To do this at the Python level, you need to prepend /dbfs to the path, so it becomes:

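...
path = "/dbfs/mnt/schema_id=na"
for file_item in os.listdir(path):
    file_path = os.path.join(path, file_item)  # keep /dbfs here: os calls need it
    dbfs_path = file_path[5:]  # "/dbfs" stripped, for DBFS-aware APIs
    ti_c = os.path.getctime(file_path)
    ...

Note the [5:] slice. It strips the /dbfs prefix from the path to make it compatible with DBFS. The os.path calls must still receive the path with the prefix.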
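
A slice with a hard-coded length is easy to get wrong, so the conversion between the two path forms can be wrapped in small helpers. This is a minimal sketch: the names to_local_path and to_dbfs_path are hypothetical and not part of any Databricks API, and it assumes DBFS is mounted locally at /dbfs as described above.

```python
DBFS_MOUNT = "/dbfs"  # assumed local mount point of DBFS


def to_local_path(dbfs_path: str) -> str:
    """Prepend /dbfs so that os-level calls can reach the file."""
    if dbfs_path == DBFS_MOUNT or dbfs_path.startswith(DBFS_MOUNT + "/"):
        return dbfs_path  # already in local form
    return DBFS_MOUNT + "/" + dbfs_path.lstrip("/")


def to_dbfs_path(local_path: str) -> str:
    """Strip the /dbfs prefix to get the path as DBFS sees it."""
    if local_path.startswith(DBFS_MOUNT + "/"):
        return local_path[len(DBFS_MOUNT):]
    return local_path  # no prefix to strip


print(to_local_path("/mnt/schema_id=na/184000-9.jsonl"))
print(to_dbfs_path("/dbfs/mnt/schema_id=na/184000-9.jsonl"))
```

Checking the prefix before slicing means a path that is already in the target form passes through unchanged.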

Here is one way to do it:

import os
import time
# Path to the file/directory
path = "/dbfs/mnt/schema_id=na"

for file_item in os.listdir(path):
    file_path = os.path.join(path, file_item)
    ti_c = os.path.getctime(file_path)
    ti_m = os.path.getmtime(file_path)
        
    c_ti = time.ctime(ti_c)
    m_ti = time.ctime(ti_m)
          
    print(f"The file {file_item} located at the path {path} was created at {c_ti} and was last modified at {m_ti}")
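
If the goal is a list of file names with their last modified dates rather than printed sentences, the same loop can collect the values and sort them. The sketch below is one possible variant, not the answerer's method: the function name list_files_with_mtime is hypothetical, and it assumes the /dbfs mount behaves like an ordinary local directory for os.scandir. It uses only the modification time, because on Unix-like systems os.path.getctime reports the last metadata change, not the creation time.

```python
import os
from datetime import datetime, timezone


def list_files_with_mtime(directory):
    """Return (file name, last-modified datetime in UTC) pairs, newest first."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():  # skip subdirectories
                mtime = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
                results.append((entry.name, mtime))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # On Databricks this would be "/dbfs/mnt/schema_id=na";
    # "." keeps the sketch runnable anywhere.
    for name, modified in list_files_with_mtime("."):
        print(f"{name}\t{modified.isoformat()}")
```

Returning timezone-aware datetimes instead of time.ctime strings keeps the values sortable and easy to load into a DataFrame later.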
