An efficient way to parse a huge file

import os

with open(filepath, "r") as open_file:
    while True:
        line = open_file.readline()
        if line == "":  # Checks for the end of the file
            break
        size = line.split("\t")[0]
        path = line.strip().split("\t")[1]
        is_dir = os.path.isdir(path)
        # unicode() is the Python 2 built-in
        streamed_file.write(unicode("{isdir},{size},{path}\n".format(isdir=is_dir, size=size, path=path)))

3 Answers

User

#1 · Edited 2024-04-19 02:36:29

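Compressing the file before it is copied over the network can speed up processing, because the data reaches your script faster.

Can you compress the input text file on the remote target system? If so, compress it into a format that Python can read with one of its standard modules (zlib, gzip, bz2, lzma, zipfile).

If not, you can at least run a script on the remote storage system to compress the file. You would then read the file, decompress it in memory with one of those Python modules, and process each line.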
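As a rough sketch of the reading side, assuming the listing was compressed with gzip and is UTF-8 text with one `size<TAB>path` record per line (the function name `iter_records` and the encoding are assumptions, not something from the question):

```python
import gzip


def iter_records(gz_path):
    """Yield (size, path) pairs from a gzip-compressed, tab-separated listing."""
    # "rt" decompresses on the fly and decodes to text, so the file is
    # streamed line by line instead of being loaded into memory at once.
    with gzip.open(gz_path, "rt", encoding="utf-8") as compressed:
        for line in compressed:
            # Split once at the first tab; everything after it is the path.
            size, path = line.rstrip("\n").split("\t", 1)
            yield size, path
```

Whether this actually helps depends on whether the network transfer or the CPU is the bottleneck, so it is worth timing both variants on real data.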

User

#2 · Edited 2024-04-19 02:36:29

You may want mmap. An introduction and tutorial are here.

Put simply, it lets you treat a file on disk as if it were in RAM, without actually reading the whole file into RAM.
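A minimal sketch of what that could look like with the standard `mmap` module; the helper name `iter_mapped_lines` is made up for illustration, and lines come back as bytes, so they need decoding before being used as paths:

```python
import mmap


def iter_mapped_lines(filepath):
    """Yield each line of filepath as bytes, reading through a memory map."""
    with open(filepath, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # readline() returns b"" at the end of the map, which stops iter().
            for line in iter(mapped.readline, b""):
                yield line
```

For a single sequential pass, this may or may not beat plain line iteration over the file object, so measure before committing to it.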

User

#3 · Edited 2024-04-19 02:36:29

The biggest gain will probably come from calling split only once per line:

size, path = line.strip().split("\t")
# or ...split("\t", 3)[0:2] if there are extra fields to ignore
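Applied to the loop from the question, the single split might look like the sketch below. It assumes every line has exactly the two fields `size<TAB>path`; the function name `convert_lines` is hypothetical.

```python
import os


def convert_lines(lines, out):
    """Write 'isdir,size,path' rows for tab-separated 'size<TAB>path' lines."""
    # Iterating over the lines replaces the readline() call and the EOF check.
    for line in lines:
        # maxsplit=1 splits only at the first tab, so the rest stays in path.
        size, path = line.rstrip("\n").split("\t", 1)
        out.write("{},{},{}\n".format(os.path.isdir(path), size, path))
```

Passing the open input file as `lines` and the question's `streamed_file` as `out` would reproduce the original output format.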

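At the very least, you can simplify the code by treating the input file as an iterator and using the csv module. This may also speed things up, since it removes the explicit call to split: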

import csv
import os

with open(filepath, "r") as open_file:
    reader = csv.reader(open_file, delimiter="\t")
    writer = csv.writer(streamed_file)
    for size, path in reader:
        is_dir = os.path.isdir(path)
        writer.writerow([is_dir, size, path])
