Python: counting the number of files within specific file size ranges

import os

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
route = raw_input('Enter a location')

def human_Readable(nbytes):
    if nbytes == 0: return '0 B'
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

def file_Dist(path, start,end):
    counter = 0
    counter2 = 0
    for path, subdir, files in os.walk(path):
        for r in files:
            if os.path.getsize(os.path.join(path,r)) > start and os.path.getsize(os.path.join(path,r)) < end:
                counter += 1
    #print "Number of files less than %s:" %(human_Readable(end)), counter
    print "Number of files greater than %s less than %s:" %(human_Readable(start), human_Readable(end)), counter

file_Dist(route, 0, 1024)
file_Dist(route,1024,4095)
file_Dist(route, 4096, 16383)
file_Dist(route, 16384, 65535)
file_Dist(route, 65536, 262143)
file_Dist(route, 262144, 1048576)
file_Dist(route, 1048577, 4194304)
file_Dist(route, 4194305, 16777216)

1 answer

User

#1 · Posted 2024-04-29 16:19:22

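Here are some suggestions for improvement.

It is usually more useful to take the input as a command-line argument than to prompt for it.
With several size groups, it is more efficient to count the files for all groups in a single traversal of the directory tree than to walk the tree once per group.
Since the size limits form a regular sequence, they can be computed rather than written out one by one.
Your program does not count files whose size is exactly equal to a group limit; although it states this correctly by saying "greater than" and "less than", I find it more useful not to skip those files.
os.path.getsize() fails for broken symbolic links; I would use os.lstat().st_size, which reports sizes correctly for file trees that contain links.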
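
The point about broken symbolic links can be checked with a small sketch. It assumes Python 3 on a POSIX system where creating symlinks is permitted; the helper name `link_sizes` and the file names are made up for the illustration and are not part of the original program.

```python
import os
import tempfile


def link_sizes(directory):
    """Create a dangling symlink in `directory` and report how both calls treat it."""
    target = os.path.join(directory, "missing-target")  # hypothetical name, never created
    link = os.path.join(directory, "dangling-link")     # hypothetical name
    os.symlink(target, link)

    try:
        followed = os.path.getsize(link)  # follows the link, so it raises here
    except OSError:
        followed = None

    # lstat() describes the link itself instead of following it
    own = os.lstat(link).st_size
    return followed, own


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_sizes(tmp))
```

The first element comes back as `None` because `os.path.getsize()` raised, while `os.lstat()` still returns a size for the link itself.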

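Here is a version of the program that implements the suggestions above. Note that it still ignores files of 16 MiB and larger; this could be improved as well.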

#!/usr/bin/env python
import math
import os
import sys
route = sys.argv[1]

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
def human_Readable(nbytes):
    if nbytes == 0: return '0 B'
    i = 0
    while nbytes >= 1024 and i < len(suffixes)-1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])

counter = [0]*8             # count files with size up to 4**(8-1) KB
for path, subdir, files in os.walk(route):
    for r in files:
        size = os.lstat(os.path.join(path, r)).st_size
        # Python 2 integer division: both size/1024 and the final /2 truncate
        group = (math.frexp(size/1024)[1]+1)/2
        if group < len(counter):
            counter[group] += 1
start = 0
for g in range(len(counter)):
    end = 1024*4**g
    print "Number of files at least %s less than %s:" \
          %(human_Readable(start), human_Readable(end)), counter[g]
    start = end
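
One possible way to lift the 16 MiB limit is to count into a dictionary keyed by group instead of a fixed-length list. The following is only a sketch under stated assumptions: it is written for Python 3, the function names `size_group` and `count_by_group` are invented here, and it swaps `math.frexp()` for `int.bit_length()`, which yields the same exponent for these non-negative integers.

```python
import os
import sys
from collections import Counter


def size_group(size):
    """Return 0 for sizes below 1 KiB, else g with 4**(g-1) KiB <= size < 4**g KiB."""
    # bit_length() of the size in whole KiB plays the role of frexp()'s exponent
    return ((size // 1024).bit_length() + 1) // 2


def count_by_group(root):
    """Walk `root` once and count files per size group, with no upper limit."""
    counts = Counter()
    for path, _subdirs, files in os.walk(root):
        for name in files:
            size = os.lstat(os.path.join(path, name)).st_size
            counts[size_group(size)] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        counts = count_by_group(sys.argv[1])
        for g in sorted(counts):
            start = 0 if g == 0 else 1024 * 4 ** (g - 1)
            end = 1024 * 4 ** g
            print("Files of at least %d and less than %d bytes: %d"
                  % (start, end, counts[g]))
```

Only groups that actually contain files are printed, so the output grows with the largest file found rather than stopping at a preset bound.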

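I think the line group = (math.frexp(size/1024)[1]+1)/2, which produces the index of the counter list element corresponding to size, needs some explanation. math.frexp() returns a (mantissa, exponent) pair such that its argument equals mantissa * 2**exponent.

The picture is this: by taking the base-2 exponent of the floating-point representation of the size (in KB) and adjusting it slightly (+1 because the mantissa lies in [0.5, 1[, and /2 to convert from base 2 to base 4), we can compute the appropriate index into the counter list.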
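
To make the index calculation concrete, the sketch below prints the `math.frexp()` result and the resulting group for a few sizes around the group limits. It assumes Python 3, so `//` stands in for the Python 2 integer division used in the program; the function name `group_index` and the sample sizes are chosen here for illustration.

```python
import math


def group_index(size):
    """Python 3 spelling of the index expression; // replaces Python 2's integer /."""
    exponent = math.frexp(size // 1024)[1]
    return (exponent + 1) // 2


if __name__ == "__main__":
    for size in (0, 1023, 1024, 4095, 4096, 16383, 16384, 65535, 65536):
        mantissa, exponent = math.frexp(size // 1024)
        print("%6d bytes  frexp=(%.4f, %d)  group=%d"
              % (size, mantissa, exponent, group_index(size)))
```

Each time the size crosses a power of 4 in KB (1 KB, 4 KB, 16 KB, 64 KB), the exponent passes an odd value and the group index increases by one.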
