Calculating a directory's size using Python?

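Before I reinvent this particular wheel, has anyone got a nice routine for calculating the size of a directory using Python? It would be very nice if the routine formatted the size nicely in Mb/Gb etc.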

This walks all sub-directories, summing the file sizes:

    import os

    def get_size(start_path='.'):
        total_size = 0
        for dirpath, dirnames, filenames in os.walk(start_path):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                total_size += os.path.getsize(fp)
        return total_size

    print(get_size())

And a fun one-liner using os.listdir (does not include sub-directories):

    import os
    sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))
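One caveat on the one-liner: os.listdir returns bare names, and os.path.isfile / os.path.getsize resolve those against the current working directory, so it only gives the right total when listing '.'. A minimal sketch for an arbitrary directory, assuming Python 3.6+ (where the os.scandir iterator is a context manager); `flat_dir_size` is a hypothetical name:

```python
import os


def flat_dir_size(path="."):
    """Sum the sizes of regular files directly inside `path` (no recursion)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # entry.path is already joined with `path`, so no cwd dependence;
            # follow_symlinks=False means symlinks are not counted as files.
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

For example, `flat_dir_size('/tmp')` returns the byte total of the files sitting directly in /tmp, ignoring its sub-directories.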

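References:

os.path.getsize: gives the size in bytes

os.walk

Updated to use os.path.getsize, which is clearer than using the os.stat().st_size method.

Thanks to ghostdog74 for pointing this out!

os.stat: st_size gives the size in bytes. It can also be used to get the file size and other file-related information.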

Update 2015

scandir is available and may be faster than the os.walk method. A package is available from PyPI, and os.scandir() will be included in Python 3.5:

https://pypi.python.org/pypi/scandir
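As a sketch of what the scandir route can look like (assuming Python 3.6+; `dir_size` is a hypothetical name, not part of any library), here is a version that keeps an explicit stack of directories instead of recursing, so very deep trees cannot run into Python's recursion limit. DirEntry.is_dir() and is_file() can usually answer from data already returned by the directory listing:

```python
import os


def dir_size(path="."):
    """Total size in bytes of regular files under `path`.

    Uses an explicit stack instead of recursion. Symlinks are neither
    followed nor counted, which avoids cycles and double counting.
    """
    total = 0
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # visit this sub-directory later
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

`dir_size('/var/log')` returns an integer byte count. Skipping symlinks is a design choice: you lose data that is only reachable through links, but a link loop can never make the walk run forever.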

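Some of the approaches suggested so far implement recursion, others use the shell or don't produce neatly formatted results. When your code is a one-off for Linux platforms, you can get the formatting as usual, recursion included, as a one-liner. Except for the print in the last line, it will work with current versions of python2 and python3: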

    du.py
    -----
    #!/usr/bin/python3
    import subprocess

    def du(path):
        """disk usage in human readable format (e.g. '2,1GB')"""
        return subprocess.check_output(['du','-sh', path]).split()[0].decode('utf-8')

    if __name__ == "__main__":
        print(du('.'))

Simple, efficient, and works for files and multi-level directories:

    $ chmod 750 du.py
    $ ./du.py
    2,9M

A bit late, 5 years on, but because this still shows up in search engine results, it might be of help...

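Here is a recursive function (it recursively sums the sizes of all subfolders and their respective files) which returns exactly the same number of bytes as running "du -sb ." on Linux (where "." means "the current folder"):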

    import os

    def getFolderSize(folder):
        total_size = os.path.getsize(folder)
        for item in os.listdir(folder):
            itempath = os.path.join(folder, item)
            if os.path.isfile(itempath):
                total_size += os.path.getsize(itempath)
            elif os.path.isdir(itempath):
                total_size += getFolderSize(itempath)
        return total_size

    print("Size: " + str(getFolderSize(".")))

Recursive folder size using os.scandir in Python 3.5:

    import os

    def folder_size(path='.'):
        total = 0
        for entry in os.scandir(path):
            if entry.is_file():
                total += entry.stat().st_size
            elif entry.is_dir():
                total += folder_size(entry.path)
        return total

monkut's answer is good, but it fails on broken symlinks, so you also have to check whether the path really exists:

    if os.path.exists(fp):
        total_size += os.stat(fp).st_size
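Dropped into the os.walk loop from the first answer, the check looks roughly like this. It is a sketch: symlinks that do resolve are still counted at the size of their target, because both os.path.exists and os.stat follow links.

```python
import os


def get_size(start_path="."):
    """Sum file sizes under start_path, skipping broken symlinks."""
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            # os.path.exists follows symlinks, so it is False for a
            # link whose target is missing; such entries are skipped.
            if os.path.exists(fp):
                total_size += os.stat(fp).st_size
    return total_size
```

A file deleted between the exists() check and the stat() call can still raise; wrapping os.stat in try/except OSError closes that gap if the tree is changing while you measure it.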

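The accepted answer doesn't take hard or soft links into account, and would count those files twice. You'd want to keep track of which inodes you've seen, and not add the size of those files again.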

    import os

    def get_size(start_path='.'):
        total_size = 0
        seen = {}
        for dirpath, dirnames, filenames in os.walk(start_path):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                try:
                    stat = os.stat(fp)
                except OSError:
                    continue

                try:
                    seen[stat.st_ino]
                except KeyError:
                    seen[stat.st_ino] = True
                else:
                    continue

                total_size += stat.st_size

        return total_size

    print(get_size())

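Chris's answer is good, but it can be made more idiomatic by using a set to check for already-seen inodes, which also avoids using an exception for control flow: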

    import os

    def directory_size(path):
        total_size = 0
        seen = set()

        for dirpath, dirnames, filenames in os.walk(path):
            for f in filenames:
                fp = os.path.join(dirpath, f)

                try:
                    stat = os.stat(fp)
                except OSError:
                    continue

                if stat.st_ino in seen:
                    continue

                seen.add(stat.st_ino)

                total_size += stat.st_size

        return total_size  # size in bytes
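One more caveat: inode numbers are only unique within a single filesystem, so if the tree spans several mount points, two unrelated files can share an st_ino and one of them would be skipped. A sketch that keys the set on the (st_dev, st_ino) pair instead:

```python
import os


def directory_size(path):
    """Byte total of files under `path`, counting each hard-linked file once."""
    total_size = 0
    seen = set()
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                st = os.stat(fp)
            except OSError:  # e.g. a broken symlink or a file removed mid-walk
                continue
            # st_ino is only unique per device, so pair it with st_dev.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total_size += st.st_size
    return total_size
```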

A recursive one-liner:

    import os

    def getFolderSize(p):
        from functools import partial
        prepend = partial(os.path.join, p)
        return sum([(os.path.getsize(f) if os.path.isfile(f) else getFolderSize(f)) for f in map(prepend, os.listdir(p))])

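You can do something like this:

    import subprocess  # in Python 2 this was: import commands; commands.getoutput(...)
    size = subprocess.getoutput('du -sh /path/').split()[0]

In this case I haven't tested the result before returning it; if you want, you can check it with subprocess.getstatusoutput (commands.getstatusoutput in Python 2).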

For the second part of the question:

    def human(size):
        B = "B"
        KB = "KB"
        MB = "MB"
        GB = "GB"
        TB = "TB"
        UNITS = [B, KB, MB, GB, TB]
        HUMANFMT = "%f %s"
        HUMANRADIX = 1024.

        for u in UNITS[:-1]:
            if size < HUMANRADIX:
                return HUMANFMT % (size, u)
            size /= HUMANRADIX

        return HUMANFMT % (size, UNITS[-1])
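A variant of the same loop that lets you choose between binary units (powers of 1024, IEC names such as KiB) and decimal units (powers of 1000, SI names such as kB), printing one decimal place. `human_size` and its `binary` flag are hypothetical names for this sketch:

```python
def human_size(size, binary=True):
    """Format a byte count, e.g. 1536 -> '1.5 KiB' (binary) or '1.5 kB' (decimal)."""
    base = 1024 if binary else 1000
    if binary:
        units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    else:
        units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(size)
    for unit in units[:-1]:
        if abs(value) < base:
            break  # `unit` is the right unit for `value`
        value /= base
    else:
        # Divided through every smaller unit: fall back to the largest one.
        unit = units[-1]
    return "%.1f %s" % (value, unit)
```

For example, `human_size(1536)` gives `'1.5 KiB'` and `human_size(1536, binary=False)` gives `'1.5 kB'`.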

A one-liner, you say... here's a one-liner:

 sum([sum(map(lambda fname: os.path.getsize(os.path.join(directory, fname)), files)) for directory, folders, files in os.walk(path)])

Although I'd probably split it up, and it performs no checks.

To convert to KB, see "Reusable library to get human readable version of file size?" and work it in.

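The following script prints the directory size of all subdirectories of the specified directory. It also tries to benefit (if possible) from caching the calls of the recursive function. If the argument is omitted, the script works in the current directory. The output is sorted by directory size, from biggest to smallest, so you can adapt it to your needs.

PS: I've used recipe 578019 to show the directory size in a human-friendly format (http://code.activestate.com/recipes/578019/)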

    from __future__ import print_function
    import os
    import sys
    import operator

    def null_decorator(ob):
        return ob

    if sys.version_info >= (3,2,0):
        import functools
        my_cache_decorator = functools.lru_cache(maxsize=4096)
    else:
        my_cache_decorator = null_decorator

    start_dir = os.path.normpath(os.path.abspath(sys.argv[1])) if len(sys.argv) > 1 else '.'

    @my_cache_decorator
    def get_dir_size(start_path = '.'):
        total_size = 0
        if 'scandir' in dir(os):
            # using fast 'os.scandir' method (new in version 3.5)
            for entry in os.scandir(start_path):
                if entry.is_dir(follow_symlinks = False):
                    total_size += get_dir_size(entry.path)
                elif entry.is_file(follow_symlinks = False):
                    total_size += entry.stat().st_size
        else:
            # using slow, but compatible 'os.listdir' method
            for entry in os.listdir(start_path):
                full_path = os.path.abspath(os.path.join(start_path, entry))
                if os.path.isdir(full_path):
                    total_size += get_dir_size(full_path)
                elif os.path.isfile(full_path):
                    total_size += os.path.getsize(full_path)
        return total_size

    def get_dir_size_walk(start_path = '.'):
        total_size = 0
        for dirpath, dirnames, filenames in os.walk(start_path):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                total_size += os.path.getsize(fp)
        return total_size

    def bytes2human(n, format='%(value).1f %(symbol)s', symbols='customary'):
        """
        (c) http://code.activestate.com/recipes/578019/

        Convert n bytes into a human readable string based on format.
        symbols can be either "customary", "customary_ext", "iec" or "iec_ext",
        see: http://goo.gl/kTQMs

          >>> bytes2human(0)
          '0.0 B'
          >>> bytes2human(0.9)
          '0.0 B'
          >>> bytes2human(1)
          '1.0 B'
          >>> bytes2human(1.9)
          '1.0 B'
          >>> bytes2human(1024)
          '1.0 K'
          >>> bytes2human(1048576)
          '1.0 M'
          >>> bytes2human(1099511627776127398123789121)
          '909.5 Y'

          >>> bytes2human(9856, symbols="customary")
          '9.6 K'
          >>> bytes2human(9856, symbols="customary_ext")
          '9.6 kilo'
          >>> bytes2human(9856, symbols="iec")
          '9.6 Ki'
          >>> bytes2human(9856, symbols="iec_ext")
          '9.6 kibi'

          >>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
          '9.8 K/sec'

          >>> # precision can be adjusted by playing with %f operator
          >>> bytes2human(10000, format="%(value).5f %(symbol)s")
          '9.76562 K'
        """
        SYMBOLS = {
            'customary'     : ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
            'customary_ext' : ('byte', 'kilo', 'mega', 'giga', 'tera', 'peta', 'exa',
                               'zetta', 'yotta'),
            'iec'           : ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
            'iec_ext'       : ('byte', 'kibi', 'mebi', 'gibi', 'tebi', 'pebi', 'exbi',
                               'zebi', 'yobi'),
        }
        n = int(n)
        if n < 0:
            raise ValueError("n < 0")
        symbols = SYMBOLS[symbols]
        prefix = {}
        for i, s in enumerate(symbols[1:]):
            prefix[s] = 1 << (i+1)*10
        for symbol in reversed(symbols[1:]):
            if n >= prefix[symbol]:
                value = float(n) / prefix[symbol]
                return format % locals()
        return format % dict(symbol=symbols[0], value=n)

    ############################################################
    ###
    ###  main ()
    ###
    ############################################################
    if __name__ == '__main__':
        dir_tree = {}
        ### version, that uses 'slow' [os.walk method]
        #get_size = get_dir_size_walk
        ### this recursive version can benefit from caching the function calls (functools.lru_cache)
        get_size = get_dir_size

        for root, dirs, files in os.walk(start_dir):
            for d in dirs:
                dir_path = os.path.join(root, d)
                if os.path.isdir(dir_path):
                    dir_tree[dir_path] = get_size(dir_path)

        for d, size in sorted(dir_tree.items(), key=operator.itemgetter(1), reverse=True):
            print('%s\t%s' %(bytes2human(size, format='%(value).2f%(symbol)s'), d))

        print('-' * 80)
        if sys.version_info >= (3,2,0):
            print(get_dir_size.cache_info())

Sample output:

    37.61M  .\subdir_b
    2.18M   .\subdir_a
    2.17M   .\subdir_a\subdir_a_2
    4.41K   .\subdir_a\subdir_a_1
    ----------------------------------------------------------
    CacheInfo(hits=2, misses=4, maxsize=4096, currsize=4)

EDIT: moved null_decorator above, as user2233949 suggested

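A little late to the party, but it's one line, provided you have glob2 and humanize installed. Note that in Python 3 the default iglob has a recursive mode. How to modify the code for Python 3 is left as a trivial exercise for the reader.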

    >>> import os
    >>> from humanize import naturalsize
    >>> from glob2 import iglob
    >>> naturalsize(sum(os.path.getsize(x) for x in iglob('/var/**')))
    '546.2 MB'
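For Python 3, a sketch using only the standard library: glob.iglob with recursive=True replaces glob2, and the sum stays in bytes (pass it to humanize's naturalsize if you have humanize installed). Two behaviours worth knowing: `**` also matches directories, so they are filtered out with os.path.isfile, and names starting with a dot are skipped unless you pass include_hidden=True (Python 3.11+). `glob_size` is a hypothetical name:

```python
import glob
import os


def glob_size(root):
    """Sum the sizes of regular files matched by root/** (Python 3.5+)."""
    # glob.escape keeps characters like '[' in `root` from acting as wildcards.
    pattern = os.path.join(glob.escape(root), "**")
    return sum(
        os.path.getsize(p)
        for p in glob.iglob(pattern, recursive=True)
        if os.path.isfile(p)  # '**' also yields directories; skip them
    )
```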

To get the size of a single file, there is os.path.getsize():

    >>> import os
    >>> os.path.getsize("/path/file")
    35L

It reports the size in bytes.

    import os

    def get_size(path):
        total_size = 0
        for dirpath, dirnames, filenames in os.walk(path):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                if os.path.exists(fp):
                    total_size += os.path.getsize(fp)
        return total_size  # in bytes

Thanks monkut & troex! This works really well!

This script tells you which file is the biggest in the CWD and also tells you which folder that file is in. It works for me on Win8 with the Python 3.3.3 shell:

    import os

    folder = os.getcwd()
    number = 0
    string = ""
    for root, dirs, files in os.walk(folder):
        for file in files:
            pathname = os.path.join(root, file)
            ## print (pathname)
            ## print (os.path.getsize(pathname)/1024/1024)
            if number < os.path.getsize(pathname):
                number = os.path.getsize(pathname)
                string = pathname

    ## print ()
    print (string)
    print ()
    print (number)
    print ("Number in bytes")
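The same search can be written more compactly with max() over a generator of (size, path) pairs. A sketch, assuming Python 3.4+ for max's `default` argument (returned when the tree contains no files); `largest_file` is a hypothetical name:

```python
import os


def largest_file(folder="."):
    """Return (size_in_bytes, path) of the largest regular file under folder."""
    paths = (
        os.path.join(root, name)
        for root, dirs, files in os.walk(folder)
        for name in files
    )
    # Tuples compare by size first; ties fall back to comparing the paths.
    candidates = ((os.path.getsize(p), p) for p in paths if os.path.isfile(p))
    return max(candidates, default=(0, None))
```

`size, path = largest_file(os.getcwd())` then gives the same two values the script prints.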

Admittedly, this is kind of hackish and only works on Unix/Linux.

It matches du -sb . because in effect it is a Python wrapper that runs the du -sb . command in the shell.

    import subprocess

    def system_command(cmd):
        """Function executes cmd parameter as a bash command."""
        p = subprocess.Popen(cmd,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             shell=True)
        stdout, stderr = p.communicate()
        return stdout, stderr

    size = int(system_command('du -sb . ')[0].split()[0])

I'm using Python 2.7.13 with scandir, and here is my one-liner recursive function to get the total size of a folder:

    from scandir import scandir  # PyPI backport; on Python 3.5+ use: from os import scandir

    def getTotFldrSize(path):
        return sum([s.stat(follow_symlinks=False).st_size for s in scandir(path) if s.is_file(follow_symlinks=False)]) + \
               sum([getTotFldrSize(s.path) for s in scandir(path) if s.is_dir(follow_symlinks=False)])

    >>> print(getTotFldrSize('.'))
    1203245680

https://pypi.python.org/pypi/scandir

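When the size of a subdirectory is computed, it should update the size of its parent folder, and this continues until the root parent is reached.

The following function computes the size of a folder and all of its subfolders, returning the result in megabytes (10^6 bytes).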

    import os

    def folder_size(path):
        parent = {}  # path to parent path mapper
        folder_size = {}  # storing the size of directories
        folder = os.path.realpath(path)

        for root, _, filenames in os.walk(folder):
            if root == folder:
                parent[root] = -1  # the root folder will not have any parent
                folder_size[root] = 0.0  # initializing the size to 0

            elif root not in parent:
                immediate_parent_path = os.path.dirname(root)  # extract the immediate parent of the subdirectory
                parent[root] = immediate_parent_path  # store the parent of the subdirectory
                folder_size[root] = 0.0  # initialize the size to 0

            total_size = 0
            for filename in filenames:
                filepath = os.path.join(root, filename)
                total_size += os.stat(filepath).st_size  # computing the size of the files under the directory
            folder_size[root] = total_size  # store the updated size

            temp_path = root  # for subdirectories, we need to update the size of the parent till the root parent
            while parent[temp_path] != -1:
                folder_size[parent[temp_path]] += total_size
                temp_path = parent[temp_path]

        return folder_size[folder]/1000000.0  # megabytes (10**6 bytes)
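Walking bottom-up avoids the inner loop that climbs back to the root for every directory: with os.walk(..., topdown=False), each directory is yielded only after all of its subdirectories, so their totals are already known when the parent is processed. A sketch (hypothetical name `all_folder_sizes`) that returns a dict of byte totals for every directory under `path`, not just the top one:

```python
import os


def all_folder_sizes(path):
    """Map every directory under `path` (inclusive) to its recursive byte total."""
    sizes = {}
    # topdown=False: children are yielded before their parent.
    for root, dirs, files in os.walk(path, topdown=False):
        total = 0
        for name in files:
            try:
                # os.stat follows symlinks, as in the answer above.
                total += os.stat(os.path.join(root, name)).st_size
            except OSError:
                continue  # broken symlink or file removed mid-walk
        for name in dirs:
            # Symlinked dirs are listed but not walked, so they default to 0.
            total += sizes.get(os.path.join(root, name), 0)
        sizes[root] = total
    return sizes
```

`all_folder_sizes('.')['.']` is the total for the current directory; divide by 1000000.0 to get the megabyte figure the function above returns.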

For what it's worth... the tree command does all of this for free:

    tree -h --du /path/to/dir     # files and dirs
    tree -h -d --du /path/to/dir  # dirs only

I love Python, but by far the simplest solution to this problem requires no new code.

Using the sh library, its du command does this:

    $ pip install sh

    import sh
    print( sh.du("-s", ".") )
    91154728 .

If you want to pass an asterisk (*), use glob as described here.

To convert the value to a human-readable form, use humanize:

    $ pip install humanize

    import humanize
    print( humanize.naturalsize( 91157384 ) )
    91.2 MB
