Python recursive folder read

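I have a C++/Obj-C background and I am just discovering Python (I have been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.

The problem I have is that the code I have written only works one folder deep. I can see why in the code (see # hardcoded path), I just don't know how to move forward with Python since I am brand new to it.

Python code: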

    import os
    import sys

    rootdir = sys.argv[1]

    for root, subFolders, files in os.walk(rootdir):

        for folder in subFolders:
            outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
            folderOut = open( outfileName, 'w' )
            print "outfileName is " + outfileName

            for file in files:
                filePath = rootdir + '/' + file
                f = open( filePath, 'r' )
                toWrite = f.read()
                print "Writing '" + toWrite + "' to" + filePath
                folderOut.write( toWrite )
                f.close()

            folderOut.close()

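Make sure you understand the three return values of os.walk:

    for root, subdirs, files in os.walk(rootdir):

They have the following meaning:

  • root: the current path that is being "walked"
  • subdirs: the entries in root that are directories
  • files: the entries in root that are not directories (so not in subdirs)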
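To make the three values concrete, here is a minimal sketch that builds a small throwaway tree and records what os.walk yields for each directory. The helper name describe_tree and the sample layout (a.txt, sub/b.txt) are made up for illustration.

```python
import os
import tempfile


def describe_tree(rootdir):
    """Return one (root, subdirs, files) entry per directory that os.walk visits."""
    visited = []
    for root, subdirs, files in os.walk(rootdir):
        # Store root relative to rootdir, and sort the names so the result
        # does not depend on the file system's listing order.
        visited.append((os.path.relpath(root, rootdir), sorted(subdirs), sorted(files)))
    return sorted(visited)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: tmp/a.txt and tmp/sub/b.txt
        os.makedirs(os.path.join(tmp, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(tmp, rel), "w") as f:
                f.write("hello")
        for entry in describe_tree(tmp):
            print(entry)
```

Each directory shows up exactly once as root, which is why a single loop over os.walk is enough to reach every file.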

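And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the folder that is currently being "walked", not the topmost folder. So it has to be filePath = os.path.join(root, file). BTW, "file" is a builtin (in Python 2), so you normally don't use it as a variable name.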
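A small sketch of the difference, assuming a hypothetical helper collect_paths and a made-up file sub/b.txt: joining each name with root gives paths that exist, while joining with the top-level directory does not for files inside subfolders.

```python
import os
import tempfile


def collect_paths(rootdir):
    """Return the path of every file below rootdir, built with os.path.join."""
    paths = []
    for root, subdirs, files in os.walk(rootdir):
        for filename in files:
            # Join with root (the directory being walked), not with rootdir.
            paths.append(os.path.join(root, filename))
    return sorted(paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "sub", "b.txt"), "w") as f:
            f.write("hello")
        for path in collect_paths(tmp):
            print(path, os.path.exists(path))
        # The path that rootdir + '/' + file would have produced is missing:
        print(os.path.exists(os.path.join(tmp, "b.txt")))
```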

Another problem is your loops, which should look like this, for example:

    import os
    import sys

    walk_dir = sys.argv[1]

    print('walk_dir = ' + walk_dir)

    # If your current working directory may change during script execution, it's recommended to
    # immediately convert program arguments to an absolute path. Then the variable root below will
    # be an absolute path as well. Example:
    # walk_dir = os.path.abspath(walk_dir)
    print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

    for root, subdirs, files in os.walk(walk_dir):
        print('--\nroot = ' + root)
        list_file_path = os.path.join(root, 'my-directory-list.txt')
        print('list_file_path = ' + list_file_path)

        with open(list_file_path, 'wb') as list_file:
            for subdir in subdirs:
                print('\t- subdirectory ' + subdir)

            for filename in files:
                file_path = os.path.join(root, filename)

                print('\t- file %s (full path: %s)' % (filename, file_path))

                with open(file_path, 'rb') as f:
                    f_content = f.read()
                    list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                    list_file.write(f_content)
                    list_file.write(b'\n')

In case you didn't know, the with statement for files is a shorthand:

    with open('filename', 'rb') as f:
        dosomething()

    # is effectively the same as

    f = open('filename', 'rb')
    try:
        dosomething()
    finally:
        f.close()
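The practical consequence is that the file gets closed even when the body raises. A minimal sketch, with a made-up helper read_and_fail that raises on purpose inside the with block and hands back the file object so its state can be inspected:

```python
import os
import tempfile


def read_and_fail(path):
    """Open path in a with block, raise inside it, and return the file object."""
    handle = None
    try:
        with open(path, "rb") as f:
            handle = f
            f.read()
            raise ValueError("simulated failure inside the with block")
    except ValueError:
        # By the time we get here, the with statement has already closed f.
        pass
    return handle


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(read_and_fail(path).closed)  # prints True
    finally:
        os.remove(path)
```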

Agree with Dave Webb: os.walk will yield an item for each directory in the tree. The fact is, you just don't have to care about subFolders.

Code like this should work:

    import os
    import sys

    rootdir = sys.argv[1]

    for folder, subs, files in os.walk(rootdir):
        with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
            for filename in files:
                with open(os.path.join(folder, filename), 'r') as src:
                    dest.write(src.read())

If you are using Python 3.5 or later, you can do this in one line.

    import glob

    # root_dir needs a trailing slash, since the pattern is appended directly
    for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
        print(filename)

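As mentioned in the documentation:

If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.

If you want every file, you can use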

    for filename in glob.iglob(root_dir + '**/*', recursive=True):
        print(filename)
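A self-contained sketch of the recursive glob, assuming a hypothetical wrapper find_text_files. It builds the pattern with os.path.join, so it works whether or not root_dir ends with a slash.

```python
import glob
import os
import tempfile


def find_text_files(root_dir):
    """Return all .txt files in root_dir and its subdirectories (Python 3.5+)."""
    # os.path.join supplies the separators, so root_dir needs no trailing slash.
    pattern = os.path.join(root_dir, "**", "*.txt")
    return sorted(glob.iglob(pattern, recursive=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: tmp/a.txt, tmp/sub/b.txt and tmp/sub/c.log
        os.makedirs(os.path.join(tmp, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.log")):
            with open(os.path.join(tmp, rel), "w") as f:
                f.write("hello")
        for filename in find_text_files(tmp):
            print(filename)
```

Because '**' also matches zero directories, a.txt directly inside root_dir is found along with the files in subfolders.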

Use os.path.join() to construct your paths. It's neater:

    import os
    import sys

    rootdir = sys.argv[1]

    for root, subFolders, files in os.walk(rootdir):
        for folder in subFolders:
            outfileName = os.path.join(root,folder,"py-outfile.txt")
            folderOut = open( outfileName, 'w' )
            # print() with a single string works in both Python 2 and Python 3
            print("outfileName is " + outfileName)
            for file in files:
                filePath = os.path.join(root,file)
                toWrite = open( filePath).read()
                print("Writing '" + toWrite + "' to" + filePath)
                folderOut.write( toWrite )
            folderOut.close()

os.walk is recursive by default. For each directory, starting from the root, it yields a 3-tuple (dirpath, dirnames, filenames).

    from os import walk
    from os.path import splitext, join

    def select_files(root, files):
        """
        simple logic here to filter out interesting files
        .py files in this example
        """

        selected_files = []

        for file in files:
            #do concatenation here to get full path
            full_path = join(root, file)
            ext = splitext(file)[1]

            if ext == ".py":
                selected_files.append(full_path)

        return selected_files

    def build_recursive_dir_tree(path):
        """
        path    -    where to begin folder scan
        """
        selected_files = []

        for root, dirs, files in walk(path):
            selected_files += select_files(root, files)

        return selected_files
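The same filtering can also be written as a generator, so matching paths are produced one at a time instead of being collected into a list first. This is only a sketch of that alternative; the name iter_files_with_ext and its ext parameter are assumptions for illustration, not part of the answer above.

```python
import os
import tempfile


def iter_files_with_ext(path, ext=".py"):
    """Yield the full path of every file below path whose extension equals ext."""
    for root, dirs, files in os.walk(path):
        for name in files:
            if os.path.splitext(name)[1] == ext:
                yield os.path.join(root, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: tmp/main.py, tmp/pkg/mod.py and tmp/pkg/notes.txt
        os.makedirs(os.path.join(tmp, "pkg"))
        for rel in ("main.py", os.path.join("pkg", "mod.py"), os.path.join("pkg", "notes.txt")):
            with open(os.path.join(tmp, rel), "w") as f:
                f.write("# sample\n")
        for found in sorted(iter_files_with_ext(tmp)):
            print(found)
```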

I think the problem is that you are not processing the output of os.walk correctly.

First, change:

    filePath = rootdir + '/' + file

to:

    filePath = root + '/' + file

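rootdir is your fixed starting directory; root is a directory returned by os.walk.

Second, you don't need to nest your file processing loop inside the loop over subdirectories, as it makes no sense to run it once for each subdirectory. os.walk will set root to each subdirectory in turn. You don't need to process the subdirectories by hand unless you want to do something with the directories themselves.

Try this: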

    import os
    import sys

    path = sys.argv[1]  # starting directory, taken from the command line

    for root, subdirs, files in os.walk(path):
        for file in os.listdir(root):
            filePath = os.path.join(root, file)
            if os.path.isdir(filePath):
                pass
            else:
                f = open(filePath, 'r')
                # Do Stuff
                f.close()
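One case where you do want to touch the subdirectory list yourself is skipping parts of the tree. With the default topdown=True, os.walk only descends into the names still present in that list, so editing it in place prunes the walk. A minimal sketch, assuming a hypothetical helper walk_skipping and made-up folder names:

```python
import os
import tempfile


def walk_skipping(rootdir, skip_names):
    """Return files below rootdir, not descending into directories in skip_names."""
    found = []
    for root, subdirs, files in os.walk(rootdir):
        # Slice assignment edits the very list os.walk holds; with the default
        # topdown=True it then only descends into the names left in it.
        subdirs[:] = [d for d in subdirs if d not in skip_names]
        for filename in files:
            found.append(os.path.join(root, filename))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: tmp/keep/a.txt and tmp/skip/b.txt
        for folder, name in (("keep", "a.txt"), ("skip", "b.txt")):
            os.makedirs(os.path.join(tmp, folder))
            with open(os.path.join(tmp, folder, name), "w") as f:
                f.write("hello")
        for path in walk_skipping(tmp, {"skip"}):
            print(path)
```

Rebinding the name (subdirs = [...]) would not work here, because os.walk would still hold the original list.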