Reading a large text file line by line in Python without loading it into memory

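I need to read a large file line by line. Say the file is larger than 5 GB and I need to read every line, but obviously I don't want to use readlines() because it would create a very large list in memory.

How will the code below work for this case? Does xreadlines itself read the lines into memory one at a time? Is the generator expression needed?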

    f = (line for line in open("log.txt").xreadlines())  # how much is loaded in memory?

    f.next()

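Also, what can I do to read the file in reverse order, like the Linux tail command?

I found:

http://code.google.com/p/pytailer/

and

"python head, tail and backward read by lines of a text file"

Both worked very well!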
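For the reverse-order part of the question, one possible approach is to read fixed-size blocks starting from the end of the file and split them into lines. The sketch below is only illustrative and is not taken from pytailer or the linked article. The function name `reverse_lines` and its `block_size` and `encoding` parameters are assumptions, and it assumes `\n` line endings.

```python
import os


def reverse_lines(path, block_size=4096, encoding="utf-8"):
    """Yield the lines of a file from last to first, without the trailing newline.

    Hypothetical helper: only one block plus one partial line is held in memory.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        position = fh.tell()
        if position == 0:
            return  # empty file: nothing to yield

        tail = b""          # partial line carried over from the block read before
        first_block = True
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            fh.seek(position)
            block = fh.read(read_size)
            if first_block:
                first_block = False
                if block.endswith(b"\n"):
                    block = block[:-1]  # ignore the newline that ends the file
            block += tail
            lines = block.split(b"\n")
            tail = lines.pop(0)  # may be incomplete; finish it with the next block
            for line in reversed(lines):
                yield line.decode(encoding)
        yield tail.decode(encoding)  # whatever is left is the first line of the file
```

Something like `itertools.islice(reverse_lines("log.txt"), 10)` would then give the last ten lines, newest first, without reading the rest of the file.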

I am providing this answer because Keith's, while succinct, does not close the file explicitly.

    with open("log.txt") as infile:
        for line in infile:
            do_something_with(line)

All you need to do is use the file object as an iterator.

    for line in open("log.txt"):
        do_something_with(line)

Even better is to use a context manager, which recent Python versions support.

    with open("log.txt") as fileobject:
        for line in fileobject:
            do_something_with(line)

This will automatically close the file as well.
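To make the lazy behaviour concrete, here is a small sketch built on the same pattern. The helper names `head` and `count_matching` are made up for illustration. Both iterate over the file object directly, so neither builds a list of the whole file.

```python
from itertools import islice


def head(path, n=10):
    """Return the first n lines; iteration stops as soon as n lines were read."""
    with open(path) as infile:
        return list(islice(infile, n))


def count_matching(path, needle):
    """Count lines containing needle, holding only one line at a time."""
    with open(path) as infile:
        return sum(1 for line in infile if needle in line)
```

In both helpers the `with` block closes the file once the result has been computed, even if an exception is raised part-way through.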

An old-school approach:

    fh = open(file_name, 'rt')
    line = fh.readline()
    while line:
        # do stuff with line
        line = fh.readline()
    fh.close()

You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html

From the docs:

    import fileinput
    for line in fileinput.input("filename"):
        process(line)

This will avoid copying the whole file into memory at once.
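fileinput is mostly useful when several files should be treated as one stream of lines. The sketch below is an illustration under assumptions: `numbered_lines` is a hypothetical helper, and it uses the `fileinput.FileInput` class directly (the class that `fileinput.input()` wraps) so that no module-level state is left behind.

```python
import fileinput


def numbered_lines(paths):
    """Yield (filename, line number within that file, line) for each line.

    paths is assumed to be a non-empty list of file names; with an empty
    list fileinput falls back to reading standard input.
    """
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line
```

Lines are still produced one at a time, so memory use does not grow with the number or size of the files.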

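I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So I recreated the cp command using line-by-line reading and writing. It's crazy fast.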

    #!/usr/bin/env python3.6

    import sys

    with open(sys.argv[2], 'w') as outfile:
        with open(sys.argv[1]) as infile:
            for line in infile:
                outfile.write(line)
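One caveat with the script above is that text mode decodes the data and may translate line endings, so the copy is not guaranteed to be byte-for-byte identical. A possible variant, sketched here with a hypothetical `copy_lines` function, opens both files in binary mode and still iterates line by line.

```python
def copy_lines(src, dst):
    """Copy src to dst one line at a time in binary mode.

    Returns the number of lines copied. Binary mode leaves bytes untouched,
    so encodings and \r\n line endings survive the copy unchanged.
    """
    count = 0
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        for line in infile:
            outfile.write(line)
            count += 1
    return count
```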

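How about this? Divide your file into chunks and then read it line by line, because when you read a file, your operating system will cache the next line. If you are reading the file line by line, you are not making efficient use of the cached information.

Instead, divide the file into chunks, load a whole chunk into memory, and then do your processing.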

    import os

    def chunks(fh, size=1024):
        """Yield (start, length) pairs for chunks that end on a line boundary."""
        while True:
            startat = fh.tell()    # current position, counted from the start of the file
            fh.seek(size, 1)       # skip size bytes relative to the current position (whence=1)
            data = fh.readline()   # then read on to the end of the current line
            yield startat, fh.tell() - startat  # only offsets are produced, not the data
            if not data:
                break

    fname = "log.txt"  # placeholder path; the original snippet never defines fname

    if os.path.isfile(fname):
        try:
            with open(fname, 'rb') as fh:
                for startat, length in chunks(fh):
                    fh.seek(startat)
                    data = fh.read(length)  # one whole chunk, made of complete lines
                    print(data)
        except IOError as e:  # e.g. permission denied
            print("I/O error({0}): {1}".format(e.errno, e.strerror))
        except Exception as e1:  # any other unexpected error
            print("Unexpected error: {0}".format(e1))
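A shorter way to get the same "load a chunk, then process its lines" effect might be the optional size hint of `readlines()`, which stops reading once the lines collected so far exceed roughly that many bytes. This is a sketch rather than the answer's own method; `iter_line_batches` and `hint_bytes` are assumed names.

```python
def iter_line_batches(path, hint_bytes=1 << 20):
    """Yield lists of complete lines, each list roughly hint_bytes in size."""
    with open(path, "rb") as fh:
        while True:
            batch = fh.readlines(hint_bytes)  # stops once the hint is exceeded
            if not batch:
                break  # an empty list means end of file
            yield batch
```

Each batch is a list of bytes objects ending on a line boundary, so memory use stays close to `hint_bytes` plus the length of one line.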

    f = open('filename', 'r').read()  # note: read() loads the entire file into memory
    f1 = f.split('\n')
    for i in range(len(f1)):
        do_something_with(f1[i])

Hope this helps.
