Most efficient way to search the last x lines of a file in Python

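I have a file and I don't know how big it's going to be (it could be quite large, but the size will vary greatly). I want to search the last 10 lines or so to see if any of them match a string. I need to do this as quickly and efficiently as possible, and was wondering if there's anything better than: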

    s = "foo"
    last_bit = fileObj.readlines()[-10:]
    for line in last_bit:
        if line == s:
            print("FOUND")

    # Tail
    find_str = "FIREFOX"                 # String to find
    fname = "g:/autoIt/ActiveWin.log_2"  # File to check

    with open(fname, "r") as f:
        f.seek(0, 2)                     # Seek @ EOF
        fsize = f.tell()                 # Get size
        f.seek(max(fsize - 1024, 0), 0)  # Set pos @ last n chars
        lines = f.readlines()            # Read to end

    lines = lines[-10:]                  # Get last 10 lines

    # This prints True if any line is exactly find_str + "\n"
    print(find_str + "\n" in lines)

    # If you're searching for a substring
    for line in lines:
        if find_str in line:
            print(True)
            break

Here's an answer like MizardX's, but without its apparent worst-case problem of rescanning the working string every time a chunk is added.

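Compared to the activestate solution (which also seems to be quadratic), this doesn't blow up given an empty file, and does one seek per block read instead of two.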

Compared to spawning 'tail', this is self-contained. (But 'tail' is best if you have it.)

Compared to grabbing a few kB off the end and hoping it's enough, this works for any line length.

    import os

    def reversed_lines(file):
        "Generate the lines of file in reverse order."
        part = ''
        for block in reversed_blocks(file):
            for c in reversed(block):
                if c == '\n' and part:
                    yield part[::-1]
                    part = ''
                part += c
        if part: yield part[::-1]

    def reversed_blocks(file, blocksize=4096):
        "Generate blocks of file's contents in reverse order."
        file.seek(0, os.SEEK_END)
        here = file.tell()
        while 0 < here:
            delta = min(blocksize, here)
            here -= delta
            file.seek(here, os.SEEK_SET)
            yield file.read(delta)

To use it as requested:

    from itertools import islice

    def check_last_10_lines(file, key):
        for line in islice(reversed_lines(file), 10):
            if line.rstrip('\n') == key:
                print('FOUND')
                break

Edit: changed map() to itertools.imap() in head().
Edit 2: simplified reversed_blocks().
Edit 3: avoid rescanning the tail for newlines.
Edit 4: rewrote reversed_lines() because str.splitlines() ignores a final '\n', as BrianB noticed (thanks).

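Note that in very old Python versions the string concatenation in a loop here will take quadratic time. CPython from at least the last few years avoids this problem automatically.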

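If you are running Python on a POSIX system, you can use 'tail -10' to retrieve the last few lines. This may be faster than writing your own Python code to get the last 10 lines. Rather than opening the file directly, open a pipe from the command 'tail -10 filename'. If you are certain of the log output, though (for example, you know that there are never any lines that are hundreds or thousands of characters long), then using one of the 'read the last 2 KB' approaches would be fine.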
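A minimal sketch of the pipe approach, assuming a POSIX `tail` executable is on the PATH. The helper name `tail_lines` is made up for this illustration; it uses the standard `subprocess` module with an argument list, so the filename never passes through a shell.

```python
import subprocess

def tail_lines(path, n=10):
    """Return the last n lines of path as a list of str, via the external tail command."""
    # Assumption: a POSIX `tail` executable is available on PATH.
    result = subprocess.run(
        ["tail", "-n", str(n), path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output using the locale's encoding
        check=True,           # raise CalledProcessError if tail fails
    )
    return result.stdout.splitlines()
```

Searching is then a plain membership test, for example `"foo" in tail_lines("some.log")`, where `some.log` stands in for your own file.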

I think reading the last 2 KB or so of the file should make sure you get 10 lines, and shouldn't be too much of a resource hog.

    file_handle = open("somefile")
    file_handle.seek(0, 2)  # move to the end so that tell() reports the file size
    file_size = file_handle.tell()
    file_handle.seek(max(file_size - 2*1024, 0))

    # this will get rid of trailing newlines, unlike readlines()
    last_10 = file_handle.read().splitlines()[-10:]

    assert len(last_10) == 10, "Only read %d lines" % len(last_10)

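Here is a version using mmap that seems pretty efficient. The big plus is that mmap will automatically handle the file-to-memory paging requirements for you.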

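    import os
    from mmap import mmap

    NEWLINE = ord('\n')  # indexing an mmap yields integers in Python 3

    def lastn(filename, n):
        # open the file in binary mode and mmap it
        f = open(filename, 'r+b')
        m = mmap(f.fileno(), os.path.getsize(f.name))

        nlcount = 0
        i = m.size() - 1
        if m[i] == NEWLINE:
            n += 1  # don't count the newline that ends the file
        # walk backwards until n newlines have been seen
        while nlcount < n and i >= 0:
            if m[i] == NEWLINE:
                nlcount += 1
            i -= 1
        # start just after the n-th newline, or at the beginning of the file
        i = i + 2 if nlcount == n else 0

        return m[i:].splitlines()

    target = b"target string"
    print([l for l in lastn('somefile', 10) if l == target])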

I think I remember adapting the code from a blog post by Manu Garg when I had to do something similar.

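If you're on a unix box, os.popen("tail -10 " + filepath).readlines() will probably be the fastest way. Otherwise, it depends on how robust you want it to be. The methods proposed so far will all fall down, one way or another. For robustness and speed in the most common case you probably want something like a logarithmic search: use file.seek to go to the end of the file minus 1000 characters, read it in, check how many lines it contains, then go to EOF minus 3000 characters, read in 2000 characters, count the lines, then EOF minus 7000, read in 4000 characters, count the lines, and so on until you have as many lines as you need. But if you know for sure that it's always going to be run on files with sensible line lengths, you may not need that.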
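The doubling search described above can be sketched as follows. This is one possible reading of that description rather than a reference implementation: the function name `last_lines_doubling` and its parameters are invented for the example, and the file is opened in binary mode so that offsets are plain byte positions.

```python
import os

def last_lines_doubling(path, n=10, first_chunk=1000):
    """Return the last n lines of path (as bytes), reading backwards in doubling chunks."""
    if n <= 0:
        return []
    data = b""
    chunk = first_chunk
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # More than n newlines means the last n lines are complete,
        # even if the oldest line in `data` is only a fragment.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk, pos)   # never seek before the start of the file
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            chunk *= 2               # 1000, 2000, 4000, ... bytes per read
    return data.splitlines()[-n:]
```

Each byte is read only once, and the number of seeks grows with the logarithm of the amount of data needed, which is what makes this cheap for short lines and still correct for very long ones.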

You may also find some inspiration in the source code of the unix tail command.

I ran into this problem parsing the last hour of LARGE syslog files, and used this function from activestate's recipe site (http://code.activestate.com/recipes/439045/):

    #!/usr/bin/env python
    # -*-mode: python; coding: iso-8859-1 -*-
    #
    # Copyright (c) Peter Astrand <astrand@cendio.se>

    class BackwardsReader:
        """Read a file line by line, backwards"""
        BLKSIZE = 4096

        def readline(self):
            while 1:
                newline_pos = self.buf.rfind(b"\n")
                pos = self.file.tell()
                if newline_pos != -1:
                    # Found a newline
                    line = self.buf[newline_pos+1:]
                    self.buf = self.buf[:newline_pos]
                    if pos != 0 or newline_pos != 0 or self.trailing_newline:
                        line += b"\n"
                    return line
                else:
                    if pos == 0:
                        # Start-of-file
                        return b""
                    else:
                        # Need to fill buffer
                        toread = min(self.BLKSIZE, pos)
                        self.file.seek(-toread, 1)
                        self.buf = self.file.read(toread) + self.buf
                        self.file.seek(-toread, 1)
                        if pos - toread == 0:
                            self.buf = b"\n" + self.buf

        def __init__(self, file):
            # The file must be opened in binary mode so that relative seeks work
            self.file = file
            self.buf = b""
            self.file.seek(-1, 2)
            self.trailing_newline = 0
            lastchar = self.file.read(1)
            if lastchar == b"\n":
                self.trailing_newline = 1
                self.file.seek(-1, 2)

    # Example usage
    br = BackwardsReader(open('bar', 'rb'))

    while 1:
        line = br.readline()
        if not line:
            break
        print(repr(line))

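It works really well and is much more efficient than something like fileObj.readlines()[-10:], which makes Python read the entire file into memory and then chop off the last ten lines.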

You could read chunks of 1,000 bytes or so from the end of the file until you have 10 lines.

You could also count the lines as you step backwards through the file, instead of guessing at a byte offset.

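    NUM_OF_LINES = 10
    chunk_size = 1024
    lines = 0

    f = open('filename', 'rb')
    f.seek(0, 2)                    # go to the end of the file
    pos = f.tell()

    while pos > 0 and lines <= NUM_OF_LINES:
        step = min(chunk_size, pos)  # never seek before the start of the file
        pos -= step
        f.seek(pos)
        lines += f.read(step).count(b'\n')

    f.seek(pos)                     # back to the start of the last chunk read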

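Now the file is at a good position to run readlines(). You could also cache the strings you read the first time, to avoid reading the same portion of the file twice.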
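The caching idea might look like the sketch below: the chunks read while counting newlines are kept in a list and joined at the end, so nothing is read from disk a second time. The name `tail_cached` and its parameters are assumptions made for this example.

```python
import os

def tail_cached(path, n=10, chunk_size=1024):
    """Return the last n lines of path (as bytes), reading each byte at most once."""
    if n <= 0:
        return []
    chunks = []    # chunks in the order they were read (end of file first)
    newlines = 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        # More than n newlines guarantees n complete lines in the cached data.
        while pos > 0 and newlines <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            newlines += chunk.count(b"\n")  # only the new chunk is scanned
            chunks.append(chunk)
    # Restore file order before splitting into lines.
    data = b"".join(reversed(chunks))
    return data.splitlines()[-n:]
```

Counting newlines per chunk, rather than over the accumulated data, also avoids rescanning bytes that were already examined.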

I took mhawke's suggestion to use mmap and wrote a version that uses rfind:

    from mmap import mmap
    import sys

    def reverse_file(f):
        # f must be opened in binary read/write mode ('r+b')
        mm = mmap(f.fileno(), 0)
        nl = mm.size() - 1
        prev_nl = mm.size()
        while nl > -1:
            nl = mm.rfind(b'\n', 0, nl)
            yield mm[nl + 1:prev_nl]
            prev_nl = nl + 1

    def main():
        # Example usage
        with open('test.txt', 'r+b') as infile:
            for line in reverse_file(infile):
                sys.stdout.buffer.write(line)

Read the last few KB of the file, and split that into lines to return only the last 10.

The start of that chunk is quite unlikely to fall on a line boundary, but you'll discard the first lines anyway.
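As a rough illustration of this approach applied to the original question, the sketch below reads a fixed-size tail, drops the first (probably partial) line, and checks the last few lines for a match. The name `tail_contains` and the 4096-byte default are assumptions; with very long lines it may inspect fewer than n lines, which is the trade-off this answer accepts.

```python
import os

def tail_contains(path, needle, n=10, tail_bytes=4096):
    """Return True if one of the last n lines of path equals needle (bytes)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        start = max(size - tail_bytes, 0)
        f.seek(start)
        lines = f.read().splitlines()
    if start > 0 and lines:
        # The chunk most likely began mid-line, so the first entry is unreliable.
        lines = lines[1:]
    return needle in lines[-n:]
```

For example, `tail_contains("some.log", b"foo")` would report whether one of the last ten lines is exactly `foo`, where `some.log` is a placeholder filename.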

Personally I'd be tempted to break out to the shell and call tail -n10 to load the file. But then I'm not really a Python programmer ;)

First, a function that returns a list:

    def lastNLines(file, N=10, chunksize=1024):
        # `file` must be opened in binary mode; lines are returned as bytes
        lines = [b""]
        file.seek(0, 2)  # go to eof
        pos = file.tell()
        # one extra entry for a possibly partial first line, and one for the
        # empty string that split() leaves after a trailing newline
        while pos > 0 and len(lines) < N + 2:
            # read the chunk that ends at pos
            step = min(chunksize, pos)
            pos -= step
            file.seek(pos)
            chunk = file.read(step)
            # update the 'first' line with the new data, and re-split
            lines[0:1] = (chunk + lines[0]).split(b"\n")
        if lines[-1] == b"":
            lines.pop()  # the file ended with a newline
        return lines[-N:]

Second, a function that iterates over the lines in reverse order:

    def iter_lines_reversed(file, chunksize=1024):
        # `file` must be opened in binary mode; lines are yielded as bytes
        file.seek(0, 2)
        pos = file.tell()
        first_line = None  # possibly partial first line of the data read so far
        while pos > 0:
            # read the chunk that ends at pos
            step = min(chunksize, pos)
            pos -= step
            file.seek(pos)
            chunk = file.read(step)
            if first_line is None and chunk.endswith(b"\n"):
                chunk = chunk[:-1]  # ignore the newline that ends the file
            # split into lines
            lines = (chunk + (first_line or b"")).split(b"\n")
            first_line = lines[0]
            # iterate in reverse order, holding back the first line
            for line in reversed(lines[1:]):
                yield line
        # handle the remaining data at the beginning of the file
        if first_line is not None:
            yield first_line

For example:

    s = b"foo"  # bytes, because fileObj is opened in binary mode
    for index, line in enumerate(iter_lines_reversed(fileObj)):
        if line == s:
            print("FOUND")
            break
        elif index+1 >= 10:
            break

Edit: Now gets the file size automatically.
Edit 2: Now only iterates over 10 lines.

This solution makes a single pass over the file, using two file object pointers to get the last N lines without re-reading it:

    def getLastLines(path, n):
        # return the last N lines from the file indicated in path
        with open(path) as fp, open(path) as back:
            # move the leading pointer n lines ahead
            for i in range(n):
                if fp.readline() == '':
                    break  # fewer than n lines: the whole file is the tail
            # advance both pointers until the leading one reaches the end
            for each in fp:
                back.readline()
            # the trailing pointer is now n lines before the end
            return [line.rstrip('\n') for line in back]

    s = "foo"
    last_bit = getLastLines(r'C:\Documents and Settings\ricardo.m.reyes\My Documents\desarrollo\tail.py', 10)
    for line in last_bit:
        if line == s:
            print("FOUND")

Maybe this will be useful:

    import os

    path = 'path_to_file'
    os.system('tail -n1 ' + path)
