How do I read two lines at a time from a file using Python?

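I am writing a Python script that parses a text file. The file is formatted so that each element takes up two lines, and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like something like this: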

    f = open(filename, "r")
    for line in f:
        line1 = line
        line2 = f.readline()

    f.close()

But this breaks with:

    ValueError: Mixing iteration and read methods would lose data

Related:

  • What is the most "pythonic" way to iterate over a list in chunks?

There is a similar question here. You can't mix iteration and readline, so you need to use one or the other.

    while True:
        line1 = f.readline()
        line2 = f.readline()
        if not line2: break  # EOF
        ...

    import itertools
    with open('a') as f:
        for line1,line2 in itertools.izip_longest(*[f]*2):
            print(line1,line2)

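izip_longest returns an iterator, so it should work even if the file is very large.

If there is an odd number of lines, line2 gets the value None on the last iteration.

izip_longest is available in itertools if you have Python 2.6 or later. If you use an earlier version, you can pick up a Python implementation of izip_longest here. In Python 3, itertools.izip_longest was renamed to itertools.zip_longest.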
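
For readers on Python 3, the same idea might look like the sketch below. The helper name read_pairs and the in-memory sample are made up for illustration; io.StringIO stands in for a real open file.

```python
import io
from itertools import zip_longest


def read_pairs(lines, fillvalue=None):
    """Yield (line1, line2) tuples from any iterable of lines."""
    it = iter(lines)
    # The same iterator is passed twice, so each tuple holds two
    # consecutive lines; fillvalue pads an unmatched last line.
    return zip_longest(it, it, fillvalue=fillvalue)


# Five lines, so the last pair is incomplete.
sample = io.StringIO("a1\na2\nb1\nb2\nc1\n")
pairs = list(read_pairs(sample))

for line1, line2 in pairs:
    print(repr(line1), repr(line2))
```

With a real file, `read_pairs(f)` inside a `with open(...) as f:` block should behave the same way.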


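In the comments, someone asked whether this solution reads the whole file first and then iterates over the file a second time. I believe it does not. The line with open('a') as f opens a file handle but does not read the file. f is an iterator, so its contents are not read until they are requested. izip_longest takes iterators as arguments and returns an iterator.

izip_longest is indeed fed the same iterator, f, twice. What ends up happening is that f.next() (or next(f) in Python 3) is called for the first argument and then again for the second. Since next() is called on the same underlying iterator, successive lines are yielded. This is very different from reading in the whole file. In fact, the whole point of using iterators is to avoid reading in the whole file.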
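
One way to check this claim without a file is to wrap the source in a generator that records every line it hands out. This is only a sketch in Python 3; logged_lines and consumed are names invented for the demonstration.

```python
from itertools import zip_longest

consumed = []  # every line that has actually been pulled so far


def logged_lines(lines):
    """Yield lines one by one, recording each line as it is pulled."""
    for line in lines:
        consumed.append(line)
        yield line


source = logged_lines(["l1", "l2", "l3", "l4", "l5"])
pairs = zip_longest(source, source)  # building the iterator reads nothing
before = list(consumed)

first_pair = next(pairs)             # pulls exactly two lines
after_first = list(consumed)

print(before, first_pair, after_first)
```

Only the two lines needed for the first pair are pulled; the rest stay unread until the loop asks for them.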

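So I believe the solution works as desired: the file is read only once, by the for loop.

To corroborate this, I ran the izip_longest solution against a solution that uses f.readlines(). I put a raw_input() at the end of each script to pause it, and ran ps axuw on each: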

 % ps axuw | grep izip_longest_method.py 

unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/izip_longest_method.py bigfile

 % ps axuw | grep readlines_method.py 

unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

readlines clearly reads in the whole file at once. Since izip_longest_method uses much less memory, I think it is safe to conclude that it does not read the whole file at once.
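
A similar comparison can be made from inside Python 3 with the standard tracemalloc module instead of ps. This is a rough sketch, not the measurement reported above: the file name, line count, and helper names are assumptions, and tracemalloc only counts memory allocated by Python itself.

```python
import os
import tempfile
import tracemalloc
from itertools import zip_longest


def via_readlines(path):
    """Load every line into a list, then pair them up by slicing."""
    with open(path) as f:
        lines = f.readlines()
    return sum(1 for _ in zip(lines[0::2], lines[1::2]))


def via_iterator(path):
    """Pair lines lazily by feeding the same file iterator twice."""
    with open(path) as f:
        return sum(1 for _ in zip_longest(f, f))


def peak_bytes(func, path):
    """Return the peak number of bytes Python allocated while func ran."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bigfile.txt")  # hypothetical test file
    with open(path, "w") as out:
        for i in range(20000):
            out.write("line %06d %s\n" % (i, "x" * 60))
    peak_readlines = peak_bytes(via_readlines, path)
    peak_iterator = peak_bytes(via_iterator, path)

print("readlines peak:", peak_readlines, "bytes")
print("iterator peak: ", peak_iterator, "bytes")
```

The exact numbers depend on the interpreter and platform, but the readlines variant has to hold every line at once, so its peak should be far higher.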

Use f.next(), for example:

    f=open("file")
    for line in f:
        print line
        nextline=f.next()
        print "next line", nextline
        ....
    f.close()

I would go about it in a similar way to ghostdog74, only with the try on the outside and a few modifications:

    try:
        with open(filename) as f:
            for line1 in f:
                line2 = f.next()
                # process line1 and line2 here
    except StopIteration:
        print "(End)"  # do whatever you need to do with line1 alone

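This keeps the code simple and yet robust. Using with closes the file if something goes wrong, and otherwise simply closes the resource once you have exhausted it and left the loop.

Note that with needs Python 2.6, or 2.5 with the with_statement feature enabled (from __future__ import with_statement).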
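
In Python 3, f.next() is spelled next(f), and next() accepts a default value, which avoids the try/except entirely. Below is a minimal sketch under that assumption; pairs_and_leftover is a hypothetical name, and io.StringIO stands in for an open file.

```python
import io


def pairs_and_leftover(f):
    """Return (pairs, leftover); leftover is the unmatched last line or None."""
    pairs = []
    leftover = None
    for line1 in f:
        line2 = next(f, None)  # the default avoids StopIteration at EOF
        if line2 is None:
            leftover = line1   # odd number of lines: handle line1 alone
            break
        pairs.append((line1, line2))
    return pairs, leftover


pairs, leftover = pairs_and_leftover(io.StringIO("a\nb\nc\n"))
print(pairs, repr(leftover))
```

The leftover value plays the role of the "line1 alone" case in the except branch above.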

This works for files with either an even or an odd number of lines. It simply ignores the unmatched last line.

    f=open("file")
    lines = f.readlines()
    for even, odd in zip(lines[0::2], lines[1::2]):
        print "even : ", even
        print "odd : ", odd
        print "end cycle"
    f.close()

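If you have large files, this is not the right approach: readlines() loads the whole file into memory. I once wrote a class that reads a file and saves the fseek position of the start of each line. This lets you fetch specific lines without holding the whole file in memory, and you can also move forward and backward.

I am pasting it here. The license is public domain, meaning do whatever you want with it. Please note that this class was written 6 years ago and I haven't touched or checked it since. I don't think it even conforms to the file interface. Caveat emptor. Also note that this is overkill for your problem. I am not saying you should definitely go this way, but I had this code and I like to share it in case you need more complex access.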

    import re

    class FileReader:
        """
        Similar to file class, but allows to access smoothly the lines
        as when using readlines(), with no memory payload, going back and
        forth, finding regexps and so on.
        """
        def __init__(self, filename):
            self.__file = open(filename, "r")
            self.__currentPos = -1
            # get file length
            self.__file.seek(0, 0)
            counter = 0
            line = self.__file.readline()
            while line != '':
                counter = counter + 1
                line = self.__file.readline()
            self.__length = counter
            # collect an index of filedescriptor positions against
            # the line number, to enhance search
            self.__file.seek(0, 0)
            self.__lineToFseek = []

            while True:
                cur = self.__file.tell()
                line = self.__file.readline()
                # if it's not null the cur is valid for
                # identifying a line, so store
                self.__lineToFseek.append(cur)
                if line == '':
                    break

        def __len__(self):
            """
            member function for the operator len()
            returns the file length
            FIXME: better get it once when opening file
            """
            return self.__length

        def __getitem__(self, key):
            """
            gives the "key" line. The syntax is

            import FileReader
            f=FileReader.FileReader("a_file")
            line=f[2]

            to get the third line from the file (lines are counted from 0).
            The internal pointer is set to the key line
            """
            mylen = self.__len__()
            if key < 0:
                self.__currentPos = -1
                return ''
            elif key > mylen:
                self.__currentPos = mylen
                return ''

            self.__file.seek(self.__lineToFseek[key], 0)
            line = self.__file.readline()
            self.__currentPos = key
            return line

        def next(self):
            if self.isAtEOF():
                raise StopIteration
            return self.readline()

        __next__ = next  # same method under the Python 3 iterator protocol

        def __iter__(self):
            return self

        def readline(self):
            """
            read a line forward from the current cursor position.
            returns the line or an empty string when at EOF
            """
            return self.__getitem__(self.__currentPos+1)

        def readbackline(self):
            """
            read a line backward from the current cursor position.
            returns the line or an empty string when at Beginning of file.
            """
            return self.__getitem__(self.__currentPos-1)

        def currentLine(self):
            """
            gives the line at the current cursor position
            """
            return self.__getitem__(self.__currentPos)

        def currentPos(self):
            """
            return the current position (line) in the file
            or -1 if the cursor is at the beginning of the file
            or len(self) if it's at the end of file
            """
            return self.__currentPos

        def toBOF(self):
            """
            go to beginning of file
            """
            self.__getitem__(-1)

        def toEOF(self):
            """
            go to end of file
            """
            self.__getitem__(self.__len__())

        def toPos(self, key):
            """
            go to the specified line
            """
            self.__getitem__(key)

        def isAtEOF(self):
            return self.__currentPos == self.__len__()

        def isAtBOF(self):
            return self.__currentPos == -1

        def isAtPos(self, key):
            return self.__currentPos == key

        def findString(self, thestring, count=1, backward=0):
            """
            find the count occurrence of the string str in the file
            and return the line catched. The internal cursor is placed
            at the same line.
            backward is the searching flow.
            For example, to search for the first occurrence of "hello"
            starting from the beginning of the file do:

            import FileReader
            f=FileReader.FileReader("a_file")
            f.toBOF()
            f.findString("hello",1,0)

            To search the second occurrence string from the end of the
            file in backward movement do:

            f.toEOF()
            f.findString("hello",2,1)

            to search the first occurrence from a given (or current)
            position say line 150, going forward in the file

            f.toPos(150)
            f.findString("hello",1,0)

            return the string where the occurrence is found, or an empty
            string if nothing is found. The internal counter is placed at
            the corresponding line number, if the string was found. In
            other case, it's set at BOF if the search was backward, and at
            EOF if the search was forward.

            NB: the current line is never evaluated. This is a feature,
            since we can so traverse occurrences with a

            line=f.findString("hello")
            while line != '':
                line=f.findString("hello")

            instead of playing with a readline every time to skip the
            current line.
            """
            internalcounter = 1
            if count < 1:
                count = 1
            while 1:
                if backward == 0:
                    line = self.readline()
                else:
                    line = self.readbackline()

                if line == '':
                    return ''
                if line.find(thestring) != -1:
                    if count == internalcounter:
                        return line
                    else:
                        internalcounter = internalcounter + 1

        def findRegexp(self, theregexp, count=1, backward=0):
            """
            find the count occurrence of the regexp in the file
            and return the line catched. The internal cursor is placed
            at the same line.
            backward is the searching flow.
            You need to pass a regexp string as theregexp.
            returns a tuple. The first element is the matched line. The
            subsequent elements contain the matched groups, if any.
            If no match returns None
            """
            rx = re.compile(theregexp)
            internalcounter = 1
            if count < 1:
                count = 1
            while 1:
                if backward == 0:
                    line = self.readline()
                else:
                    line = self.readbackline()

                if line == '':
                    return None
                m = rx.search(line)
                if m != None:
                    if count == internalcounter:
                        return (line,)+m.groups()
                    else:
                        internalcounter = internalcounter + 1

        def skipLines(self, key):
            """
            skip a given number of lines. Key can be negative to skip
            backward. Return the last line read.
            Please note that skipLines(1) is equivalent to readline()
            skipLines(-1) is equivalent to readbackline() and skipLines(0)
            is equivalent to currentLine()
            """
            return self.__getitem__(self.__currentPos+key)

        def occurrences(self, thestring, backward=0):
            """
            count how many occurrences of str are found from the current
            position (current line excluded... see skipLines()) to the
            begin (or end) of file.
            returns a list of positions where each occurrence is found,
            in the same order found reading the file.
            Leaves unaltered the cursor position.
            """
            curpos = self.currentPos()
            positions = []
            line = self.findString(thestring, 1, backward)
            while line != '':
                positions.append(self.currentPos())
                line = self.findString(thestring, 1, backward)
            self.toPos(curpos)
            return positions

        def close(self):
            self.__file.close()

    file_name = 'your_file_name'
    file_open = open(file_name, 'r')

    def handler(line_one, line_two):
        print(line_one, line_two)

    while file_open:
        try:
            one = file_open.next()
            two = file_open.next()
            handler(one, two)
        except StopIteration:
            file_open.close()
            break

    def readnumlines(file, num=2):
        f = iter(file)
        while True:
            lines = [None] * num
            for i in range(num):
                try:
                    lines[i] = f.next()
                except StopIteration:  # EOF or not enough lines available
                    return
            yield lines

    # use like this
    f = open("thefile.txt", "r")
    for line1, line2 in readnumlines(f):
        pass  # do something with line1 and line2

    # or
    # for line1, line2, line3, ..., lineN in readnumlines(f, N):
    #     do something with N lines
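
The same grouping can probably be written more compactly in Python 3 with itertools.islice. This is a sketch, not a drop-in replacement: read_n_lines is a made-up name, and like the generator above it drops a final group that has fewer than n lines.

```python
import io
from itertools import islice


def read_n_lines(lines, n=2):
    """Yield lists of n consecutive lines from any iterable of lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, n))  # take up to n lines from the iterator
        if len(chunk) < n:           # EOF, or not enough lines for a group
            return
        yield chunk


groups_of_two = list(read_n_lines(io.StringIO("1\n2\n3\n4\n5\n"), 2))
groups_of_three = list(read_n_lines(io.StringIO("1\n2\n3\n4\n5\n"), 3))
print(groups_of_two)
print(groups_of_three)
```

To keep the incomplete last group instead, the length check could be changed to `if not chunk: return`.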

How about this one? Does anybody see a problem with it?

    f=open('file_name')
    for line,line2 in zip(f,f):
        print line,line2
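
One thing to watch with zip(f, f) is an odd number of lines: zip stops at the shortest input, so the unmatched last line is consumed and then silently dropped. (In Python 2, zip also builds the whole list in memory, whereas itertools.izip is lazy.) The Python 3 sketch below illustrates the dropped line, with io.StringIO standing in for the file.

```python
import io

odd = io.StringIO("a\nb\nc\n")  # three lines: one pair plus a leftover

zipped = list(zip(odd, odd))
remaining = odd.read()          # anything zip left unread

print(zipped, repr(remaining))
```

The line "c" is read by zip for the first slot, but no pair is produced for it, and nothing remains in the file afterwards.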

    f = open(filename, "r")
    for line in f:
        line1 = line
        line2 = f.next()
    f.close()

Now you can read the file two lines at a time. If you like, you can also check the state of f before calling f.next().

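My idea is to create a generator that reads two lines from the file at a time and returns them as a 2-tuple. This means you can then iterate over the results.

    from cStringIO import StringIO

    def read_2_lines(src):
        while True:
            line1 = src.readline()
            if not line1: break
            line2 = src.readline()
            if not line2: break
            yield (line1, line2)

    data = StringIO("line1\nline2\nline3\nline4\n")
    for read in read_2_lines(data):
        print read

If you have an odd number of lines it won't work perfectly (the last line is dropped), but this should give you a good outline.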

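I ran into a similar problem last month. I tried a while loop with f.readline() as well as f.readlines(). My data file is not huge, so I finally chose f.readlines(), which gives me more control over the index; otherwise I would have to use f.seek() to move the file pointer back and forth.

My case is more complicated than the OP's. Because my data file is more flexible about how many lines have to be parsed each time, I have to check a few conditions before I can parse the data.

Another problem I found with f.seek() is that it doesn't handle utf-8 very well when I use codecs.open('', 'r', 'utf-8'). (I am not exactly sure what the culprit was; in the end I gave up on this approach.)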

A simple little reader. It pulls lines in pairs and returns them as a tuple as you iterate over the object. You can close it manually, or it will close itself when it goes out of scope.

    class doublereader:
        def __init__(self,filename):
            self.f = open(filename, 'r')
        def __iter__(self):
            return self
        def next(self):
            return self.f.next(), self.f.next()
        def close(self):
            if not self.f.closed:
                self.f.close()
        def __del__(self):
            self.close()

    #example usage one
    r = doublereader(r"C:\file.txt")
    for a, h in r:
        print "x:%s\ny:%s" % (a,h)
    r.close()

    #example usage two
    for x,y in doublereader(r"C:\file.txt"):
        print "x:%s\ny:%s" % (x,y)
    #closes itself as soon as the loop goes out of scope
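
A Python 3 take on the same reader could use `__next__` and the context-manager protocol rather than relying on `__del__`, whose timing is not guaranteed. This is only a sketch: PairReader and the temporary file are invented for the example.

```python
import os
import tempfile


class PairReader:
    """Iterate over a text file two lines at a time (hypothetical helper)."""

    def __init__(self, filename):
        self.f = open(filename, "r")

    def __iter__(self):
        return self

    def __next__(self):
        # If either call hits EOF, StopIteration ends the loop, so an
        # unmatched final line is dropped.
        return next(self.f), next(self.f)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()
        return False  # do not swallow exceptions


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pairs.txt")  # throwaway sample file
    with open(path, "w") as out:
        out.write("x1\ny1\nx2\ny2\n")
    with PairReader(path) as reader:
        pairs = list(reader)
    closed_after = reader.f.closed

print(pairs, closed_after)
```

Using `with PairReader(path) as reader:` closes the file as soon as the block ends, even if processing raises.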

If the file is of a reasonable size, another approach is to use a list comprehension to read the entire file into a list of 2-tuples:

    filename = '/path/to/file/name'

    with open(filename, 'r') as f:
        list_of_2tuples = [ (line,f.readline()) for line in f ]

    for (line1,line2) in list_of_2tuples:  # Work with them in pairs.
        print('%s :: %s' % (line1,line2))

This Python code will print the first two lines:

    import linecache

    filename = "ooxx.txt"
    print(linecache.getline(filename, 1))
    print(linecache.getline(filename, 2))
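
A self-contained way to try linecache is sketched below, using a throwaway temporary file (the file name and contents are made up). Keep in mind that linecache numbers lines from 1, returns an empty string for a line that does not exist, and caches the whole file in memory, so it suits small files better than large ones.

```python
import linecache
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical sample file
    with open(path, "w") as out:
        out.write("first\nsecond\nthird\n")

    # Line numbers start at 1, not 0.
    first_two = [linecache.getline(path, n) for n in (1, 2)]
    missing = linecache.getline(path, 99)   # out of range: empty string
    linecache.clearcache()                  # drop the cached file contents

print(first_two, repr(missing))
```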