How do I read one character at a time from a file in Python?

Can anyone tell me how to do this?

```python
with open(filename) as f:
    while True:
        c = f.read(1)  # returns '' once the end of the file is reached
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
```

Open the file, then iterate over its lines and over the characters of each line:

```python
with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print(ch)
```

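I like the accepted answer: it is simple and direct, and it gets the job done. I would also like to offer an alternative implementation: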

```python
import sys


def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: bytes
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffer_size=4096):
    """Yields the contents of file `filename` character-by-character.

    Warning: will only work for encodings where one character is encoded
    as one byte.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size for the underlying chunks, in bytes
        (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: bytes (of length 1)
    """
    for chunk in chunks(filename, buffer_size):
        # Slice instead of index: indexing bytes in Python 3 yields ints.
        for i in range(len(chunk)):
            yield chunk[i:i + 1]


def main(buffer_size, filenames):
    """Reads several files character by character and redirects their
    contents to `/dev/null`.
    """
    with open("/dev/null", "wb") as fp:
        for filename in filenames:
            for char in chars(filename, buffer_size):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files varying the buffer size
    buffer_size = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffer_size, filenames))
```

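The code I suggest does essentially the same thing as the accepted answer: it reads a given number of bytes from the file. The difference is that it first reads a large block of data (4096 is a good default on x86, but you might want to try 1024 or 8192; any multiple of the page size works) and then yields the characters in that block one at a time.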
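In Python 3 the same buffering idea can also be applied in text mode, where `read(n)` returns up to `n` *characters* rather than bytes, so the single-byte-encoding restriction in the docstring above goes away. A minimal sketch, assuming UTF-8 input; the name `chars_text` and its `encoding` parameter are hypothetical, not part of the answer above:

```python
def chars_text(filename, buffer_size=4096, encoding="utf-8"):
    """Yield the characters of `filename` one at a time.

    Text mode decodes the file, so each read() returns up to `buffer_size`
    characters (not bytes) and multi-byte encodings such as UTF-8 work.
    """
    with open(filename, "r", encoding=encoding) as fp:
        while True:
            chunk = fp.read(buffer_size)  # '' signals end of file
            if not chunk:
                break
            yield from chunk  # hand out the buffered characters one by one
```

Usage is the same as before, e.g. `for ch in chars_text("2600.txt.utf-8"): ...`. Decoding does add some cost compared with raw bytes, so the timings below will not transfer directly.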

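The code I provide may be faster for larger files. Take the full text of Tolstoy's *War and Peace* as an example. These are my timing results (MacBook Pro running OS X 10.7.4; `so.py` is the name I gave the code pasted above):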

```
$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total
```

Now, don't take a buffer size of 4096 as a universal truth; look at the results I got for different sizes:

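| Buffer size (bytes) | Wall time (s) |
|---:|---:|
| 2 | 2.726 |
| 4 | 1.948 |
| 8 | 1.693 |
| 16 | 1.534 |
| 32 | 1.525 |
| 64 | 1.398 |
| 128 | 1.432 |
| 256 | 1.377 |
| 512 | 1.347 |
| 1024 | 1.442 |
| 2048 | 1.316 |
| 4096 | 1.318 |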

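As you can see, the gains appear even with small buffers and level off at roughly 1.3 to 1.4 seconds from about 64 bytes onward (my timings may be quite inaccurate). The buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice, but as always, measure first.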
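To measure on your own machine, the standard-library `timeit` module is enough. A minimal sketch, assuming a UTF-8 text file passed on the command line; `read_all_chars`, `benchmark` and the default list of sizes are hypothetical choices. Absolute numbers will vary with hardware and Python version, so compare sizes relative to each other.

```python
import sys
import timeit


def read_all_chars(filename, buffer_size):
    """Consume every character of `filename` using reads of `buffer_size`
    characters; returns the number of characters seen."""
    count = 0
    with open(filename, "r", encoding="utf-8") as fp:
        while True:
            chunk = fp.read(buffer_size)
            if not chunk:
                break
            for _ in chunk:  # touch each character, like a char-by-char consumer
                count += 1
    return count


def benchmark(filename, sizes=(1, 64, 1024, 4096, 8192), repeat=3):
    """Return {buffer_size: best wall time in seconds over `repeat` runs}."""
    results = {}
    for size in sizes:
        timer = timeit.Timer(lambda: read_all_chars(filename, size))
        results[size] = min(timer.repeat(repeat=repeat, number=1))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for size, seconds in benchmark(sys.argv[1]).items():
        print(f"{size:>6} {seconds:.3f}")
```

Taking the minimum of several runs reduces noise from other processes, which is the approach the `timeit` documentation recommends.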

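Python itself can help you in interactive mode (this transcript is from Python 2, which still has the built-in `file` type):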

```
>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
```
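In Python 3 the unit of `size` depends on how the file was opened: in text mode `read(1)` returns one decoded character (`str`), while in binary mode (`"rb"`) it returns one byte (`bytes`). A small demonstration, assuming a UTF-8 encoded temporary file created just for this purpose:

```python
import os
import tempfile

# "é" is two bytes in UTF-8 (0xC3 0xA9), followed here by "x".
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("éx".encode("utf-8"))
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as f:
        text_first = f.read(1)    # text mode: one character
    with open(path, "rb") as f:
        binary_first = f.read(1)  # binary mode: one byte
finally:
    os.remove(path)

print(repr(text_first), repr(binary_first))  # 'é' b'\xc3'
```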

Simply:

```python
myfile = open(filename)
one_character = myfile.read(1)
```

Today I learned a new idiom while watching Raymond Hettinger's talk "Transforming Code into Beautiful, Idiomatic Python":

```python
import functools

with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    # iter(callable, sentinel) calls f_read_ch until it returns ''
    for ch in iter(f_read_ch, ''):
        print('Read a character:', repr(ch))
```
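One caveat with this idiom in Python 3: a file opened in binary mode returns `b''` at end of file, and `b'' != ''`, so the sentinel has to be `b''` or the loop never ends. A minimal sketch of the binary variant; `iter_bytes` is a hypothetical helper name, and `io.BytesIO` stands in for a file opened with `open(path, "rb")`:

```python
import functools
import io


def iter_bytes(fileobj):
    """Yield one byte (as a length-1 bytes object) at a time until EOF.

    Binary read() returns b'' at end of file, so b'' is the sentinel.
    """
    read_one = functools.partial(fileobj.read, 1)
    return iter(read_one, b"")


print(list(iter_bytes(io.BytesIO(b"abc"))))  # [b'a', b'b', b'c']
```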

To read just one character:

```python
f.read(1)
```

You should try `f.read(1)`; it is definitely the right way to do it.

```python
f = open('hi.txt', 'w')
f.write('0123456789abcdef')
f.close()

f = open('hi.txt', 'r')  # reopen the same file for reading
f.seek(12)
print(f.read(1))  # This will read just "c"
f.close()
```
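In Python 3 text mode, `seek()` is only guaranteed to work with 0 or a value previously returned by `tell()`; the example above happens to work because the content is plain ASCII. For jumping to an arbitrary byte offset, binary mode is the safer choice. A minimal sketch, assuming single-byte (ASCII) content; `char_at` is a hypothetical helper, so for `hi.txt` above `char_at('hi.txt', 12)` would return `'c'`:

```python
def char_at(path, offset):
    """Return the character at byte `offset` of `path`.

    Opens the file in binary mode, where seek() takes real byte offsets,
    and decodes the single byte as ASCII (assumes one byte per character).
    Returns '' if `offset` is at or past the end of the file.
    """
    with open(path, "rb") as f:
        f.seek(offset)  # byte offset from the start of the file
        return f.read(1).decode("ascii")
```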

This will also work:

```python
with open("filename") as fileObj:
    for line in fileObj:
        for ch in line:
            print(ch)
```

It iterates over every line in the file and over every character in each line.

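To add to this: if the file you are reading contains one very long line, reading it line by line could exhaust your memory. In that case, consider reading it into a fixed-size buffer and then yielding each character: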

```python
def read_char(inputfile, buffersize=10240):
    with open(inputfile, 'r') as f:
        while True:
            buf = f.read(buffersize)
            if not buf:
                break
            for char in buf:
                yield char
        # Always yield a final '' as an end marker, so even an empty
        # file produces one item.
        yield ''


if __name__ == "__main__":
    for char in read_char('./very_large_file.txt'):
        process(char)  # process() is a placeholder for your own handler
```