Objective-C:逐行读取文件

在Objective-C中处理大型文本文件的适当方式是什么? 假设我需要分别读取每一行,并将每行视为一个NSString。 什么是最有效的方法呢?

一种解决scheme是使用NSString方法:

+ (id)stringWithContentsOfFile:(NSString *)path encoding:(NSStringEncoding)enc error:(NSError **)error 

然后用换行符分隔符分行,然后迭代数组中的元素。 但是,这似乎相当低效。 有没有简单的方法来对待文件作为一个stream,枚举每行,而不是一次只读一次? 有点像Java的java.io.BufferedReader。

这是一个很好的问题。 我认为@Diederik有一个很好的答案,虽然不幸的是Cocoa没有一个机制来完成你想要做的事情。

NSInputStream允许你读取N个字节的块(非常类似于java.io.BufferedReader ),但是你必须自己将它转换为一个NSString ,然后扫描换行符(或其他任何分隔符),并保存所有剩余的字符接下来阅读,或者阅读更多的字符,如果还没有阅读换行符。 ( NSFileHandle让你阅读一个NSData ,然后你可以转换成一个NSString ,但它本质上是一个过程。)

苹果有一个stream程编程指南 ,可以帮助填写的细节, 这个问题也可以帮助,以及如果你要处理uint8_t*缓冲区。

如果你要经常阅读这样的string(特别是在你的程序的不同部分),把这个行为封装在一个可以处理你的细节的类中,或者是NSInputStream子类化是个好主意。 被分类 ),并添加方法,让你正确地阅读你想要的。

为了logging,我认为这将是一个很好的function添加,我将提交一个改进请求,使这成为可能。 🙂


编辑:原来这个请求已经存在。 有一个从2006年开始的雷达(对于苹果公司内部人员来说,是rdar:// 4742914)。

这将用于从Text一般阅读String 。 如果你想阅读更长的文本(大尺寸的文本) ,然后使用这里提到的其他人的方法,如缓冲(保留在内存空间的文本的大小)

说你读了一个文本文件。

 NSString* filePath = @""//file path... NSString* fileRoot = [[NSBundle mainBundle] pathForResource:filePath ofType:@"txt"]; 

你想摆脱新的路线。

 // read everything from text NSString* fileContents = [NSString stringWithContentsOfFile:fileRoot encoding:NSUTF8StringEncoding error:nil]; // first, separate by new line NSArray* allLinedStrings = [fileContents componentsSeparatedByCharactersInSet: [NSCharacterSet newlineCharacterSet]]; // then break down even further NSString* strsInOneLine = [allLinedStrings objectAtIndex:0]; // choose whatever input identity you have decided. in this case ; NSArray* singleStrs = [currentPointString componentsSeparatedByCharactersInSet: [NSCharacterSet characterSetWithCharactersInString:@";"]]; 

你有它。

这应该做的伎俩:

 #include <stdio.h> NSString *readLineAsNSString(FILE *file) { char buffer[4096]; // tune this capacity to your liking -- larger buffer sizes will be faster, but // use more memory NSMutableString *result = [NSMutableString stringWithCapacity:256]; // Read up to 4095 non-newline characters, then read and discard the newline int charsRead; do { if(fscanf(file, "%4095[^\n]%n%*c", buffer, &charsRead) == 1) [result appendFormat:@"%s", buffer]; else break; } while(charsRead == 4095); return result; } 

使用方法如下:

 FILE *file = fopen("myfile", "r"); // check for NULL while(!feof(file)) { NSString *line = readLineAsNSString(file); // do stuff with line; line is autoreleased, so you should NOT release it (unless you also retain it beforehand) } fclose(file); 

该代码从文件中读取非换行符,一次最多可以读取4095个字符。 如果你有一个长度超过4095个字符的行,它会一直读取,直到遇到换行符或文件结束为止。

注意 :我没有testing这个代码。 使用前请先testing一下。

Mac OS X是Unix,Objective-C是C超集,所以你可以使用<stdio.h> old-school fopenfgets 。 这是保证工作。

[NSString stringWithUTF8String:buf]会将Cstring转换为NSString 。 还有其他编码方式创buildstring和创build而不复制的方法。

你可以使用NSInputStream ,它有一个文件stream的基本实现。 您可以将字节读入缓冲区( read:maxLength:方法)。 你必须自己扫描缓冲区换行。

在Apple的String编程指南中logging了在Cocoa / Objective-C中阅读文本文件的适当方法。 读取和写入文件的部分应该就是你所要的。 PS:什么是“线”? 由“\ n”分隔的string的两个部分? 或“\ r”? 或“\ r \ n”? 或者,也许你是在段落之后? 前面提到的指南还包括将string拆分成行或段的章节。 (本节被称为“段落和换行符”,并链接到上面指向的页面的左侧菜单中。不幸的是,本网站不允许发布多个URL,因为我是还不是一个值得信赖的用户。)

为了解释克努特:过早的优化是一切罪恶的根源。 不要简单地假设“将整个文件读入内存”的速度很慢。 你有基准吗? 你知道它实际上将整个文件读入内存吗? 也许它只是返回一个代理对象,并继续阅读幕后你消耗的string? ( 免责声明:我不知道NSString实际上是否可以这样做,可以想象 )。重点是:首先按照logging的方式进行操作。 然后,如果基准testing表明这不具备您所期望的性能,那么就进行优化。

很多这些答案都是长块的代码,或者他们读取整个文件。 我喜欢使用c方法完成这个任务。

 FILE* file = fopen("path to my file", "r"); size_t length; char *cLine = fgetln(file,&length); while (length>0) { char str[length+1]; strncpy(str, cLine, length); str[length] = '\0'; NSString *line = [NSString stringWithFormat:@"%s",str]; % Do what you want here. cLine = fgetln(file,&length); } 

请注意,fgetln不会保留你的换行符。 此外,我们+1的长度,因为我们想为NULL终止空间。

要逐行读取文件(也适用于极端大的文件)可以通过以下function来完成:

 DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile]; NSString * line = nil; while ((line = [reader readLine])) { NSLog(@"read line: %@", line); } [reader release]; 

要么:

 DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile]; [reader enumerateLinesUsingBlock:^(NSString * line, BOOL * stop) { NSLog(@"read line: %@", line); }]; [reader release]; 

启用此function的DDFileReader类如下所示:

接口文件(.h):

 @interface DDFileReader : NSObject { NSString * filePath; NSFileHandle * fileHandle; unsigned long long currentOffset; unsigned long long totalFileLength; NSString * lineDelimiter; NSUInteger chunkSize; } @property (nonatomic, copy) NSString * lineDelimiter; @property (nonatomic) NSUInteger chunkSize; - (id) initWithFilePath:(NSString *)aPath; - (NSString *) readLine; - (NSString *) readTrimmedLine; #if NS_BLOCKS_AVAILABLE - (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block; #endif @end 

执行(.m)

 #import "DDFileReader.h" @interface NSData (DDAdditions) - (NSRange) rangeOfData_dd:(NSData *)dataToFind; @end @implementation NSData (DDAdditions) - (NSRange) rangeOfData_dd:(NSData *)dataToFind { const void * bytes = [self bytes]; NSUInteger length = [self length]; const void * searchBytes = [dataToFind bytes]; NSUInteger searchLength = [dataToFind length]; NSUInteger searchIndex = 0; NSRange foundRange = {NSNotFound, searchLength}; for (NSUInteger index = 0; index < length; index++) { if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) { //the current character matches if (foundRange.location == NSNotFound) { foundRange.location = index; } searchIndex++; if (searchIndex >= searchLength) { return foundRange; } } else { searchIndex = 0; foundRange.location = NSNotFound; } } return foundRange; } @end @implementation DDFileReader @synthesize lineDelimiter, chunkSize; - (id) initWithFilePath:(NSString *)aPath { if (self = [super init]) { fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath]; if (fileHandle == nil) { [self release]; return nil; } lineDelimiter = [[NSString alloc] initWithString:@"\n"]; [fileHandle retain]; filePath = [aPath retain]; currentOffset = 0ULL; chunkSize = 10; [fileHandle seekToEndOfFile]; totalFileLength = [fileHandle offsetInFile]; //we don't need to seek back, since readLine will do that. } return self; } - (void) dealloc { [fileHandle closeFile]; [fileHandle release], fileHandle = nil; [filePath release], filePath = nil; [lineDelimiter release], lineDelimiter = nil; currentOffset = 0ULL; [super dealloc]; } - (NSString *) readLine { if (currentOffset >= totalFileLength) { return nil; } NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding]; [fileHandle seekToFileOffset:currentOffset]; NSMutableData * currentData = [[NSMutableData alloc] init]; BOOL shouldReadMore = YES; NSAutoreleasePool * readPool = [[NSAutoreleasePool alloc] init]; while (shouldReadMore) { if (currentOffset >= totalFileLength) { break; } NSData * chunk = [fileHandle readDataOfLength:chunkSize]; NSRange newLineRange = [chunk rangeOfData_dd:newLineData]; if (newLineRange.location != NSNotFound) { //include the length so we can include the delimiter in the string chunk = [chunk subdataWithRange:NSMakeRange(0, newLineRange.location+[newLineData length])]; shouldReadMore = NO; } [currentData appendData:chunk]; currentOffset += [chunk length]; } [readPool release]; NSString * line = [[NSString alloc] initWithData:currentData encoding:NSUTF8StringEncoding]; [currentData release]; return [line autorelease]; } - (NSString *) readTrimmedLine { return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; } #if NS_BLOCKS_AVAILABLE - (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block { NSString * line = nil; BOOL stop = NO; while (stop == NO && (line = [self readLine])) { block(line, &stop); } } #endif @end 

这个课是由Dave DeLong完成的

就像@porneL说的那样,C api非常方便。

 NSString* fileRoot = [[NSBundle mainBundle] pathForResource:@"record" ofType:@"txt"]; FILE *file = fopen([fileRoot UTF8String], "r"); char buffer[256]; while (fgets(buffer, 256, file) != NULL){ NSString* result = [NSString stringWithUTF8String:buffer]; NSLog(@"%@",result); } 

正如其他人已经回答NSInputStream和NSFileHandle都是很好的select,但它也可以用NSData和内存映射相当紧凑的方式完成:

BRLineReader.h

 #import <Foundation/Foundation.h> @interface BRLineReader : NSObject @property (readonly, nonatomic) NSData *data; @property (readonly, nonatomic) NSUInteger linesRead; @property (strong, nonatomic) NSCharacterSet *lineTrimCharacters; @property (readonly, nonatomic) NSStringEncoding stringEncoding; - (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding; - (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding; - (NSString *)readLine; - (NSString *)readTrimmedLine; - (void)setLineSearchPosition:(NSUInteger)position; @end 

BRLineReader.m

 #import "BRLineReader.h" static unsigned char const BRLineReaderDelimiter = '\n'; @implementation BRLineReader { NSRange _lastRange; } - (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding { self = [super init]; if (self) { NSError *error = nil; _data = [NSData dataWithContentsOfFile:filePath options:NSDataReadingMappedAlways error:&error]; if (!_data) { NSLog(@"%@", [error localizedDescription]); } _stringEncoding = encoding; _lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet]; } return self; } - (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding { self = [super init]; if (self) { _data = data; _stringEncoding = encoding; _lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet]; } return self; } - (NSString *)readLine { NSUInteger dataLength = [_data length]; NSUInteger beginPos = _lastRange.location + _lastRange.length; NSUInteger endPos = 0; if (beginPos == dataLength) { // End of file return nil; } unsigned char *buffer = (unsigned char *)[_data bytes]; for (NSUInteger i = beginPos; i < dataLength; i++) { endPos = i; if (buffer[i] == BRLineReaderDelimiter) break; } // End of line found _lastRange = NSMakeRange(beginPos, endPos - beginPos + 1); NSData *lineData = [_data subdataWithRange:_lastRange]; NSString *line = [[NSString alloc] initWithData:lineData encoding:_stringEncoding]; _linesRead++; return line; } - (NSString *)readTrimmedLine { return [[self readLine] stringByTrimmingCharactersInSet:_lineTrimCharacters]; } - (void)setLineSearchPosition:(NSUInteger)position { _lastRange = NSMakeRange(position, 0); _linesRead = 0; } @end 

这个答案不是ObjC,而是C.

由于ObjC是基于“C”的,为什么不使用fgets?

是的,我敢肯定,ObjC有它自己的方法 – 我只是不够精通,但知道它是什么:)

从@Adam Rosenfield的答案, fscanf的格式化string将被改变如下:

 "%4095[^\r\n]%n%*[\n\r]" 

它将工作在OSX,Linux,Windows行结束。

使用类别或扩展名,使我们的生活更容易一些。

 extension String { func lines() -> [String] { var lines = [String]() self.enumerateLines { (line, stop) -> () in lines.append(line) } return lines } } // then for line in string.lines() { // do the right thing } 

我发现@lukaswelte和Dave DeLong的代码非常有帮助。 我正在寻找这个问题的解决scheme,但不仅仅是\n而是需要parsing大文件。

如果由多个字符parsing,则写入的代码包含错误。 我已经改变了代码如下。

.h文件:

 #import <Foundation/Foundation.h> @interface FileChunkReader : NSObject { NSString * filePath; NSFileHandle * fileHandle; unsigned long long currentOffset; unsigned long long totalFileLength; NSString * lineDelimiter; NSUInteger chunkSize; } @property (nonatomic, copy) NSString * lineDelimiter; @property (nonatomic) NSUInteger chunkSize; - (id) initWithFilePath:(NSString *)aPath; - (NSString *) readLine; - (NSString *) readTrimmedLine; #if NS_BLOCKS_AVAILABLE - (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block; #endif @end 

.m文件:

 #import "FileChunkReader.h" @interface NSData (DDAdditions) - (NSRange) rangeOfData_dd:(NSData *)dataToFind; @end @implementation NSData (DDAdditions) - (NSRange) rangeOfData_dd:(NSData *)dataToFind { const void * bytes = [self bytes]; NSUInteger length = [self length]; const void * searchBytes = [dataToFind bytes]; NSUInteger searchLength = [dataToFind length]; NSUInteger searchIndex = 0; NSRange foundRange = {NSNotFound, searchLength}; for (NSUInteger index = 0; index < length; index++) { if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) { //the current character matches if (foundRange.location == NSNotFound) { foundRange.location = index; } searchIndex++; if (searchIndex >= searchLength) { return foundRange; } } else { searchIndex = 0; foundRange.location = NSNotFound; } } if (foundRange.location != NSNotFound && length < foundRange.location + foundRange.length ) { // if the dataToFind is partially found at the end of [self bytes], // then the loop above would end, and indicate the dataToFind is found // when it only partially was. foundRange.location = NSNotFound; } return foundRange; } @end @implementation FileChunkReader @synthesize lineDelimiter, chunkSize; - (id) initWithFilePath:(NSString *)aPath { if (self = [super init]) { fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath]; if (fileHandle == nil) { return nil; } lineDelimiter = @"\n"; currentOffset = 0ULL; // ??? chunkSize = 128; [fileHandle seekToEndOfFile]; totalFileLength = [fileHandle offsetInFile]; //we don't need to seek back, since readLine will do that. } return self; } - (void) dealloc { [fileHandle closeFile]; currentOffset = 0ULL; } - (NSString *) readLine { if (currentOffset >= totalFileLength) { return nil; } @autoreleasepool { NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding]; [fileHandle seekToFileOffset:currentOffset]; unsigned long long originalOffset = currentOffset; NSMutableData *currentData = [[NSMutableData alloc] init]; NSData *currentLine = [[NSData alloc] init]; BOOL shouldReadMore = YES; while (shouldReadMore) { if (currentOffset >= totalFileLength) { break; } NSData * chunk = [fileHandle readDataOfLength:chunkSize]; [currentData appendData:chunk]; NSRange newLineRange = [currentData rangeOfData_dd:newLineData]; if (newLineRange.location != NSNotFound) { currentOffset = originalOffset + newLineRange.location + newLineData.length; currentLine = [currentData subdataWithRange:NSMakeRange(0, newLineRange.location)]; shouldReadMore = NO; }else{ currentOffset += [chunk length]; } } if (currentLine.length == 0 && currentData.length > 0) { currentLine = currentData; } return [[NSString alloc] initWithData:currentLine encoding:NSUTF8StringEncoding]; } } - (NSString *) readTrimmedLine { return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; } #if NS_BLOCKS_AVAILABLE - (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block { NSString * line = nil; BOOL stop = NO; while (stop == NO && (line = [self readLine])) { block(line, &stop); } } #endif @end 

这是一个很好的简单的解决scheme,我使用较小的文件:

 NSString *path = [[NSBundle mainBundle] pathForResource:@"Terrain1" ofType:@"txt"]; NSString *contents = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:nil]; NSArray *lines = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"\r\n"]]; for (NSString* line in lines) { if (line.length) { NSLog(@"line: %@", line); } } 

使用这个脚本,它工作得很好:

 NSString *path = @"/Users/xxx/Desktop/names.txt"; NSError *error; NSString *stringFromFileAtPath = [NSString stringWithContentsOfFile: path encoding: NSUTF8StringEncoding error: &error]; if (stringFromFileAtPath == nil) { NSLog(@"Error reading file at %@\n%@", path, [error localizedFailureReason]); } NSLog(@"Contents:%@", stringFromFileAtPath);