Number of lines in a file in Java

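I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.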

I was wondering whether there is a smarter way to do this.

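This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.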

    public static int countLines(String filename) throws IOException {
        InputStream is = new BufferedInputStream(new FileInputStream(filename));
        try {
            byte[] c = new byte[1024];
            int count = 0;
            int readChars = 0;
            boolean empty = true;
            while ((readChars = is.read(c)) != -1) {
                empty = false;
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n') {
                        ++count;
                    }
                }
            }
            return (count == 0 && !empty) ? 1 : count;
        } finally {
            is.close();
        }
    }

I have implemented another solution to the problem, and I found it more efficient at counting lines:

    try (
        FileReader input = new FileReader("input.txt");
        LineNumberReader count = new LineNumberReader(input);
    ) {
        while (count.skip(Long.MAX_VALUE) > 0) {
            // Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
        }
        result = count.getLineNumber() + 1; // +1 because line index starts at 0
    }

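The accepted answer has an off-by-one error for multi-line files that do not end with a newline. A one-line file without a trailing newline returns 1, but a two-line file without a trailing newline also returns 1. Here is an implementation of the accepted solution that fixes this. The endsWithoutNewLine check is wasted work on every read except the final one, but its cost should be negligible compared to the function as a whole.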

    public int count(String filename) throws IOException {
        InputStream is = new BufferedInputStream(new FileInputStream(filename));
        try {
            byte[] c = new byte[1024];
            int count = 0;
            int readChars = 0;
            boolean endsWithoutNewLine = false;
            while ((readChars = is.read(c)) != -1) {
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n')
                        ++count;
                }
                endsWithoutNewLine = (c[readChars - 1] != '\n');
            }
            if (endsWithoutNewLine) {
                ++count;
            }
            return count;
        } finally {
            is.close();
        }
    }

With Java 8, you can use streams:

    try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
        long numOfLines = lines.count();
        ...
    }
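For readers who want something they can paste and run, a self-contained version of the stream approach might look like the sketch below. The class name StreamLineCount and the choice of UTF-8 are assumptions made for illustration. Because Files.lines decodes the file, bytes that are invalid in the chosen charset can make the stream fail with an UncheckedIOException.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class, named here only for illustration.
final class StreamLineCount {
    /** Counts lines by streaming them; an unterminated last line is counted too. */
    static long countLines(Path path) throws IOException {
        // try-with-resources closes the underlying file handle
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }
}
```

Called as StreamLineCount.countLines(Paths.get("input.txt")), this returns a long, so the count is not limited to the int range.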

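The answer above with the count() method gave me wrong line counts when a file did not have a newline at its end: it failed to count the last line of the file.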

This method works better for me:

    public int countLines(String filename) throws IOException {
        LineNumberReader reader = new LineNumberReader(new FileReader(filename));
        int cnt = 0;
        String lineRead = "";
        while ((lineRead = reader.readLine()) != null) {}
        cnt = reader.getLineNumber();
        reader.close();
        return cnt;
    }

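I know this is an old question, but the accepted solution did not quite match what I needed. So I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):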

    public static long getLinesCount(String fileName, String encodingName) throws IOException {
        long linesCount = 0;
        File file = new File(fileName);
        FileInputStream fileIn = new FileInputStream(file);
        try {
            Charset encoding = Charset.forName(encodingName);
            Reader fileReader = new InputStreamReader(fileIn, encoding);
            int bufferSize = 4096;
            Reader reader = new BufferedReader(fileReader, bufferSize);
            char[] buffer = new char[bufferSize];
            int prevChar = -1;
            int readCount = reader.read(buffer);
            while (readCount != -1) {
                for (int i = 0; i < readCount; i++) {
                    int nextChar = buffer[i];
                    switch (nextChar) {
                        case '\r': {
                            // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                            linesCount++;
                            break;
                        }
                        case '\n': {
                            if (prevChar == '\r') {
                                // The current line is terminated by a carriage return immediately followed by a line feed.
                                // The line has already been counted.
                            } else {
                                // The current line is terminated by a line feed.
                                linesCount++;
                            }
                            break;
                        }
                    }
                    prevChar = nextChar;
                }
                readCount = reader.read(buffer);
            }
            if (prevChar != -1) {
                switch (prevChar) {
                    case '\r':
                    case '\n': {
                        // The last line is terminated by a line terminator.
                        // The last line has already been counted.
                        break;
                    }
                    default: {
                        // The last line is terminated by end-of-file.
                        linesCount++;
                    }
                }
            }
        } finally {
            fileIn.close();
        }
        return linesCount;
    }

This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
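Since single timing runs on the JVM are noisy, comparisons like this are usually repeated after a warm-up. The helper below is a rough sketch of that idea, not the way the numbers above were measured. RoughTimer and bestOfNanos are hypothetical names, and a dedicated harness such as JMH would give more trustworthy results.

```java
import java.util.function.LongSupplier;

// Hypothetical timing helper; a sketch, not a rigorous benchmark.
final class RoughTimer {
    /**
     * Runs the task 'warmups' times without timing it, then 'runs' times
     * with timing, and returns the shortest measured run in nanoseconds.
     */
    static long bestOfNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.getAsLong(); // give the JIT a chance to compile the hot path
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best; // Long.MAX_VALUE if runs == 0
    }
}
```

A line-counting method that throws IOException has to be wrapped before it fits LongSupplier, for example by rethrowing the exception as an UncheckedIOException inside the lambda.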

I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files whose last line does not end with a newline.

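And @er.vikas' solution, which is based on LineNumberReader but adds one to the line count, returns non-intuitive results on files whose last line does end with a newline.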
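The two conventions can be compared on in-memory strings standing in for file contents. This is only an illustrative sketch: the class and method names are hypothetical, and it counts '\n' characters directly rather than going through wc or LineNumberReader.

```java
// Hypothetical demo class contrasting the two counting conventions.
final class LineCountConventions {
    /** wc -l style: the number of '\n' characters. */
    static long newlines(String text) {
        long count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                count++;
            }
        }
        return count;
    }

    /** The "+1" style: the number of '\n' characters plus one. */
    static long newlinesPlusOne(String text) {
        return newlines(text) + 1;
    }

    public static void main(String[] args) {
        // No trailing newline: two visible lines, but only one '\n'.
        System.out.println(newlines("one\ntwo"));          // prints 1
        // Trailing newline: two visible lines, but '\n' count + 1 is 3.
        System.out.println(newlinesPlusOne("one\ntwo\n")); // prints 3
    }
}
```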

I therefore wrote an algorithm that behaves as follows:

    @Test
    public void empty() throws IOException {
        assertEquals(0, count(""));
    }

    @Test
    public void singleNewline() throws IOException {
        assertEquals(1, count("\n"));
    }

    @Test
    public void dataWithoutNewline() throws IOException {
        assertEquals(1, count("one"));
    }

    @Test
    public void oneCompleteLine() throws IOException {
        assertEquals(1, count("one\n"));
    }

    @Test
    public void twoCompleteLines() throws IOException {
        assertEquals(2, count("one\ntwo\n"));
    }

    @Test
    public void twoLinesWithoutNewlineAtEnd() throws IOException {
        assertEquals(2, count("one\ntwo"));
    }

    @Test
    public void aFewLines() throws IOException {
        assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
    }

It looks like this:

    static long countLines(InputStream is) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
            char[] buf = new char[8192];
            int n, previousN = -1;
            // Read will return at least one byte, no need to buffer more
            while ((n = lnr.read(buf)) != -1) {
                previousN = n;
            }
            int ln = lnr.getLineNumber();
            if (previousN == -1) {
                // No data read at all, i.e. file was empty
                return 0;
            } else {
                char lastChar = buf[previousN - 1];
                if (lastChar == '\n' || lastChar == '\r') {
                    // Ending with newline, deduct one
                    return ln;
                }
            }
            // normal case, return line number + 1
            return ln + 1;
        }
    }

If you want intuitive results, you can use this. If you just want wc -l compatibility, simply use @er.vikas' solution, but do not add one to the result, and retry the skip:

    try (LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
        while (lnr.skip(Long.MAX_VALUE) > 0) {}
        return lnr.getLineNumber();
    }

How about using the Process class from Java code, and then reading the output of the command?

    Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
    p.waitFor();
    BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line = "";
    int lineCount = 0;
    while ((line = b.readLine()) != null) {
        System.out.println(line);
        // wc prints "<count> <filename>", so parse only the first token
        lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
    }

I still need to try it, though. I will post the results.
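A slightly more defensive sketch of the same idea is shown below. It assumes a Unix-like system with wc on the PATH; WcLineCount and parseWcOutput are hypothetical names. The arguments are passed separately through ProcessBuilder so that spaces in the file name are not split, and only the leading number of the output is parsed, because wc prints the file name after the count.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper around the external wc command.
final class WcLineCount {
    /** Extracts the leading number from wc output such as "  42 data.log". */
    static long parseWcOutput(String line) {
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[0]);
    }

    /** Runs "wc -l <file>" and returns the count it reports. */
    static long countLines(String fileName) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("wc", "-l", fileName)
                .redirectErrorStream(true) // merge stderr so the pipe cannot fill up unread
                .start();
        String firstLine;
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = out.readLine();
        }
        int exit = p.waitFor();
        if (exit != 0 || firstLine == null) {
            throw new IOException("wc failed with exit code " + exit);
        }
        return parseWcOutput(firstLine);
    }
}
```

Starting a process has a fixed cost, so this is likely to pay off only for large files, and it is not portable to systems without wc.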

A straightforward way using Scanner:

    static void lineCounter(String path) throws IOException {
        int lineCount = 0, commentsCount = 0;
        Scanner input = new Scanner(new File(path));
        while (input.hasNextLine()) {
            String data = input.nextLine();
            if (data.startsWith("//"))
                commentsCount++;
            lineCount++;
        }
        System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
    }

On Unix-based systems, use the wc command on the command line.

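The only way to know how many lines a file has is to count them. You can of course derive a metric from your data that gives the average length of a line, then take the file size and divide it by that average length, but the result will not be accurate.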
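The estimate described above can be sketched as follows. The names are hypothetical, and sampling only the start of the file is an assumption that holds up only when line lengths are fairly uniform throughout the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical estimator: approximate, never exact.
final class LineCountEstimate {
    /** Scales the average line length seen in a sample up to the whole file. */
    static long estimate(long fileSize, long sampleBytes, long sampleNewlines) {
        if (sampleBytes <= 0 || sampleNewlines <= 0) {
            return 0; // nothing sampled, or no newline seen: no basis for an estimate
        }
        double averageLineLength = (double) sampleBytes / sampleNewlines;
        return Math.round(fileSize / averageLineLength);
    }

    /** Samples at most sampleSize bytes from the start of the file. */
    static long estimateLines(Path path, int sampleSize) throws IOException {
        byte[] buffer = new byte[sampleSize];
        int read;
        try (InputStream in = Files.newInputStream(path)) {
            read = in.read(buffer); // may return fewer bytes than requested, or -1
        }
        long newlines = 0;
        for (int i = 0; i < read; i++) {
            if (buffer[i] == '\n') {
                newlines++;
            }
        }
        return estimate(Files.size(path), Math.max(read, 0), newlines);
    }
}
```

For example, a 1000-byte file whose first 100 bytes contain 10 newlines has an average line length of 10 bytes, so the estimate is 100 lines.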

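If you do not have any index structure, you will not get around reading the complete file. But you can optimize it by avoiding reading it line by line and using a regular expression to match all line terminators.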
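A minimal sketch of the regular-expression idea is shown below, assuming the text is already in memory. RegexLineTerminators is a hypothetical name. For a huge file the input would have to be fed in chunks, with care taken over a "\r\n" pair split across two chunks, and whether this beats a plain byte loop would need to be measured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that counts line terminators with a regex.
final class RegexLineTerminators {
    // "\r\n" is listed first so a Windows line ending counts once, not twice.
    private static final Pattern TERMINATOR = Pattern.compile("\r\n|\r|\n");

    /** Returns the number of line terminators found in the text. */
    static long countTerminators(CharSequence text) {
        Matcher m = TERMINATOR.matcher(text);
        long count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```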

    /**
     * Count file rows.
     *
     * @param file file
     * @return file row count
     * @throws IOException
     */
    public static long getLineCount(File file) throws IOException {
        try (Stream<String> lines = Files.lines(file.toPath())) {
            return lines.count();
        }
    }

Tested on JDK8_u31. However, its performance is slow compared with this method:

    /**
     * Count file rows.
     *
     * @param file file
     * @return file row count
     * @throws IOException
     */
    public static long getLineCount(File file) throws IOException {
        try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
            byte[] c = new byte[1024];
            boolean empty = true, lastEmpty = false;
            long count = 0;
            int read;
            while ((read = is.read(c)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (c[i] == '\n') {
                        count++;
                        lastEmpty = true;
                    } else if (lastEmpty) {
                        lastEmpty = false;
                    }
                }
                empty = false;
            }
            if (!empty) {
                if (count == 0) {
                    count = 1;
                } else if (!lastEmpty) {
                    count++;
                }
            }
            return count;
        }
    }

Tested, and it is very fast.

This funny solution actually works really well!

    public static int countLines(File input) throws IOException {
        try (InputStream is = new FileInputStream(input)) {
            // Starts at 1, so the result is the number of '\n' bytes plus one
            int count = 1;
            for (int aChar = 0; aChar != -1; aChar = is.read())
                count += aChar == '\n' ? 1 : 0;
            return count;
        }
    }

Best optimized code for multi-line files that have no newline ('\n') character at EOF.

    /**
     *
     * @param filename
     * @return
     * @throws IOException
     */
    public static int countLines(String filename) throws IOException {
        int count = 0;
        boolean empty = true;
        FileInputStream fis = null;
        InputStream is = null;
        try {
            fis = new FileInputStream(filename);
            is = new BufferedInputStream(fis);
            byte[] c = new byte[1024];
            int readChars = 0;
            boolean isLine = false;
            while ((readChars = is.read(c)) != -1) {
                empty = false;
                for (int i = 0; i < readChars; ++i) {
                    if (c[i] == '\n') {
                        isLine = false;
                        ++count;
                    } else if (!isLine && c[i] != '\n' && c[i] != '\r') {
                        // Case to handle line count where no New Line character present at EOF
                        isLine = true;
                    }
                }
            }
            if (isLine) {
                ++count;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                is.close();
            }
            if (fis != null) {
                fis.close();
            }
        }
        // LOG is assumed to be a logger declared in the enclosing class
        LOG.info("count: " + count);
        return (count == 0 && !empty) ? 1 : count;
    }

If you use this

    public int countLines(String filename) throws IOException {
        LineNumberReader reader = new LineNumberReader(new FileReader(filename));
        int cnt = 0;
        String lineRead = "";
        while ((lineRead = reader.readLine()) != null) {}
        cnt = reader.getLineNumber();
        reader.close();
        return cnt;
    }

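you cannot handle very large line counts, because reader.getLineNumber() returns an int: anything beyond Integer.MAX_VALUE (2,147,483,647) lines overflows. You need a long to cover the largest files.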
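A counter of type long sidesteps the int limit. The sketch below is one way to do that with a plain BufferedReader. LongLineCount is a hypothetical name, and the method takes a Reader so that the caller chooses the file and the encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper that keeps the running total in a long.
final class LongLineCount {
    /** Counts lines as readLine() sees them; closes the reader when done. */
    static long countLines(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }
}
```

For a file, it could be called as LongLineCount.countLines(new FileReader(filename)).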