Fastest way to write huge amounts of data to a text file in Java

I have to write huge amounts of data to a text [CSV] file. I used a BufferedWriter to write the data, and it took around 40 seconds to write 174 MB. Is this the fastest speed Java can offer?

    bufferedWriter = new BufferedWriter(new FileWriter("fileName.csv"));

Note: these 40 seconds include the time to iterate over the result set and fetch the records as well. :) The 174 MB corresponds to 400,000 rows in the result set.

You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system, there's a good chance you're just writing to the drive's cache memory anyway.
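A minimal sketch of that unbuffered variant (the file name, row content, and row count here are placeholders, not the asker's actual code):

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;

    public class RawWriteSketch {
        public static void main(String[] args) throws IOException {
            // FileWriter with no BufferedWriter wrapper: each write goes
            // straight to the stream, and the OS page cache absorbs the cost
            Writer writer = new FileWriter("fileName.csv");
            for (int i = 0; i < 400000; i++) {
                writer.write("some,csv,row\n"); // placeholder record
            }
            writer.close();
        }
    }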

It takes me in the range of 4 to 5 seconds to write 175 MB (4 million strings). This is on a dual-core 2.4 GHz Dell running Windows XP with an 80 GB, 7200 RPM Hitachi disk.

Can you isolate how much of the time is record retrieval and how much is file writing?

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.util.ArrayList;
    import java.util.List;

    public class FileWritingPerfTest {

        private static final int ITERATIONS = 5;
        private static final double MEG = (Math.pow(1024, 2));
        private static final int RECORD_COUNT = 4000000;
        private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
        private static final int RECSIZE = RECORD.getBytes().length;

        public static void main(String[] args) throws Exception {
            List<String> records = new ArrayList<String>(RECORD_COUNT);
            int size = 0;
            for (int i = 0; i < RECORD_COUNT; i++) {
                records.add(RECORD);
                size += RECSIZE;
            }
            System.out.println(records.size() + " 'records'");
            System.out.println(size / MEG + " MB");

            for (int i = 0; i < ITERATIONS; i++) {
                System.out.println("\nIteration " + i);
                writeRaw(records);
                writeBuffered(records, 8192);
                writeBuffered(records, (int) MEG);
                writeBuffered(records, 4 * (int) MEG);
            }
        }

        private static void writeRaw(List<String> records) throws IOException {
            File file = File.createTempFile("foo", ".txt");
            try {
                FileWriter writer = new FileWriter(file);
                System.out.print("Writing raw... ");
                write(records, writer);
            } finally {
                // comment this out if you want to inspect the files afterward
                file.delete();
            }
        }

        private static void writeBuffered(List<String> records, int bufSize) throws IOException {
            File file = File.createTempFile("foo", ".txt");
            try {
                FileWriter writer = new FileWriter(file);
                BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);
                System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
                write(records, bufferedWriter);
            } finally {
                // comment this out if you want to inspect the files afterward
                file.delete();
            }
        }

        private static void write(List<String> records, Writer writer) throws IOException {
            long start = System.currentTimeMillis();
            for (String record : records) {
                writer.write(record);
            }
            writer.flush();
            writer.close();
            long end = System.currentTimeMillis();
            System.out.println((end - start) / 1000f + " seconds");
        }
    }

Try memory-mapped files (it takes about 300 ms to write 174 MB on my machine, a Core 2 Duo with 2.5 GB RAM):

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
    int number_of_lines = 400000;

    // map a region of the file large enough for all the lines, then fill it
    FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
    ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
    for (int i = 0; i < number_of_lines; i++) {
        wrBuf.put(buffer);
    }
    rwChannel.close();

Just for the statistics:

The machine is an old Dell with a new SSD:

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120GB SSD

    4000000 'records'
    175.47607421875 MB

    Iteration 0
    Writing raw... 3.547 seconds
    Writing buffered (buffer size: 8192)... 2.625 seconds
    Writing buffered (buffer size: 1048576)... 2.203 seconds
    Writing buffered (buffer size: 4194304)... 2.312 seconds

    Iteration 1
    Writing raw... 2.922 seconds
    Writing buffered (buffer size: 8192)... 2.406 seconds
    Writing buffered (buffer size: 1048576)... 2.015 seconds
    Writing buffered (buffer size: 4194304)... 2.282 seconds

    Iteration 2
    Writing raw... 2.828 seconds
    Writing buffered (buffer size: 8192)... 2.109 seconds
    Writing buffered (buffer size: 1048576)... 2.078 seconds
    Writing buffered (buffer size: 4194304)... 2.015 seconds

    Iteration 3
    Writing raw... 3.187 seconds
    Writing buffered (buffer size: 8192)... 2.109 seconds
    Writing buffered (buffer size: 1048576)... 2.094 seconds
    Writing buffered (buffer size: 4194304)... 2.031 seconds

    Iteration 4
    Writing raw... 3.093 seconds
    Writing buffered (buffer size: 8192)... 2.141 seconds
    Writing buffered (buffer size: 1048576)... 2.063 seconds
    Writing buffered (buffer size: 4194304)... 2.016 seconds

As we can see, the raw method is slower than the buffered one.

Your transfer speed is likely not limited by Java. Instead, I would suspect (in no particular order):

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset first and then write it out to disk, it will take longer, since the JVM has to allocate the memory and the DB read and disk write will happen sequentially. Instead, I would write to the buffered writer on every read you make from the DB, so the operation will be closer to a concurrent one (I don't know if you're doing that).
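A sketch of that interleaved read-and-write loop, assuming plain JDBC; the connection string, query, and column names are hypothetical:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StreamResultSetToCsv {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:yourdb://host/db");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table");
                 BufferedWriter out = new BufferedWriter(new FileWriter("fileName.csv"))) {
                // write each row as soon as it is fetched, so the full
                // result set never has to sit in memory
                while (rs.next()) {
                    out.write(rs.getString(1));
                    out.write(',');
                    out.write(rs.getString(2));
                    out.write('\n');
                }
            }
        }
    }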

For these huge reads from the database, you may want to tune your Statement's fetch size. That might save a lot of round trips to the database.

http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
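For example, continuing the JDBC sketch above (10,000 is just an illustrative value; the right fetch size depends on your driver and row width):

    Statement stmt = conn.createStatement();
    stmt.setFetchSize(10000); // hint to the driver: pull rows in batches of ~10,000
    ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM big_table");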

    package all.is.well;

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import junit.framework.TestCase;

    /**
     * @author Naresh Bhabat
     *
     * The following implementation helps to deal with extra-large files in Java.
     * This program has been tested with a 2 GB input file. There are some points
     * where extra logic can be added in the future.
     *
     * Please note: to deal with a binary input file, read bytes instead of lines
     * from the file object. It uses RandomAccessFile, which is almost like a
     * streaming API.
     *
     * Timings for ExecutorService executor = Executors.newFixedThreadPool(n):
     *   n = 10:    total time for reading and writing the text: 349.317 seconds
     *   n = 100:   464.042 seconds
     *   n = 1000:  466.538 seconds
     *   n = 10000: 479.701 seconds
     */
    public class DealWithHugeRecordsinFile extends TestCase {

        static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
        static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
        static volatile RandomAccessFile fileToWrite;
        static volatile RandomAccessFile file;
        static volatile int position = 0;

        public static void main(String[] args) throws IOException, InterruptedException {
            long start = System.currentTimeMillis();
            try {
                fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random writes, independent of thread interleaving
                file = new RandomAccessFile(FILEPATH, "r");               // for random reads, independent of thread interleaving
                seriouslyReadProcessAndWriteAsynch();
            } catch (IOException e) {
                e.printStackTrace();
            }
            System.out.println(Thread.currentThread().getName());
            long end = System.currentTimeMillis();
            double timeSeconds = (end - start) / 1000.0;
            System.out.println("Total time required for reading the text in seconds " + timeSeconds);
        }

        /**
         * Reads the input file line by line and hands each line to a pooled worker thread.
         */
        public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
            ExecutorService executor = Executors.newFixedThreadPool(10); // see the thread-count timings in the class comment
            while (true) {
                final String readLine = file.readLine();
                if (readLine == null) {
                    break;
                }
                Runnable genuineWorker = new Runnable() {
                    @Override
                    public void run() {
                        // do the hard processing here in this thread; some time is
                        // consumed and an exception is swallowed in the write method
                        writeToFile(FILEPATH_WRITE, readLine);
                    }
                };
                executor.execute(genuineWorker);
            }
            executor.shutdown();
            while (!executor.isTerminated()) {
                // busy-wait until all workers have finished
            }
            System.out.println("Finished all threads");
            file.close();
            fileToWrite.close();
        }

        private static void writeToFile(String filePath, String data) {
            try {
                // fileToWrite.seek(position);
                data = "\n" + data;
                if (!data.contains("Randomization")) {
                    return;
                }
                System.out.println("Let us do something time consuming to make this thread busy " + (position++) + " :" + data);
                int i = 1000;
                while (i > 0) { // burn a little CPU to simulate per-record processing
                    i--;
                }
                fileToWrite.write(data.getBytes());
                throw new Exception(); // simulate a failure after the write
            } catch (Exception exception) {
                System.out.println("exception was thrown but still we are able to proceed further"
                        + "\nThis can be used for marking failure of the records");
                // exception.printStackTrace();
            }
        }
    }