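Java 8: performance of Streams vs Collections

I am new to Java 8. I still don't know the API in depth, but I wrote a small informal benchmark to compare the performance of the new Streams API against good old Collections.

The test filters a list of Integer and, for each even number, computes the square root and stores it in a result List of Double.

Here is the code: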

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Wrapper class added only so that the snippet compiles on its own.
    public class StreamsVsCollections {

        public static void main(String[] args) {
            // Calculating square root of even numbers from 1 to N
            int min = 1;
            int max = 1000000;

            List<Integer> sourceList = new ArrayList<>();
            for (int i = min; i < max; i++) {
                sourceList.add(i);
            }

            List<Double> result = new LinkedList<>();

            // Collections approach
            long t0 = System.nanoTime();
            long elapsed = 0;
            for (Integer i : sourceList) {
                if (i % 2 == 0) {
                    result.add(Math.sqrt(i));
                }
            }
            elapsed = System.nanoTime() - t0;
            System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));

            // Stream approach
            Stream<Integer> stream = sourceList.stream();
            t0 = System.nanoTime();
            result = stream.filter(i -> i % 2 == 0)
                           .map(i -> Math.sqrt(i))
                           .collect(Collectors.toList());
            elapsed = System.nanoTime() - t0;
            System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));

            // Parallel stream approach
            stream = sourceList.stream().parallel();
            t0 = System.nanoTime();
            result = stream.filter(i -> i % 2 == 0)
                           .map(i -> Math.sqrt(i))
                           .collect(Collectors.toList());
            elapsed = System.nanoTime() - t0;
            System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));
        }
    }

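Here are the results on a dual-core machine:

    Collections: Elapsed time:       94338247 ns (0,094338 seconds)
    Streams: Elapsed time:           201112924 ns (0,201113 seconds)
    Parallel streams: Elapsed time:  357243629 ns (0,357244 seconds)

For this particular test, streams are about twice as slow as collections, and parallelism does not help (or I am using it the wrong way).

Questions:

Is this test fair? Have I made any mistakes?
Are streams slower than collections? Has anyone published a good formal benchmark on this?
Which approach should I strive for?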

Updated results.

Following @pveentjer's advice, I ran the test 1k times after a JVM warm-up (1k iterations):

    Collections: Average time:       206884437,000000 ns (0,206884 seconds)
    Streams: Average time:           98366725,000000 ns (0,098367 seconds)
    Parallel streams: Average time:  167703705,000000 ns (0,167704 seconds)

In this case streams are faster. I wonder what would be observed in an application where the filtering function is called only once or twice at runtime.
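For readers who want to reproduce the warm-up-then-average procedure, here is a rough sketch of what such a harness might look like. The class `WarmupSketch`, the helper `averageNanos` and the iteration counts are illustrative assumptions, not the code that produced the numbers above, and a hand-rolled harness like this is still exposed to JIT and GC effects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class WarmupSketch {

    // Hypothetical helper: runs the task `warmup` times untimed, then returns
    // the mean wall-clock time in nanoseconds over `runs` timed executions.
    static double averageNanos(Supplier<List<Double>> task, int warmup, int runs) {
        long sink = 0; // consume results so the work is less likely to be optimized away
        for (int i = 0; i < warmup; i++) {
            sink += task.get().size();
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            List<Double> r = task.get();
            total += System.nanoTime() - t0;
            sink += r.size();
        }
        if (sink < 0) {
            throw new IllegalStateException("sizes are never negative");
        }
        return (double) total / runs;
    }

    // Plain for-each version of the task from the question.
    static List<Double> loop(List<Integer> source) {
        List<Double> result = new ArrayList<>();
        for (Integer i : source) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    // Stream version of the same task.
    static List<Double> stream(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i < 100_000; i++) {
            source.add(i);
        }
        System.out.printf("loop:   %.0f ns%n", averageNanos(() -> loop(source), 200, 200));
        System.out.printf("stream: %.0f ns%n", averageNanos(() -> stream(source), 200, 200));
    }
}
```

The timings this prints depend entirely on the machine and JVM, so treat them as a sanity check rather than a measurement.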

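Stop using LinkedList for anything other than heavy removal from the middle of a list through an iterator.
Stop writing benchmark code by hand; use JMH.

A proper benchmark: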

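    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OperationsPerInvocation;
    import org.openjdk.jmh.annotations.OutputTimeUnit;

    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @OperationsPerInvocation(StreamVsVanilla.N)
    public class StreamVsVanilla {
        public static final int N = 10000;

        static List<Integer> sourceList = new ArrayList<>();
        static {
            for (int i = 0; i < N; i++) {
                sourceList.add(i);
            }
        }

        // Plain for-each loop with a pre-sized ArrayList
        @Benchmark
        public List<Double> vanilla() {
            List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
            for (Integer i : sourceList) {
                if (i % 2 == 0) {
                    result.add(Math.sqrt(i));
                }
            }
            return result;
        }

        // Same work as a stream pipeline, collecting into a pre-sized ArrayList
        @Benchmark
        public List<Double> stream() {
            return sourceList.stream()
                    .filter(i -> i % 2 == 0)
                    .map(Math::sqrt)
                    .collect(Collectors.toCollection(
                        () -> new ArrayList<>(sourceList.size() / 2 + 1)));
        }
    }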

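Results:

    Benchmark                Mode  Samples    Mean  Mean error  Units
    StreamVsVanilla.stream   avgt       10  17.588       0.230  ns/op
    StreamVsVanilla.vanilla  avgt       10  10.796       0.063  ns/op

Just as I expected, the stream implementation is noticeably slower. The JIT is able to inline all the lambda code, but it does not produce code as perfectly concise as the vanilla version.

Generally, Java 8 streams are not magic. They cannot speed up things that are already well implemented (for example, plain iteration or Java 5 for-each statements replaced with Iterable.forEach() and Collection.removeIf() calls). Streams are more about coding convenience and safety. The convenience versus speed tradeoff is at work here.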
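To make the remark about Collection.removeIf() concrete, here is a small sketch that does the same removal with an explicit iterator and with the Java 8 default method. The class and method names are made up for illustration; the point is only that both forms do the same work, so no speed-up should be expected from switching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class LoopVsDefaultMethods {

    // Pre-Java 8 style: remove odd numbers through an explicit iterator.
    static List<Integer> dropOddsWithIterator(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        for (Iterator<Integer> it = copy.iterator(); it.hasNext();) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
        return copy;
    }

    // Java 8 style: the same removal expressed with Collection.removeIf().
    static List<Integer> dropOddsWithRemoveIf(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        copy.removeIf(i -> i % 2 != 0);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        // Iterable.forEach() replaces the for-each statement for simple visits.
        dropOddsWithIterator(numbers).forEach(System.out::println);
        dropOddsWithRemoveIf(numbers).forEach(System.out::println);
    }
}
```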

1) Your benchmark measures times of less than 1 second, which means side effects can strongly influence the results. So I made your task 10 times larger:

    int max = 10000000;

and ran your benchmark. My results:

    Collections: Elapsed time:       8592999350 ns (8.592999 seconds)
    Streams: Elapsed time:           2068208058 ns (2.068208 seconds)
    Parallel streams: Elapsed time:  7186967071 ns (7.186967 seconds)

Without the edit (int max = 1000000) the results were:

    Collections: Elapsed time:       113373057 ns (0.113373 seconds)
    Streams: Elapsed time:           135570440 ns (0.135570 seconds)
    Parallel streams: Elapsed time:  104091980 ns (0.104092 seconds)

That matches your results: the stream is slower than the collection. Conclusion: a lot of time is spent on stream initialization and value passing.

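2) After increasing the task size, the stream became faster (that is fine), but the parallel stream was still too slow. What is wrong? Note that you have collect(Collectors.toList()) in your pipeline. Collecting into a single collection essentially introduces a performance bottleneck and overhead under concurrent execution. You can estimate the relative cost of that overhead by replacing

    collecting to collection -> counting the element count

For streams this can be done with collect(Collectors.counting()). I got these results:

    Collections: Elapsed time:       41856183 ns (0.041856 seconds)
    Streams: Elapsed time:           546590322 ns (0.546590 seconds)
    Parallel streams: Elapsed time:  1540051478 ns (1.540051 seconds)

And that is for the big task (int max = 10000000)! Conclusion: collecting the items into a collection took the majority of the time. The slowest part is adding to the list. By the way, a plain ArrayList is used for Collectors.toList().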
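A minimal sketch of that substitution, assuming the same filter and map steps as in the question; the class `CountInsteadOfCollect` and its method are hypothetical names, and the timing code is left out so that only the change of terminal operation is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CountInsteadOfCollect {

    // Same pipeline as in the question, but the terminal step counts the
    // mapped elements instead of appending them to a list.
    static long countEvenRoots(List<Integer> source, boolean parallel) {
        Stream<Integer> s = parallel ? source.parallelStream() : source.stream();
        return s.filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i < 1_000_000; i++) {
            source.add(i);
        }
        System.out.println(countEvenRoots(source, false));
        System.out.println(countEvenRoots(source, true));
    }
}
```

Timing this variant against the Collectors.toList() variant gives a rough idea of how much of the elapsed time goes into building the result list rather than into the filter and map steps.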

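For what you are trying to do, I would not use the regular Java APIs anyway. There is a ton of boxing and unboxing going on, so there is a huge performance overhead.

Personally, I think a lot of API designs are crap because they create a lot of object garbage.

Try using primitive arrays of double/int, run it single-threaded, and see what the performance is.

PS: You might want to have a look at JMH for doing the benchmark. It takes care of some of the typical pitfalls, such as warming up the JVM.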
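Here is one way the primitive-array suggestion could look, as a sketch only. The class `PrimitiveSketch` and its method names are invented for illustration, and the IntStream variant is an addition that goes beyond the advice above: it is simply the stream counterpart that also avoids boxing. Both methods assume min <= max.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class PrimitiveSketch {

    // Single-threaded loop over primitives: no Integer or Double objects are created.
    static double[] evenRootsLoop(int min, int max) {
        double[] out = new double[(max - min) / 2 + 1]; // upper bound on the number of evens
        int n = 0;
        for (int i = min; i < max; i++) {
            if (i % 2 == 0) {
                out[n++] = Math.sqrt(i);
            }
        }
        return Arrays.copyOf(out, n); // trim to the number of values actually written
    }

    // Primitive stream version: IntStream and DoubleStream also avoid boxing.
    static double[] evenRootsStream(int min, int max) {
        return IntStream.range(min, max)
                .filter(i -> i % 2 == 0)
                .mapToDouble(i -> Math.sqrt(i))
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(evenRootsLoop(1, 10)));
        System.out.println(Arrays.toString(evenRootsStream(1, 10)));
    }
}
```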

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Wrapper class added only so that the snippet compiles on its own.
    public class HeavyTaskBenchmark {

        public static void main(String[] args) {
            // Calculating square root of even numbers from 1 to N
            int min = 1;
            int max = 10000000;

            List<Integer> sourceList = new ArrayList<>();
            for (int i = min; i < max; i++) {
                sourceList.add(i);
            }

            List<Double> result = new LinkedList<>();

            // Collections approach
            long t0 = System.nanoTime();
            long elapsed = 0;
            for (Integer i : sourceList) {
                if (i % 2 == 0) {
                    result.add(doSomeCalculate(i));
                }
            }
            elapsed = System.nanoTime() - t0;
            System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));

            // Stream approach
            Stream<Integer> stream = sourceList.stream();
            t0 = System.nanoTime();
            result = stream.filter(i -> i % 2 == 0)
                           .map(i -> doSomeCalculate(i))
                           .collect(Collectors.toList());
            elapsed = System.nanoTime() - t0;
            System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));

            // Parallel stream approach
            stream = sourceList.stream().parallel();
            t0 = System.nanoTime();
            result = stream.filter(i -> i % 2 == 0)
                           .map(i -> doSomeCalculate(i))
                           .collect(Collectors.toList());
            elapsed = System.nanoTime() - t0;
            System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n",
                    elapsed, elapsed / Math.pow(10, 9));
        }

        // Makes each element expensive so the per-element work dominates
        static double doSomeCalculate(int input) {
            for (int i = 0; i < 100000; i++) {
                Math.sqrt(i + input);
            }
            return Math.sqrt(input);
        }
    }

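I changed the code a bit and ran it on my MacBook Pro, which has 8 cores, and got a reasonable result:

    Collections: Elapsed time:       1522036826 ns (1.522037 seconds)
    Streams: Elapsed time:           4315833719 ns (4.315834 seconds)
    Parallel streams: Elapsed time:  261152901 ns (0.261153 seconds)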
