Arrays.stream()。map()。sum()

我偶然发现了一个非常简单的在原始数组上映射/简化操作的性能曲线。 这是我的jmh基准代码:

@OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(Measure.ARRAY_SIZE) @Warmup(iterations = 300, time = 200, timeUnit=MILLISECONDS) @Measurement(iterations = 1, time = 1000, timeUnit=MILLISECONDS) @State(Scope.Thread) @Threads(1) @Fork(1) public class Measure { static final int ARRAY_SIZE = 1<<20; final int[] ds = new int[ARRAY_SIZE]; private IntUnaryOperator mapper; @Setup public void setup() { setAll(ds, i->(int)(Math.random()*(1<<7))); final int multiplier = (int)(Math.random()*10); mapper = d -> multiplier*d; } @Benchmark public double multiply() { return Arrays.stream(ds).map(mapper).sum(); } } 

以下是典型输出的片段:

 # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java # VM options: <none> # Warmup: 300 iterations, 200 ms each # Measurement: 1 iterations, 1000 ms each # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: org.sample.Measure.multiply # Run progress: 0,00% complete, ETA 00:01:01 # Fork: 1 of 1 # Warmup Iteration 1: 0,779 ns/op # Warmup Iteration 2: 0,684 ns/op # Warmup Iteration 3: 0,608 ns/op # Warmup Iteration 4: 0,619 ns/op # Warmup Iteration 5: 0,642 ns/op # Warmup Iteration 6: 0,638 ns/op # Warmup Iteration 7: 0,660 ns/op # Warmup Iteration 8: 0,611 ns/op # Warmup Iteration 9: 0,636 ns/op # Warmup Iteration 10: 0,692 ns/op # Warmup Iteration 11: 0,632 ns/op # Warmup Iteration 12: 0,612 ns/op # Warmup Iteration 13: 1,280 ns/op # Warmup Iteration 14: 7,261 ns/op # Warmup Iteration 15: 7,379 ns/op # Warmup Iteration 16: 7,376 ns/op # Warmup Iteration 17: 7,379 ns/op # Warmup Iteration 18: 7,195 ns/op # Warmup Iteration 19: 7,351 ns/op # Warmup Iteration 20: 7,761 ns/op .... .... .... # Warmup Iteration 100: 7,300 ns/op # Warmup Iteration 101: 7,384 ns/op # Warmup Iteration 102: 7,132 ns/op # Warmup Iteration 103: 7,278 ns/op # Warmup Iteration 104: 7,331 ns/op # Warmup Iteration 105: 7,335 ns/op # Warmup Iteration 106: 7,450 ns/op # Warmup Iteration 107: 7,346 ns/op # Warmup Iteration 108: 7,826 ns/op # Warmup Iteration 109: 7,221 ns/op # Warmup Iteration 110: 8,017 ns/op # Warmup Iteration 111: 7,611 ns/op # Warmup Iteration 112: 7,376 ns/op # Warmup Iteration 113: 0,707 ns/op # Warmup Iteration 114: 0,828 ns/op # Warmup Iteration 115: 0,608 ns/op # Warmup Iteration 116: 0,634 ns/op # Warmup Iteration 117: 0,633 ns/op # Warmup Iteration 118: 0,660 ns/op # Warmup Iteration 119: 0,635 ns/op # Warmup Iteration 120: 0,566 ns/op 

关键时刻发生在迭代13和113:首先性能下降了十倍,然后恢复。 testing运行中相应的时间是2.5秒和22.5秒。 这些事件的时间对arrays大小非常敏感,顺便说一下。

什么可以解释这种行为? JIT编译器可能在第一次迭代中完成了它的工作; 没有GC的说法(通过VisualVM确认)……我对任何一种解释都完全失望。

我的Java版本(OS X):

 $ java -version java version "1.8.0_20" Java(TM) SE Runtime Environment (build 1.8.0_20-b26) Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode) 

JIT将首先编译迭代的热循环,并对数组元素进行操作(map / reduce)。 由于数组包含2个20个元素,所以发生的时间相当早。

之后,JIT编译pipe道,很可能在编译的基准testing方法中内联,由于内联限制无法将其全部编译到一个方法中。 恰好在热循环中达到这些内联限制,并且调用map或sum不是内联的,所以热循环被无意地“去优化”。

在运行基准testing时,使用选项-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining ,您应该看到以下输出:

  1202 487 % 4 java.util.Spliterators$IntArraySpliterator::forEachRemaining @ 49 (68 bytes) @ 53 java.util.stream.IntPipeline$3$1::accept (23 bytes) inline (hot) \-> TypeProfile (1186714/1186714 counts) = java/util/stream/IntPipeline$3$1 @ 12 test.Measure$$Lambda$2/1745776415::applyAsInt (9 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = test/Measure$$Lambda$2 @ 5 test.Measure::lambda$setup$1 (4 bytes) inline (hot) @ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = java/util/stream/ReduceOps$5ReducingSink @ 10 java.util.stream.IntPipeline$$Lambda$3/1779653790::applyAsInt (6 bytes) inline (hot) \-> TypeProfile (1048064/1048064 counts) = java/util/stream/IntPipeline$$Lambda$3 @ 2 java.lang.Integer::sum (4 bytes) inline (hot) 

这是编译的热循环。 ( %表示它被replace堆栈,或OSR)

后来进一步编译streampipe道发生(我怀疑〜10000次迭代的基准方法,但我没有validation):

  @ 16 java.util.stream.IntPipeline::sum (11 bytes) inline (hot) \-> TypeProfile (5120/5120 counts) = java/util/stream/IntPipeline$3 @ 2 java.lang.invoke.LambdaForm$MH/1279902262::linkToTargetMethod (8 bytes) force inline by annotation @ 4 java.lang.invoke.LambdaForm$MH/1847865997::identity (18 bytes) force inline by annotation @ 14 java.lang.invoke.LambdaForm$DMH/2024969684::invokeStatic_L_L (14 bytes) force inline by annotation @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation @ 10 sun.invoke.util.ValueConversions::identity (2 bytes) inline (hot) @ 7 java.util.stream.IntPipeline::reduce (16 bytes) inline (hot) @ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline (hot) @ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 14 java.util.stream.ReduceOps$5::<init> (16 bytes) inline (hot) @ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) inline (hot) @ 50 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot) @ 80 java.util.stream.TerminalOp::getOpFlags (2 bytes) inline (hot) \-> TypeProfile (5122/5122 counts) = java/util/stream/ReduceOps$5 @ 85 java.util.stream.AbstractPipeline::sourceSpliterator (163 bytes) inline (hot) @ 79 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot) @ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) inline (hot) @ 2 java.util.stream.ReduceOps$5::makeSink (5 bytes) inline (hot) @ 1 java.util.stream.ReduceOps$5::makeSink (16 bytes) inline (hot) @ 12 java.util.stream.ReduceOps$5ReducingSink::<init> (15 bytes) inline (hot) @ 11 java.lang.Object::<init> (1 bytes) inline (hot) @ 6 java.util.stream.AbstractPipeline::wrapAndCopyInto (18 bytes) inline (hot) @ 3 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 9 java.util.stream.AbstractPipeline::wrapSink (37 bytes) inline (hot) @ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 23 java.util.stream.IntPipeline$3::opWrapSink (10 bytes) inline (hot) \-> TypeProfile (4868/4868 counts) = java/util/stream/IntPipeline$3 @ 6 java.util.stream.IntPipeline$3$1::<init> (11 bytes) inline (hot) @ 7 java.util.stream.Sink$ChainedInt::<init> (16 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 6 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 13 java.util.stream.AbstractPipeline::copyInto (53 bytes) inline (hot) @ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot) @ 9 java.util.stream.AbstractPipeline::getStreamAndOpFlags (5 bytes) accessor @ 12 java.util.stream.StreamOpFlag::isKnown (19 bytes) inline (hot) @ 20 java.util.Spliterator::getExactSizeIfKnown (25 bytes) inline (hot) \-> TypeProfile (4870/4870 counts) = java/util/Spliterators$IntArraySpliterator @ 1 java.util.Spliterators$IntArraySpliterator::characteristics (5 bytes) accessor @ 19 java.util.Spliterators$IntArraySpliterator::estimateSize (11 bytes) inline (hot) @ 25 java.util.stream.Sink$ChainedInt::begin (11 bytes) inline (hot) \-> TypeProfile (4870/4870 counts) = java/util/stream/IntPipeline$3$1 @ 5 java.util.stream.ReduceOps$5ReducingSink::begin (9 bytes) inline (hot) \-> TypeProfile (4871/4871 counts) = java/util/stream/ReduceOps$5ReducingSink @ 32 java.util.Spliterator$OfInt::forEachRemaining (53 bytes) inline (hot) @ 12 java.util.Spliterators$IntArraySpliterator::forEachRemaining (68 bytes) inline (hot) @ 53 java.util.stream.IntPipeline$3$1::accept (23 bytes) inline (hot) @ 12 test.Measure$$Lambda$2/1745776415::applyAsInt (9 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = test/Measure$$Lambda$2 @ 5 test.Measure::lambda$setup$1 (4 bytes) inlining too deep @ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = java/util/stream/ReduceOps$5ReducingSink @ 10 java.util.stream.IntPipeline$$Lambda$3/1779653790::applyAsInt (6 bytes) inlining too deep \-> TypeProfile (1048064/1048064 counts) = java/util/stream/IntPipeline$$Lambda$3 @ 53 java.util.stream.IntPipeline$3$1::accept (23 bytes) inline (hot) @ 12 test.Measure$$Lambda$2/1745776415::applyAsInt (9 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = test/Measure$$Lambda$2 @ 5 test.Measure::lambda$setup$1 (4 bytes) inlining too deep @ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot) \-> TypeProfile (1048107/1048107 counts) = java/util/stream/ReduceOps$5ReducingSink @ 10 java.util.stream.IntPipeline$$Lambda$3/1779653790::applyAsInt (6 bytes) inlining too deep \-> TypeProfile (1048064/1048064 counts) = java/util/stream/IntPipeline$$Lambda$3 @ 38 java.util.stream.Sink$ChainedInt::end (10 bytes) inline (hot) @ 4 java.util.stream.Sink::end (1 bytes) inline (hot) \-> TypeProfile (5120/5120 counts) = java/util/stream/ReduceOps$5ReducingSink @ 12 java.util.stream.ReduceOps$5ReducingSink::get (5 bytes) inline (hot) @ 1 java.util.stream.ReduceOps$5ReducingSink::get (8 bytes) inline (hot) @ 4 java.lang.Integer::valueOf (32 bytes) inline (hot) @ 28 java.lang.Integer::<init> (10 bytes) inline (hot) @ 1 java.lang.Number::<init> (5 bytes) inline (hot) @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 12 java.lang.Integer::intValue (5 bytes) accessor 

注意热循环中的方法出现的“内联太深”。

即使在以后生成的JMH测量循环被编译:

  26857 685 3 test.generated.Measure_multiply::multiply_avgt_jmhLoop (55 bytes) @ 7 java.lang.System::nanoTime (0 bytes) intrinsic @ 16 test.Measure::multiply (23 bytes) @ 4 java.util.Arrays::stream (8 bytes) @ 4 java.util.Arrays::stream (11 bytes) @ 3 java.util.Arrays::spliterator (10 bytes) @ 6 java.util.Spliterators::spliterator (25 bytes) callee is too large @ 7 java.util.stream.StreamSupport::intStream (14 bytes) @ 6 java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes) callee is too large @ 10 java.util.stream.IntPipeline$Head::<init> (8 bytes) @ 4 java.util.stream.IntPipeline::<init> (8 bytes) @ 4 java.util.stream.AbstractPipeline::<init> (55 bytes) callee is too large @ 11 java.util.stream.IntPipeline::map (26 bytes) @ 1 java.util.Objects::requireNonNull (14 bytes) @ 8 java.lang.NullPointerException::<init> (5 bytes) don't inline Throwable constructors @ 22 java.util.stream.IntPipeline$3::<init> (20 bytes) @ 16 java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes) callee is too large @ 16 java.util.stream.IntPipeline::sum (11 bytes) @ 2 java.lang.invoke.LambdaForm$MH/1279902262::linkToTargetMethod (8 bytes) force inline by annotation @ 4 java.lang.invoke.LambdaForm$MH/1847865997::identity (18 bytes) force inline by annotation @ 14 java.lang.invoke.LambdaForm$DMH/2024969684::invokeStatic_L_L (14 bytes) force inline by annotation @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes) force inline by annotation @ 10 sun.invoke.util.ValueConversions::identity (2 bytes) @ 7 java.util.stream.IntPipeline::reduce (16 bytes) @ 3 java.util.stream.ReduceOps::makeInt (18 bytes) @ 1 java.util.Objects::requireNonNull (14 bytes) @ 14 java.util.stream.ReduceOps$5::<init> (16 bytes) @ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) @ 1 java.lang.Object::<init> (1 bytes) @ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) callee is too large @ 12 java.lang.Integer::intValue (5 bytes) 

请注意,没有尝试内联整个streampipe线,它在到达热循环之前停止,请参阅“被调用者太大”,从而重新优化热循环。

内联限制可以增加以避免这种行为,例如-XX:MaxInlineLevel=12