Why are operators so much slower than method calls? (structs are slower only on older JITs)

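Intro: I write high-performance code in C#. Yes, I know C++ would give me better optimization, but I still choose to use C#. I do not wish to debate that choice. Rather, I'd like to hear from those who, like me, are trying to write high-performance code on the .NET Framework.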

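Questions:

  • Why is the operator in the code below slower than the equivalent method call?
  • Why is the method passing two doubles in the code below faster than the equivalent method passing a struct that has two doubles inside? (A: older JITs optimize structs poorly)
  • Is there a way to get the .NET JIT compiler to treat simple structs as efficiently as the members of the struct? (A: get a newer JIT)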

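What I think I know: The original .NET JIT compiler would not inline anything that involved a struct. Bizarre, given that structs should only be used where you need small value types that should be optimized like built-ins, but true. Fortunately, in .NET 3.5 SP1 and .NET 2.0 SP2, they made some improvements to the JIT optimizer, including improvements to inlining, particularly for structs. (I am guessing they did that because otherwise the new Complex struct they were introducing would have performed horribly... so the Complex team was probably pounding on the JIT optimizer team.) So any documentation prior to .NET 3.5 SP1 is probably not too relevant to this issue.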

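What my testing shows: I have verified that I do have the newer JIT optimizer by checking that the C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll file has version >= 3053, so it should have those improvements to the JIT optimizer. However, even with that, both my timings and my look at the disassembly show the following:

The JIT-produced code for passing a struct with two doubles is far less efficient than code that directly passes the two doubles.

The JIT-produced code for a struct method passes in 'this' far more efficiently than if you passed a struct as an argument.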

The JIT still inlines better if you pass two doubles rather than a struct with two doubles, even with the multiplier due to being clearly in a loop.
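For anyone who wants to repeat the version check described above in code rather than through the file's Properties dialog, here is a minimal sketch. It assumes the "3053" in the question refers to the last part of the file version (the "private part" in `FileVersionInfo` terms); the class and method names are made up for illustration, and the path must be supplied by the caller.

```csharp
using System;
using System.Diagnostics;

public static class JitVersionCheck
{
    // Threshold quoted in the question; treated here as the fourth
    // (private) part of the file version. This mapping is an assumption.
    public const int MinimumPrivatePart = 3053;

    // Pure comparison, so it can be exercised without touching the disk.
    public static bool HasStructInliningFixes(int filePrivatePart)
    {
        return filePrivatePart >= MinimumPrivatePart;
    }

    // Reads the version resource of the given file (for example the
    // mscorwks.dll path from the question) and applies the check.
    // Throws FileNotFoundException if the file does not exist.
    public static bool CheckFile(string path)
    {
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(path);
        return HasStructInliningFixes(info.FilePrivatePart);
    }
}
```

Calling `JitVersionCheck.CheckFile(...)` with the path from the question should return true on a machine that has the updated runtime. It says nothing about which optimizations the JIT actually applies to a given method.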

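The timings: Actually, looking at the disassembly, I realize that most of the time in the loops is spent just accessing the test data out of the List. The difference between the four ways of making the same calls is dramatically different if you factor out the overhead of the loop and of accessing the data. I get anywhere from 5x to 20x speedups for doing PlusEqual(double, double) instead of PlusEqual(Element), and 10x to 40x for doing PlusEqual(double, double) instead of operator +=. Wow. Sad.

Here's one set of timings: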

    Populating List<Element> took 320ms.
    The PlusEqual() method took 105ms.
    The 'same' += operator took 131ms.
    The 'same' -= operator took 139ms.
    The PlusEqual(double, double) method took 68ms.
    The do nothing loop took 66ms.
    The ratio of operator with constructor to method is 124%.
    The ratio of operator without constructor to method is 132%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 64%.
    If we remove the overhead time for the loop accessing the elements from the List...
    The ratio of operator with constructor to method is 166%.
    The ratio of operator without constructor to method is 187%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 5%.
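To make the "overhead removed" figures easier to follow, here is a small sketch that redoes the same integer arithmetic on the numbers printed above. The helper name is hypothetical; the formula is the one the benchmark uses (subtract the do-nothing loop time from both measurements, then take the percentage).

```csharp
using System;

public static class RatioCheck
{
    // 100 * (a - overhead) / (b - overhead), using integer division
    // exactly as the benchmark code does.
    public static long PercentAfterOverhead(long aMs, long bMs, long overheadMs)
    {
        return 100L * (aMs - overheadMs) / (bMs - overheadMs);
    }

    public static void Print()
    {
        // Inputs are the timings listed above; 66 ms is the do-nothing loop.
        Console.WriteLine(PercentAfterOverhead(131, 105, 66)); // 166
        Console.WriteLine(PercentAfterOverhead(139, 105, 66)); // 187
        Console.WriteLine(PercentAfterOverhead(68, 105, 66));  // 5
        // One extra millisecond on the fast path moves 5% to 7%.
        Console.WriteLine(PercentAfterOverhead(69, 105, 66));  // 7
    }
}
```

Note that the 5% figure comes from a numerator of only 2 ms (68 - 66), so it is very sensitive to measurement noise. That is worth keeping in mind when reading the 5x to 20x range quoted above.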

The code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    namespace OperatorVsMethod
    {
        public struct Element
        {
            public double Left;
            public double Right;

            public Element(double left, double right)
            {
                this.Left = left;
                this.Right = right;
            }

            public static Element operator +(Element x, Element y)
            {
                return new Element(x.Left + y.Left, x.Right + y.Right);
            }

            // Deliberately performs an addition: this is "+= without the constructor call".
            public static Element operator -(Element x, Element y)
            {
                x.Left += y.Left;
                x.Right += y.Right;
                return x;
            }

            /// <summary>
            /// Like the += operator; but faster.
            /// </summary>
            public void PlusEqual(Element that)
            {
                this.Left += that.Left;
                this.Right += that.Right;
            }

            /// <summary>
            /// Like the += operator; but faster.
            /// </summary>
            public void PlusEqual(double thatLeft, double thatRight)
            {
                this.Left += thatLeft;
                this.Right += thatRight;
            }
        }

        [TestClass]
        public class UnitTest1
        {
            [TestMethod]
            public void TestMethod1()
            {
                Stopwatch stopwatch = new Stopwatch();

                // Populate a List of Elements to multiply together
                int seedSize = 4;
                List<double> doubles = new List<double>(seedSize);
                doubles.Add(2.5d);
                doubles.Add(100000d);
                doubles.Add(-0.5d);
                doubles.Add(-100002d);

                int size = 2500000 * seedSize;
                List<Element> elts = new List<Element>(size);

                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    int di = ii % seedSize;
                    double d = doubles[di];
                    elts.Add(new Element(d, d));
                }
                stopwatch.Stop();
                long populateMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of += operator (calls ctor)
                Element operatorCtorResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    operatorCtorResult += elts[ii];
                }
                stopwatch.Stop();
                long operatorCtorMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of -= operator (+= without ctor)
                Element operatorNoCtorResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    operatorNoCtorResult -= elts[ii];
                }
                stopwatch.Stop();
                long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of PlusEqual(Element) method
                Element plusEqualResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    plusEqualResult.PlusEqual(elts[ii]);
                }
                stopwatch.Stop();
                long plusEqualMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of PlusEqual(double, double) method
                Element plusEqualDDResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    Element elt = elts[ii];
                    plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
                }
                stopwatch.Stop();
                long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of doing nothing but accessing the Element
                Element doNothingResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    Element elt = elts[ii];
                    double left = elt.Left;
                    double right = elt.Right;
                }
                stopwatch.Stop();
                long doNothingMS = stopwatch.ElapsedMilliseconds;

                // Report results
                Assert.AreEqual(1d, operatorCtorResult.Left, "The operator += did not compute the right result!");
                Assert.AreEqual(1d, operatorNoCtorResult.Left, "The operator += did not compute the right result!");
                Assert.AreEqual(1d, plusEqualResult.Left, "The operator += did not compute the right result!");
                Assert.AreEqual(1d, plusEqualDDResult.Left, "The operator += did not compute the right result!");
                Assert.AreEqual(1d, doNothingResult.Left, "The operator += did not compute the right result!");

                // Report speeds
                Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
                Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
                Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
                Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
                Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
                Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

                // Compare speeds
                long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
                Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

                // Subtract the loop/List access overhead and compare again
                operatorCtorMS -= doNothingMS;
                operatorNoCtorMS -= doNothingMS;
                plusEqualMS -= doNothingMS;
                plusEqualDDMS -= doNothingMS;

                Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
                percentageRatio = 100L * operatorCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
                Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
            }
        }
    }

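The IL (that is, what some of the code above gets compiled into; the listing shown is the JIT-generated x86 disassembly):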

    public void PlusEqual(Element that)
    {
    00000000  push  ebp
    00000001  mov   ebp,esp
    00000003  push  edi
    00000004  push  esi
    00000005  push  ebx
    00000006  sub   esp,30h
    00000009  xor   eax,eax
    0000000b  mov   dword ptr [ebp-10h],eax
    0000000e  xor   eax,eax
    00000010  mov   dword ptr [ebp-1Ch],eax
    00000013  mov   dword ptr [ebp-3Ch],ecx
    00000016  cmp   dword ptr ds:[04C87B7Ch],0
    0000001d  je    00000024
    0000001f  call  753081B1
    00000024  nop
        this.Left += that.Left;
    00000025  mov   eax,dword ptr [ebp-3Ch]
    00000028  fld   qword ptr [ebp+8]
    0000002b  fadd  qword ptr [eax]
    0000002d  fstp  qword ptr [eax]
        this.Right += that.Right;
    0000002f  mov   eax,dword ptr [ebp-3Ch]
    00000032  fld   qword ptr [ebp+10h]
    00000035  fadd  qword ptr [eax+8]
    00000038  fstp  qword ptr [eax+8]
    }
    0000003b  nop
    0000003c  lea   esp,[ebp-0Ch]
    0000003f  pop   ebx
    00000040  pop   esi
    00000041  pop   edi
    00000042  pop   ebp
    00000043  ret   10h

    public void PlusEqual(double thatLeft, double thatRight)
    {
    00000000  push  ebp
    00000001  mov   ebp,esp
    00000003  push  edi
    00000004  push  esi
    00000005  push  ebx
    00000006  sub   esp,30h
    00000009  xor   eax,eax
    0000000b  mov   dword ptr [ebp-10h],eax
    0000000e  xor   eax,eax
    00000010  mov   dword ptr [ebp-1Ch],eax
    00000013  mov   dword ptr [ebp-3Ch],ecx
    00000016  cmp   dword ptr ds:[04C87B7Ch],0
    0000001d  je    00000024
    0000001f  call  75308159
    00000024  nop
        this.Left += thatLeft;
    00000025  mov   eax,dword ptr [ebp-3Ch]
    00000028  fld   qword ptr [ebp+10h]
    0000002b  fadd  qword ptr [eax]
    0000002d  fstp  qword ptr [eax]
        this.Right += thatRight;
    0000002f  mov   eax,dword ptr [ebp-3Ch]
    00000032  fld   qword ptr [ebp+8]
    00000035  fadd  qword ptr [eax+8]
    00000038  fstp  qword ptr [eax+8]
    }
    0000003b  nop
    0000003c  lea   esp,[ebp-0Ch]
    0000003f  pop   ebx
    00000040  pop   esi
    00000041  pop   edi
    00000042  pop   ebp
    00000043  ret   10h

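I'm getting very different results, much less dramatic. But I didn't use the test runner; I pasted the code into a console mode app. The 5% result is ~87% in 32-bit mode and ~100% in 64-bit mode when I try it.

Alignment is critical for doubles, and the .NET runtime can only promise an alignment of 4 on a 32-bit machine. It looks to me like the test runner is starting the test methods with a stack address that's aligned to 4 instead of 8. The misalignment penalty gets very large when a double crosses a cache line boundary.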
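To illustrate why 4-byte alignment can hurt while 8-byte alignment cannot, here is a rough sketch of the arithmetic only. It assumes a 64-byte cache line (typical for x86, but not stated in this thread), and the names are invented for the example. It does not inspect real stack addresses.

```csharp
public static class AlignmentSketch
{
    public const int CacheLineBytes = 64; // assumption: typical x86 line size
    public const int DoubleBytes = 8;

    // True if an 8-byte double stored at this (non-negative) address
    // would span two cache lines.
    public static bool CrossesCacheLine(long address)
    {
        long offsetInLine = address % CacheLineBytes;
        return offsetInLine + DoubleBytes > CacheLineBytes;
    }

    // Counts how many start offsets within one cache line, stepping by
    // the given alignment, would make a double straddle the boundary.
    public static int CountCrossings(int alignment)
    {
        int count = 0;
        for (long address = 0; address < CacheLineBytes; address += alignment)
        {
            if (CrossesCacheLine(address))
            {
                count++;
            }
        }
        return count;
    }
}
```

With 8-byte alignment no start offset straddles a line, while with 4-byte alignment one of the sixteen possible offsets (offset 60) does. Whether a given run hits that case depends on where the stack frame happens to land.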

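I'm having some difficulty replicating your results.

I took your code and:

  • made it a standalone console application
  • built an optimized (release) build
  • increased the "size" factor from 2.5M to 10M
  • ran it from the command line (outside the IDE)

When I did so, I got the following timings, which are far different from yours. For the avoidance of doubt, I'll post exactly the code I used.

Here are my timings: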

    Populating List<Element> took 527ms.
    The PlusEqual() method took 450ms.
    The 'same' += operator took 386ms.
    The 'same' -= operator took 446ms.
    The PlusEqual(double, double) method took 413ms.
    The do nothing loop took 229ms.
    The ratio of operator with constructor to method is 85%.
    The ratio of operator without constructor to method is 99%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 91%.
    If we remove the overhead time for the loop accessing the elements from the List...
    The ratio of operator with constructor to method is 71%.
    The ratio of operator without constructor to method is 98%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 83%.

And these are my edits to your code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    namespace OperatorVsMethod
    {
        public struct Element
        {
            public double Left;
            public double Right;

            public Element(double left, double right)
            {
                this.Left = left;
                this.Right = right;
            }

            public static Element operator +(Element x, Element y)
            {
                return new Element(x.Left + y.Left, x.Right + y.Right);
            }

            // Deliberately performs an addition: this is "+= without the constructor call".
            public static Element operator -(Element x, Element y)
            {
                x.Left += y.Left;
                x.Right += y.Right;
                return x;
            }

            /// <summary>
            /// Like the += operator; but faster.
            /// </summary>
            public void PlusEqual(Element that)
            {
                this.Left += that.Left;
                this.Right += that.Right;
            }

            /// <summary>
            /// Like the += operator; but faster.
            /// </summary>
            public void PlusEqual(double thatLeft, double thatRight)
            {
                this.Left += thatLeft;
                this.Right += thatRight;
            }
        }

        public class UnitTest1
        {
            public static void Main()
            {
                Stopwatch stopwatch = new Stopwatch();

                // Populate a List of Elements to multiply together
                int seedSize = 4;
                List<double> doubles = new List<double>(seedSize);
                doubles.Add(2.5d);
                doubles.Add(100000d);
                doubles.Add(-0.5d);
                doubles.Add(-100002d);

                int size = 10000000 * seedSize;
                List<Element> elts = new List<Element>(size);

                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    int di = ii % seedSize;
                    double d = doubles[di];
                    elts.Add(new Element(d, d));
                }
                stopwatch.Stop();
                long populateMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of += operator (calls ctor)
                Element operatorCtorResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    operatorCtorResult += elts[ii];
                }
                stopwatch.Stop();
                long operatorCtorMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of -= operator (+= without ctor)
                Element operatorNoCtorResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    operatorNoCtorResult -= elts[ii];
                }
                stopwatch.Stop();
                long operatorNoCtorMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of PlusEqual(Element) method
                Element plusEqualResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    plusEqualResult.PlusEqual(elts[ii]);
                }
                stopwatch.Stop();
                long plusEqualMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of PlusEqual(double, double) method
                Element plusEqualDDResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    Element elt = elts[ii];
                    plusEqualDDResult.PlusEqual(elt.Left, elt.Right);
                }
                stopwatch.Stop();
                long plusEqualDDMS = stopwatch.ElapsedMilliseconds;

                // Measure speed of doing nothing but accessing the Element
                Element doNothingResult = new Element(1d, 1d);
                stopwatch.Reset();
                stopwatch.Start();
                for (int ii = 0; ii < size; ++ii)
                {
                    Element elt = elts[ii];
                    double left = elt.Left;
                    double right = elt.Right;
                }
                stopwatch.Stop();
                long doNothingMS = stopwatch.ElapsedMilliseconds;

                // Report speeds
                Console.WriteLine("Populating List<Element> took {0}ms.", populateMS);
                Console.WriteLine("The PlusEqual() method took {0}ms.", plusEqualMS);
                Console.WriteLine("The 'same' += operator took {0}ms.", operatorCtorMS);
                Console.WriteLine("The 'same' -= operator took {0}ms.", operatorNoCtorMS);
                Console.WriteLine("The PlusEqual(double, double) method took {0}ms.", plusEqualDDMS);
                Console.WriteLine("The do nothing loop took {0}ms.", doNothingMS);

                // Compare speeds
                long percentageRatio = 100L * operatorCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
                Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);

                // Subtract the loop/List access overhead and compare again
                operatorCtorMS -= doNothingMS;
                operatorNoCtorMS -= doNothingMS;
                plusEqualMS -= doNothingMS;
                plusEqualDDMS -= doNothingMS;

                Console.WriteLine("If we remove the overhead time for the loop accessing the elements from the List...");
                percentageRatio = 100L * operatorCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator with constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * operatorNoCtorMS / plusEqualMS;
                Console.WriteLine("The ratio of operator without constructor to method is {0}%.", percentageRatio);
                percentageRatio = 100L * plusEqualDDMS / plusEqualMS;
                Console.WriteLine("The ratio of PlusEqual(double,double) to PlusEqual(Element) is {0}%.", percentageRatio);
            }
        }
    }

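Running .NET 4.0 here. I compiled with "Any CPU", targeting .NET 4.0 in release mode. Execution was from the command line. It ran in 64-bit mode. My timings are a bit different: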

    Populating List<Element> took 442ms.
    The PlusEqual() method took 115ms.
    The 'same' += operator took 201ms.
    The 'same' -= operator took 200ms.
    The PlusEqual(double, double) method took 129ms.
    The do nothing loop took 93ms.
    The ratio of operator with constructor to method is 174%.
    The ratio of operator without constructor to method is 173%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 112%.
    If we remove the overhead time for the loop accessing the elements from the List ...
    The ratio of operator with constructor to method is 490%.
    The ratio of operator without constructor to method is 486%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 163%.

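In particular, PlusEqual(Element) is slightly faster than PlusEqual(double, double).

Whatever the problem is in .NET 3.5, it doesn't appear to exist in .NET 4.0.

Like @Corey Kosak, I just ran this code in VS2010 Express as a simple console app in Release mode. I get very different numbers. But I also have Fx4.5 installed, so these might not be the results for a clean Fx4.0.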

    Populating List<Element> took 435ms.
    The PlusEqual() method took 109ms.
    The 'same' += operator took 217ms.
    The 'same' -= operator took 157ms.
    The PlusEqual(double, double) method took 118ms.
    The do nothing loop took 79ms.
    The ratio of operator with constructor to method is 199%.
    The ratio of operator without constructor to method is 144%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 108%.
    If we remove the overhead time for the loop accessing the elements from the List ...
    The ratio of operator with constructor to method is 460%.
    The ratio of operator without constructor to method is 260%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 130%.

Edit: and now run from the cmd line. That does make a difference, and there is less variation in the numbers.
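Since several answers here found that how the code is run (test runner, IDE, command line) changes the numbers, here is a minimal sketch of a timing helper that at least removes two common sources of noise: it runs the body once untimed so JIT compilation is not measured, and it keeps the best of several runs. The names are made up, and this is not what anyone in this thread actually used.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    // Returns the fastest of `repetitions` timed runs, in milliseconds.
    public static long MeasureMs(Action body, int repetitions)
    {
        if (body == null) throw new ArgumentNullException("body");
        if (repetitions < 1) throw new ArgumentOutOfRangeException("repetitions");

        body(); // warm-up run: pays the JIT cost outside the timed region

        long best = long.MaxValue;
        for (int i = 0; i < repetitions; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            body();
            stopwatch.Stop();
            best = Math.Min(best, stopwatch.ElapsedMilliseconds);
        }
        return best;
    }

    // Timings taken with a debugger attached are usually not
    // representative, so callers may want to check this first.
    public static bool RunningUnderDebugger()
    {
        return Debugger.IsAttached;
    }
}
```

Each of the benchmark loops could be wrapped in a lambda and passed to `MeasureMs`, although the delegate call itself adds a small fixed cost per run.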

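Not sure if this is relevant, but here are the numbers for .NET 4.0 64-bit on Windows 7 64-bit. My mscorwks.dll version is 2.0.50727.5446. I just pasted the code into LINQPad and ran it from there. Here's the result: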

    Populating List<Element> took 496ms.
    The PlusEqual() method took 189ms.
    The 'same' += operator took 295ms.
    The 'same' -= operator took 358ms.
    The PlusEqual(double, double) method took 148ms.
    The do nothing loop took 103ms.
    The ratio of operator with constructor to method is 156%.
    The ratio of operator without constructor to method is 189%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 78%.
    If we remove the overhead time for the loop accessing the elements from the List ...
    The ratio of operator with constructor to method is 223%.
    The ratio of operator without constructor to method is 296%.
    The ratio of PlusEqual(double,double) to PlusEqual(Element) is 52%.

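In addition to the JIT compiler differences mentioned in other answers, another difference between a struct method call and a struct operator is that a struct method call will pass this as a ref parameter (and may be written to accept other parameters as ref parameters as well), while a struct operator will pass all operands by value. The cost to pass a structure of any size as a ref parameter is fixed, no matter how large the structure is, while the cost to pass a structure by value is proportional to its size. There is nothing wrong with using large structures (even hundreds of bytes) if one can avoid copying them unnecessarily; while unnecessary copies can often be prevented when using methods, they cannot be prevented when using operators.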
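A small sketch of that difference, using a hypothetical 32-byte struct (the type and method names are invented for illustration). The operator receives copies of both operands and returns a third copy, while the instance method works on `this` in place and takes its argument by `ref`.

```csharp
public struct Big
{
    public double A, B, C, D; // 4 * 8 = 32 bytes of payload

    // Operators must take their operands by value: x and y are copies,
    // and the result is copied back to the caller.
    public static Big operator +(Big x, Big y)
    {
        x.A += y.A; x.B += y.B; x.C += y.C; x.D += y.D;
        return x;
    }

    // 'this' is passed by reference implicitly; 'other' explicitly.
    // Only addresses are passed, regardless of the struct's size.
    public void Add(ref Big other)
    {
        A += other.A; B += other.B; C += other.C; D += other.D;
    }
}

public static class RefDemo
{
    public static Big SumWithOperator(Big[] items)
    {
        Big total = new Big();
        for (int i = 0; i < items.Length; i++)
        {
            total += items[i]; // copies total and items[i] on every call
        }
        return total;
    }

    public static Big SumWithMethod(Big[] items)
    {
        Big total = new Big();
        for (int i = 0; i < items.Length; i++)
        {
            total.Add(ref items[i]); // array element passed by reference
        }
        return total;
    }
}
```

Both functions compute the same result. Whether the copies in the operator version actually cost anything depends on what the JIT manages to inline and eliminate, which is what the timings in this thread are probing.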

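I would imagine that when you access members of the struct, it is in fact doing an extra operation to access the member: the this pointer + offset.

Maybe instead of a List you should use a double[] with "well-known" offsets and index increments?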
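A minimal sketch of what that could look like, assuming an interleaved layout where Left lives at index 2*i and Right at index 2*i + 1. The layout and the names are my assumptions, not something specified above.

```csharp
using System;

public static class FlatArraySketch
{
    // Packs parallel Left/Right values into one interleaved array:
    // [left0, right0, left1, right1, ...].
    public static double[] Pack(double[] lefts, double[] rights)
    {
        if (lefts.Length != rights.Length)
        {
            throw new ArgumentException("lefts and rights must have the same length");
        }
        double[] data = new double[lefts.Length * 2];
        for (int i = 0; i < lefts.Length; i++)
        {
            data[2 * i] = lefts[i];
            data[2 * i + 1] = rights[i];
        }
        return data;
    }

    // Accumulates both components in one pass using fixed offsets,
    // with no struct copies and no List indexer calls.
    public static void Sum(double[] data, out double left, out double right)
    {
        left = 0d;
        right = 0d;
        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            left += data[i];
            right += data[i + 1];
        }
    }
}
```

This trades the readability of `Element` for a layout the JIT has little room to handle badly. Whether it is actually faster would need to be measured on the runtime in question.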