Why is numpy.array so slow?

I am baffled by this:

    def main():
        for i in xrange(2560000):
            a = [0.0, 0.0, 0.0]

    main()

    $ time python test.py
    real    0m0.793s

Now let's try numpy:

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.array([0.0, 0.0, 0.0])

    main()

    $ time python test.py
    real    0m39.338s

Holy CPU cycles, Batman!

Using numpy.zeros(3) improves things, but it's still not enough IMHO:

    $ time python test.py
    real    0m5.610s
    user    0m5.449s
    sys     0m0.070s

numpy.version.version = '1.5.1'

If you are wondering whether the list creation in the first example is optimized away, it is not:

      5          19 LOAD_CONST               2 (0.0)
                 22 LOAD_CONST               2 (0.0)
                 25 LOAD_CONST               2 (0.0)
                 28 BUILD_LIST               3
                 31 STORE_FAST               1 (a)
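You can reproduce this kind of check yourself with the `dis` module. A minimal Python 3 sketch (the exact opcodes differ between CPython versions, but a `BUILD_LIST` is always emitted, i.e. the list really is constructed on every iteration):

```python
import dis

def build():
    # same body as the benchmark loop: a list literal
    a = [0.0, 0.0, 0.0]
    return a

dis.dis(build)  # shows the compiled bytecode

# the list is built at runtime, not folded into a constant
ops = [ins.opname for ins in dis.get_instructions(build)]
print('BUILD_LIST' in ops)     # the list-building opcode is present
print(build() is not build())  # and every call yields a fresh list object
```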

Numpy is optimized for large amounts of data. Give it a tiny 3-element array and, not surprisingly, it performs poorly.
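The difference is easy to measure: paying the Python-call and dtype-dispatch overhead once per tiny array is far more expensive than allocating all the data in one call. A minimal sketch (the value of `n` is my own illustrative choice, smaller than the question's 2560000 to keep the demo quick):

```python
import timeit

n = 100000

# many tiny arrays: one numpy.array call per triple, so the per-call
# overhead is paid n times
many_small = timeit.timeit(
    'for i in range(n): a = numpy.array([0.0, 0.0, 0.0])',
    setup='import numpy', globals={'n': n}, number=1)

# one bulk allocation of all n triples at once
one_big = timeit.timeit(
    'a = numpy.zeros((n, 3))',
    setup='import numpy', globals={'n': n}, number=1)

print(many_small / one_big)  # the per-call pattern is much slower
```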

Consider a separate test:

    import timeit

    reps = 100

    pythonTest = timeit.Timer('a = [0.] * 1000000')
    numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
    uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
    # empty simply allocates the memory. Thus the initial contents of the array
    # is random noise

    print 'python list:', pythonTest.timeit(reps), 'seconds'
    print 'numpy array:', numpyTest.timeit(reps), 'seconds'
    print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'

And the output is:

    python list: 1.22042918205 seconds
    numpy array: 1.05412316322 seconds
    uninitialised array: 0.0016028881073 seconds

It seems that it is the zeroing of the array that is taking all the time for numpy. So unless you need the array initialized, try using empty.
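The caveat with `empty` is worth spelling out: the array's initial contents are arbitrary, so you must write every element before reading it. A minimal sketch:

```python
import numpy as np

z = np.zeros(5)   # allocation + a pass that writes 0.0 into every slot
e = np.empty(5)   # allocation only: contents are whatever bytes were there

print(z)          # [0. 0. 0. 0. 0.] -- guaranteed

# e may contain anything at this point; always fill it before use:
e[:] = 1.0
print(e)          # [1. 1. 1. 1. 1.]
```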

Holy CPU cycles, Batman!, indeed.

But please rather consider something fundamental about numpy: its sophisticated functionality based on linear algebra (such as random numbers or singular value decomposition). Now, consider these seemingly simple computations:

    In []: A= rand(2560000, 3)
    In []: %timeit rand(2560000, 3)
    1 loops, best of 3: 296 ms per loop
    In []: %timeit u, s, v= svd(A, full_matrices= False)
    1 loops, best of 3: 571 ms per loop

Please believe me: this kind of performance will not be beaten significantly by any package currently available.
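Such an SVD call can also be checked for correctness: with `full_matrices=False`, the thin factors reconstruct the input exactly. A small sketch (using `np.linalg.svd` on a smaller matrix; the shape is my own illustrative choice):

```python
import numpy as np

A = np.random.rand(1000, 3)                     # same tall-and-skinny shape idea
u, s, v = np.linalg.svd(A, full_matrices=False)  # u: (1000, 3), s: (3,), v: (3, 3)

# (u * s) scales each column of u by a singular value, so (u * s) @ v == A
print(np.allclose((u * s) @ v, A))  # True
```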

So, please describe your actual problem, and I'll try to figure out a decent numpy-based solution for it.

Update:
Here is some simple code for a ray-sphere intersection:

    import numpy as np

    def mag(X):
        # magnitude
        return (X** 2).sum(0)** .5

    def closest(R, c):
        # closest point on ray to center and its distance
        P= np.dot(c.T, R)* R
        return P, mag(P- c)

    def intersect(R, P, h, r):
        # intersection of rays and sphere
        return P- (h* (2* r- h))** .5* R

    # set up
    c, r= np.array([10, 10, 10])[:, None], 2. # center, radius
    n= 5e5
    R= np.random.rand(3, n) # some random rays in first octant
    R= R/ mag(R) # normalized to unit length
    # find rays which will intersect sphere
    P, b= closest(R, c)
    wi= b<= r
    # and for those which will, find the intersection
    X= intersect(R[:, wi], P[:, wi], r- b[wi], r)

Obviously we computed correctly:

    In []: allclose(mag(X- c), r)
    Out[]: True

And some timings:

    In []: %timeit P, b= closest(R, c)
    10 loops, best of 3: 93.4 ms per loop
    In []: n/ 0.0934
    Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
    In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
    10 loops, best of 3: 32.7 ms per loop
    In []: X.shape[1]/ 0.0327
    Out[]: 874037 #=> almost 1 million actual intersections/ s

These timings were done on a very modest machine. With a modern machine, a significant speedup can still be expected.
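For reference, the geometry behind `intersect` can be made explicit on a single ray: with `h = r - b`, the identity `h*(2r - h) = r**2 - b**2` recovers the usual Pythagorean step back from the closest point to the near intersection. A single-ray sketch (the particular center, radius, and direction are my own illustrative choices):

```python
import numpy as np

c = np.array([10.0, 10.0, 10.0])       # sphere center
r = 2.0                                # sphere radius
R = np.array([1.0, 1.0, 1.0])
R /= np.linalg.norm(R)                 # unit ray direction, here aimed at the center

t = c @ R                              # projection of the center onto the ray
P = t * R                              # closest point on the ray to the center
b = np.linalg.norm(P - c)              # its distance from the center
h = r - b
X = P - np.sqrt(h * (2 * r - h)) * R   # step back to the nearer intersection

print(np.isclose(np.linalg.norm(X - c), r))  # True: X lies on the sphere
```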

Anyway, this was just a short demonstration of how to code with numpy.

Late answer, but it could be important for other viewers.

This problem has also been considered in the kwant project. Indeed, small arrays are not optimized in numpy, and quite frequently small arrays are exactly what you need.

In that regard, they created a substitute for small arrays that coexists with numpy arrays (any operation not implemented by the new data type is handled by numpy).

You should look at this project:
https://pypi.python.org/pypi/tinyarray/1.0.5
Its main purpose is to perform nicely for small arrays. Of course some of the fancier things you can do with numpy are not supported by it. But numerics seems to be what you need.

I made some small tests:

python

I have added the numpy import in all cases to get the load time right:

    import numpy

    def main():
        for i in xrange(2560000):
            a = [0.0, 0.0, 0.0]

    main()

numpy

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.array([0.0, 0.0, 0.0])

    main()

numpy zeros

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.zeros((3,1))

    main()

tinyarray

    import numpy,tinyarray

    def main():
        for i in xrange(2560000):
            a = tinyarray.array([0.0, 0.0, 0.0])

    main()

tinyarray zeros

    import numpy,tinyarray

    def main():
        for i in xrange(2560000):
            a = tinyarray.zeros((3,1))

    main()

I ran this:

    for f in python numpy numpy_zero tiny tiny_zero ; do
       echo $f
       for i in `seq 5` ; do
         time python ${f}_test.py
       done
    done

and got:

    python
    python ${f}_test.py  0.31s user 0.02s system 99% cpu 0.339 total
    python ${f}_test.py  0.29s user 0.03s system 98% cpu 0.328 total
    python ${f}_test.py  0.33s user 0.01s system 98% cpu 0.345 total
    python ${f}_test.py  0.31s user 0.01s system 98% cpu 0.325 total
    python ${f}_test.py  0.32s user 0.00s system 98% cpu 0.326 total
    numpy
    python ${f}_test.py  2.79s user 0.01s system 99% cpu 2.812 total
    python ${f}_test.py  2.80s user 0.02s system 99% cpu 2.832 total
    python ${f}_test.py  3.01s user 0.02s system 99% cpu 3.033 total
    python ${f}_test.py  2.99s user 0.01s system 99% cpu 3.012 total
    python ${f}_test.py  3.20s user 0.01s system 99% cpu 3.221 total
    numpy_zero
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.075 total
    python ${f}_test.py  1.08s user 0.02s system 99% cpu 1.106 total
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.065 total
    python ${f}_test.py  1.03s user 0.02s system 99% cpu 1.059 total
    python ${f}_test.py  1.05s user 0.01s system 99% cpu 1.064 total
    tiny
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.955 total
    python ${f}_test.py  0.98s user 0.01s system 99% cpu 0.993 total
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.953 total
    python ${f}_test.py  0.92s user 0.02s system 99% cpu 0.944 total
    python ${f}_test.py  0.96s user 0.01s system 99% cpu 0.978 total
    tiny_zero
    python ${f}_test.py  0.71s user 0.03s system 99% cpu 0.739 total
    python ${f}_test.py  0.68s user 0.02s system 99% cpu 0.711 total
    python ${f}_test.py  0.70s user 0.01s system 99% cpu 0.721 total
    python ${f}_test.py  0.70s user 0.02s system 99% cpu 0.721 total
    python ${f}_test.py  0.67s user 0.01s system 99% cpu 0.687 total

Now these tests are (as has already been pointed out) not the best tests. But they still show that tinyarray is better suited for small arrays.
Another fact is that the most common operations should also be faster in tinyarray. So it may have more benefits than just data creation.

I have never tried it in a fully fledged project, but the kwant project is using it.