Why is numpy.array so slow?

I am baffled by this:

    def main():
        for i in xrange(2560000):
            a = [0.0, 0.0, 0.0]

    main()

    $ time python test.py
    real    0m0.793s

Now let's try numpy:

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.array([0.0, 0.0, 0.0])

    main()

    $ time python test.py
    real    0m39.338s

Holy CPU cycles, Batman!

Using numpy.zeros(3) improves things, but it's still not enough IMHO:

    $ time python test.py
    real    0m5.610s
    user    0m5.449s
    sys     0m0.070s

numpy.version.version = '1.5.1'

If you are wondering whether the list creation in the first example is optimized away, it is not:

      5          19 LOAD_CONST               2 (0.0)
                 22 LOAD_CONST               2 (0.0)
                 25 LOAD_CONST               2 (0.0)
                 28 BUILD_LIST               3
                 31 STORE_FAST               1 (a)
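You can reproduce this kind of check yourself with the `dis` module. A minimal Python 3 sketch (the exact opcodes differ between CPython versions, but a `BUILD_LIST` is always emitted, i.e. the list really is constructed on every iteration):

```python
import dis

def build():
    # same body as the benchmark loop: a list literal
    a = [0.0, 0.0, 0.0]
    return a

dis.dis(build)  # shows the compiled bytecode

# the list is built at runtime, not folded into a constant
ops = [ins.opname for ins in dis.get_instructions(build)]
print('BUILD_LIST' in ops)     # the list-building opcode is present
print(build() is not build())  # and every call yields a fresh list object
```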

Numpy is optimized for large amounts of data. Give it a tiny 3-element array and, not surprisingly, it performs poorly.
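The difference is easy to measure: paying the Python-call and dtype-dispatch overhead once per tiny array is far more expensive than allocating all the data in one call. A minimal sketch (the value of `n` is my own illustrative choice, smaller than the question's 2560000 to keep the demo quick):

```python
import timeit

n = 100000

# many tiny arrays: one numpy.array call per triple, so the per-call
# overhead is paid n times
many_small = timeit.timeit(
    'for i in range(n): a = numpy.array([0.0, 0.0, 0.0])',
    setup='import numpy', globals={'n': n}, number=1)

# one bulk allocation of all n triples at once
one_big = timeit.timeit(
    'a = numpy.zeros((n, 3))',
    setup='import numpy', globals={'n': n}, number=1)

print(many_small / one_big)  # the per-call pattern is much slower
```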

Consider a separate test:

    import timeit

    reps = 100

    pythonTest = timeit.Timer('a = [0.] * 1000000')
    numpyTest = timeit.Timer('a = numpy.zeros(1000000)', setup='import numpy')
    uninitialised = timeit.Timer('a = numpy.empty(1000000)', setup='import numpy')
    # empty simply allocates the memory. Thus the initial contents of the array
    # is random noise

    print 'python list:', pythonTest.timeit(reps), 'seconds'
    print 'numpy array:', numpyTest.timeit(reps), 'seconds'
    print 'uninitialised array:', uninitialised.timeit(reps), 'seconds'

And the output is:

    python list: 1.22042918205 seconds
    numpy array: 1.05412316322 seconds
    uninitialised array: 0.0016028881073 seconds

It seems that it is the zeroing of the array that is taking all the time for numpy. So unless you need the array initialized, try using empty.
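The caveat with `empty` is worth spelling out: the array's initial contents are arbitrary, so you must write every element before reading it. A minimal sketch:

```python
import numpy as np

z = np.zeros(5)   # allocation + a pass that writes 0.0 into every slot
e = np.empty(5)   # allocation only: contents are whatever bytes were there

print(z)          # [0. 0. 0. 0. 0.] -- guaranteed

# e may contain anything at this point; always fill it before use:
e[:] = 1.0
print(e)          # [1. 1. 1. 1. 1.]
```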

Holy CPU cycles, Batman!, indeed.

But please rather consider something fundamental about numpy: its sophisticated functionality based on linear algebra (such as random numbers or singular value decomposition). Now, consider these seemingly simple computations:

    In []: A= rand(2560000, 3)
    In []: %timeit rand(2560000, 3)
    1 loops, best of 3: 296 ms per loop
    In []: %timeit u, s, v= svd(A, full_matrices= False)
    1 loops, best of 3: 571 ms per loop

Please believe me: this kind of performance will not be beaten significantly by any package currently available.
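Such an SVD call can also be checked for correctness: with `full_matrices=False`, the thin factors reconstruct the input exactly. A small sketch (using `np.linalg.svd` on a smaller matrix; the shape is my own illustrative choice):

```python
import numpy as np

A = np.random.rand(1000, 3)                     # same tall-and-skinny shape idea
u, s, v = np.linalg.svd(A, full_matrices=False)  # u: (1000, 3), s: (3,), v: (3, 3)

# (u * s) scales each column of u by a singular value, so (u * s) @ v == A
print(np.allclose((u * s) @ v, A))  # True
```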

So, please describe your actual problem, and I'll try to figure out a decent numpy-based solution for it.

Update:
Here is some simple code for a ray-sphere intersection:

    import numpy as np

    def mag(X):
        # magnitude
        return (X** 2).sum(0)** .5

    def closest(R, c):
        # closest point on ray to center and its distance
        P= np.dot(c.T, R)* R
        return P, mag(P- c)

    def intersect(R, P, h, r):
        # intersection of rays and sphere
        return P- (h* (2* r- h))** .5* R

    # set up
    c, r= np.array([10, 10, 10])[:, None], 2. # center, radius
    n= 5e5
    R= np.random.rand(3, n) # some random rays in first octant
    R= R/ mag(R) # normalized to unit length
    # find rays which will intersect sphere
    P, b= closest(R, c)
    wi= b<= r
    # and for those which will, find the intersection
    X= intersect(R[:, wi], P[:, wi], r- b[wi], r)

Obviously we computed correctly:

    In []: allclose(mag(X- c), r)
    Out[]: True

And some timings:

    In []: %timeit P, b= closest(R, c)
    10 loops, best of 3: 93.4 ms per loop
    In []: n/ 0.0934
    Out[]: 5353319 #=> more than 5 million detections of possible intersections/ s
    In []: %timeit X= intersect(R[:, wi], P[:, wi], r- b[wi], r)
    10 loops, best of 3: 32.7 ms per loop
    In []: X.shape[1]/ 0.0327
    Out[]: 874037 #=> almost 1 million actual intersections/ s

These timings were done on a very modest machine. With a modern machine, a significant speedup can still be expected.
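For reference, the geometry behind `intersect` can be made explicit on a single ray: with `h = r - b`, the identity `h*(2r - h) = r**2 - b**2` recovers the usual Pythagorean step back from the closest point to the near intersection. A single-ray sketch (the particular center, radius, and direction are my own illustrative choices):

```python
import numpy as np

c = np.array([10.0, 10.0, 10.0])       # sphere center
r = 2.0                                # sphere radius
R = np.array([1.0, 1.0, 1.0])
R /= np.linalg.norm(R)                 # unit ray direction, here aimed at the center

t = c @ R                              # projection of the center onto the ray
P = t * R                              # closest point on the ray to the center
b = np.linalg.norm(P - c)              # its distance from the center
h = r - b
X = P - np.sqrt(h * (2 * r - h)) * R   # step back to the nearer intersection

print(np.isclose(np.linalg.norm(X - c), r))  # True: X lies on the sphere
```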

Anyway, this was just a short demonstration of how to code with numpy.

Late answer, but it could be important for other viewers.

This problem has also been considered in the kwant project. Indeed, small arrays are not optimized in numpy, and quite frequently small arrays are exactly what you need.

In that regard, they created a substitute for small arrays that coexists with numpy arrays (any operation not implemented by the new data type is handled by numpy).

You should look at this project:
https://pypi.python.org/pypi/tinyarray/1.0.5
Its main purpose is to perform nicely for small arrays. Of course some of the fancier things you can do with numpy are not supported by it. But numerics seems to be what you need.

I made some small tests:

python

I have added the numpy import in all cases to get the load time right:

    import numpy

    def main():
        for i in xrange(2560000):
            a = [0.0, 0.0, 0.0]

    main()

numpy

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.array([0.0, 0.0, 0.0])

    main()

numpy zeros

    import numpy

    def main():
        for i in xrange(2560000):
            a = numpy.zeros((3,1))

    main()

tinyarray

    import numpy,tinyarray

    def main():
        for i in xrange(2560000):
            a = tinyarray.array([0.0, 0.0, 0.0])

    main()

tinyarray zeros

    import numpy,tinyarray

    def main():
        for i in xrange(2560000):
            a = tinyarray.zeros((3,1))

    main()

I ran this:

    for f in python numpy numpy_zero tiny tiny_zero ; do
       echo $f
       for i in `seq 5` ; do
         time python ${f}_test.py
       done
    done

and got:

    python
    python ${f}_test.py  0.31s user 0.02s system 99% cpu 0.339 total
    python ${f}_test.py  0.29s user 0.03s system 98% cpu 0.328 total
    python ${f}_test.py  0.33s user 0.01s system 98% cpu 0.345 total
    python ${f}_test.py  0.31s user 0.01s system 98% cpu 0.325 total
    python ${f}_test.py  0.32s user 0.00s system 98% cpu 0.326 total
    numpy
    python ${f}_test.py  2.79s user 0.01s system 99% cpu 2.812 total
    python ${f}_test.py  2.80s user 0.02s system 99% cpu 2.832 total
    python ${f}_test.py  3.01s user 0.02s system 99% cpu 3.033 total
    python ${f}_test.py  2.99s user 0.01s system 99% cpu 3.012 total
    python ${f}_test.py  3.20s user 0.01s system 99% cpu 3.221 total
    numpy_zero
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.075 total
    python ${f}_test.py  1.08s user 0.02s system 99% cpu 1.106 total
    python ${f}_test.py  1.04s user 0.02s system 99% cpu 1.065 total
    python ${f}_test.py  1.03s user 0.02s system 99% cpu 1.059 total
    python ${f}_test.py  1.05s user 0.01s system 99% cpu 1.064 total
    tiny
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.955 total
    python ${f}_test.py  0.98s user 0.01s system 99% cpu 0.993 total
    python ${f}_test.py  0.93s user 0.02s system 99% cpu 0.953 total
    python ${f}_test.py  0.92s user 0.02s system 99% cpu 0.944 total
    python ${f}_test.py  0.96s user 0.01s system 99% cpu 0.978 total
    tiny_zero
    python ${f}_test.py  0.71s user 0.03s system 99% cpu 0.739 total
    python ${f}_test.py  0.68s user 0.02s system 99% cpu 0.711 total
    python ${f}_test.py  0.70s user 0.01s system 99% cpu 0.721 total
    python ${f}_test.py  0.70s user 0.02s system 99% cpu 0.721 total
    python ${f}_test.py  0.67s user 0.01s system 99% cpu 0.687 total

Now these tests are (as has already been pointed out) not the best tests. But they still show that tinyarray is better suited for small arrays.
Another fact is that the most common operations should also be faster in tinyarray. So it may have more benefits than just data creation.

I have never tried it in a fully fledged project, but the kwant project is using it.