Compiling numpy with OpenBLAS integration

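I'm trying to install numpy with OpenBLAS, but I don't know how to write the site.cfg file.

When I follow the installation procedure, the installation completes without errors, but increasing the number of threads used by OpenBLAS beyond 1 (controlled by the environment variable OMP_NUM_THREADS) decreases performance.

I'm not sure whether the OpenBLAS integration is working correctly. Could anyone provide a site.cfg file that achieves this?

PS: Integrating OpenBLAS into other toolkits, such as the Python-based Theano, improves performance substantially when I increase the number of threads on the same machine.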

I just compiled numpy inside a virtualenv with OpenBLAS integration, and it seems to be working fine.

This was my process:

  1. Compile OpenBLAS:

     $ git clone https://github.com/xianyi/OpenBLAS
     $ cd OpenBLAS && make FC=gfortran
     $ sudo make PREFIX=/opt/OpenBLAS install

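    If you don't have admin rights, you can set PREFIX= to a directory where you have write privileges (just modify the corresponding steps below accordingly).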

  2. Make sure that the directory containing libopenblas.so is in your shared library search path.

    • To do this locally, you could edit your ~/.bashrc file to contain the line

       export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH 

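      The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).

    • Another option that will work for multiple users is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib, e.g.: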

       $ sudo sh -c "echo '/opt/OpenBLAS/lib' > /etc/ld.so.conf.d/openblas.conf" 

    Once you are done with either option, run

     $ sudo ldconfig 
  3. Grab the numpy source code:

     $ git clone https://github.com/numpy/numpy
     $ cd numpy
  4. Copy site.cfg.example to site.cfg and edit the copy:

     $ cp site.cfg.example site.cfg
     $ nano site.cfg

    Uncomment these lines:

     ....
     [openblas]
     libraries = openblas
     library_dirs = /opt/OpenBLAS/lib
     include_dirs = /opt/OpenBLAS/include
     ....
  5. Check the configuration, build, and install (optionally inside a virtualenv):

     $ python setup.py config 

    The output should look something like this:

     ...
     openblas_info:
       FOUND:
         libraries = ['openblas', 'openblas']
         library_dirs = ['/opt/OpenBLAS/lib']
         language = c
         define_macros = [('HAVE_CBLAS', None)]

       FOUND:
         libraries = ['openblas', 'openblas']
         library_dirs = ['/opt/OpenBLAS/lib']
         language = c
         define_macros = [('HAVE_CBLAS', None)]
     ...

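    Installing with pip is preferable to using python setup.py install, since pip will keep track of the package metadata and allow you to easily uninstall or upgrade numpy in the future.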

     $ pip install . 
  6. Optionally, you can use this script to test performance with different thread counts.

     $ OMP_NUM_THREADS=1 python build/test_numpy.py

     version: 1.10.0.dev0+8e026a2
     maxint:  9223372036854775807

     BLAS info:
      * libraries ['openblas', 'openblas']
      * library_dirs ['/opt/OpenBLAS/lib']
      * define_macros [('HAVE_CBLAS', None)]
      * language c

     dot: 0.099796795845 sec

     $ OMP_NUM_THREADS=8 python build/test_numpy.py

     version: 1.10.0.dev0+8e026a2
     maxint:  9223372036854775807

     BLAS info:
      * libraries ['openblas', 'openblas']
      * library_dirs ['/opt/OpenBLAS/lib']
      * define_macros [('HAVE_CBLAS', None)]
      * language c

     dot: 0.0439578056335 sec
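If you would rather generate site.cfg than edit site.cfg.example by hand, here is a minimal Python sketch that writes only the [openblas] section from step 4, for the setup.py-based build described above. It assumes the /opt/OpenBLAS prefix from step 1, the helper names render_openblas_site_cfg and write_openblas_site_cfg are hypothetical, and unlike copying site.cfg.example it leaves out every other (commented-out) section.

```python
import configparser
import io

# Assumption: OpenBLAS was installed with PREFIX=/opt/OpenBLAS, as in step 1.
DEFAULT_PREFIX = "/opt/OpenBLAS"


def render_openblas_site_cfg(prefix=DEFAULT_PREFIX):
    """Return site.cfg text holding only the [openblas] section from step 4."""
    prefix = prefix.rstrip("/")
    cfg = configparser.ConfigParser()
    cfg["openblas"] = {
        "libraries": "openblas",
        "library_dirs": f"{prefix}/lib",
        "include_dirs": f"{prefix}/include",
    }
    buf = io.StringIO()
    cfg.write(buf)  # emits "[openblas]" followed by "key = value" lines
    return buf.getvalue()


def write_openblas_site_cfg(path="site.cfg", prefix=DEFAULT_PREFIX):
    """Write the rendered section to `path`, overwriting any existing file."""
    text = render_openblas_site_cfg(prefix)
    with open(path, "w") as fh:
        fh.write(text)
    return text
```

Running write_openblas_site_cfg() from the root of the numpy checkout and then python setup.py config, as in step 5, should show whether the build picks the section up.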

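There seems to be a noticeable improvement in performance with a higher thread count. However, I haven't tested this very systematically, and it's likely that for smaller matrices the additional threading overhead would outweigh the benefit of a higher thread count.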
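To test the small-matrix suspicion more systematically, a rough harness like the following could time np.dot across several matrix sizes and thread counts. It is a sketch, not the test script from step 6: each measurement runs in a fresh interpreter because OpenBLAS reads its thread count when the library is initialised, and the sizes, repetition count and helper names (thread_env, time_dot, scaling_table) are arbitrary, hypothetical choices. It also assumes, as far as I know of OpenBLAS's precedence rules, that OPENBLAS_NUM_THREADS would override OMP_NUM_THREADS, so it removes that variable.

```python
import os
import subprocess
import sys

# Child program: times np.dot on two n x n matrices. numpy is imported only in
# the child, so OMP_NUM_THREADS is already set when OpenBLAS starts up.
CHILD = """
import sys, time
import numpy as np
n, reps = int(sys.argv[1]), int(sys.argv[2])
a = np.random.randn(n, n)
b = np.random.randn(n, n)
np.dot(a, b)  # warm-up, so thread-pool start-up is not timed
t = time.perf_counter()
for _ in range(reps):
    np.dot(a, b)
print((time.perf_counter() - t) / reps)
"""


def thread_env(threads, base=None):
    """Copy of the environment with OMP_NUM_THREADS set to `threads`."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)
    # Assumption: OPENBLAS_NUM_THREADS takes priority in OpenBLAS, so drop it.
    env.pop("OPENBLAS_NUM_THREADS", None)
    return env


def time_dot(n, threads, reps=5):
    """Mean seconds per np.dot of two n x n matrices using `threads` threads."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD, str(n), str(reps)],
        env=thread_env(threads),
        capture_output=True,
        text=True,
        check=True,
    )
    return float(out.stdout.strip())


def scaling_table(sizes=(64, 256, 1024), thread_counts=(1, 2, 4, 8)):
    """Map (n, threads) -> mean seconds for every combination."""
    return {(n, t): time_dot(n, t) for n in sizes for t in thread_counts}
```

For example, `for (n, t), s in sorted(scaling_table().items()): print(n, t, s)` prints one line per combination; if the hypothesis holds, the 1-thread timings should win for the smallest size and lose for the largest.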

In case you are using Ubuntu or Mint, you can easily get numpy linked against OpenBLAS by installing both numpy and OpenBLAS via apt-get:

 sudo apt-get install python3-numpy libopenblas-dev

On a fresh Ubuntu Docker container, I tested the following script, copied from the blog post "Installing Numpy and OpenBLAS":

 import numpy as np
 import numpy.random as npr
 import time

 # --- Test 1: matrix-matrix product
 N = 1
 n = 1000

 A = npr.randn(n,n)
 B = npr.randn(n,n)

 t = time.time()
 for i in range(N):
     C = np.dot(A, B)
 td = time.time() - t
 print("dotted two (%d,%d) matrices in %0.1f ms" % (n, n, 1e3*td/N))

 # --- Test 2: vector dot product
 N = 100
 n = 4000

 A = npr.randn(n)
 B = npr.randn(n)

 t = time.time()
 for i in range(N):
     C = np.dot(A, B)
 td = time.time() - t
 print("dotted two (%d) vectors in %0.2f us" % (n, 1e6*td/N))

 # --- Test 3: SVD
 m,n = (2000,1000)

 A = npr.randn(m,n)

 t = time.time()
 [U,s,V] = np.linalg.svd(A, full_matrices=False)
 td = time.time() - t
 print("SVD of (%d,%d) matrix in %0.3fs" % (m, n, td))

 # --- Test 4: eigendecomposition
 n = 1500
 A = npr.randn(n,n)

 t = time.time()
 w, v = np.linalg.eig(A)
 td = time.time() - t
 print("Eigendecomp of (%d,%d) matrix in %0.3fs" % (n, n, td))

The results without OpenBLAS were:

 dotted two (1000,1000) matrices in 563.8 ms
 dotted two (4000) vectors in 5.16 us
 SVD of (2000,1000) matrix in 6.084 s
 Eigendecomp of (1500,1500) matrix in 14.605 s

I installed OpenBLAS with apt install libopenblas-dev, and checked numpy's linkage with

 import numpy as np
 np.__config__.show()

and the output was

 atlas_threads_info:
   NOT AVAILABLE
 openblas_info:
   NOT AVAILABLE
 atlas_blas_info:
   NOT AVAILABLE
 atlas_3_10_threads_info:
   NOT AVAILABLE
 blas_info:
     library_dirs = ['/usr/lib']
     libraries = ['blas', 'blas']
     language = c
     define_macros = [('HAVE_CBLAS', None)]
 mkl_info:
   NOT AVAILABLE
 atlas_3_10_blas_threads_info:
   NOT AVAILABLE
 atlas_3_10_blas_info:
   NOT AVAILABLE
 openblas_lapack_info:
   NOT AVAILABLE
 lapack_opt_info:
     library_dirs = ['/usr/lib']
     libraries = ['lapack', 'lapack', 'blas', 'blas']
     language = c
     define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
 blas_opt_info:
     library_dirs = ['/usr/lib']
     libraries = ['blas', 'blas']
     language = c
     define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
 atlas_info:
   NOT AVAILABLE
 blas_mkl_info:
   NOT AVAILABLE
 lapack_mkl_info:
   NOT AVAILABLE
 atlas_3_10_info:
   NOT AVAILABLE
 lapack_info:
     library_dirs = ['/usr/lib']
     libraries = ['lapack', 'lapack']
     language = f77
 atlas_blas_threads_info:
   NOT AVAILABLE

It doesn't show any link to OpenBLAS. However, the new results from the script show that numpy must be using OpenBLAS:

 dotted two (1000,1000) matrices in 15.2 ms
 dotted two (4000) vectors in 2.64 us
 SVD of (2000,1000) matrix in 0.469 s
 Eigendecomp of (1500,1500) matrix in 2.794 s
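np.__config__.show() reports what numpy was built against, not which shared library the dynamic loader actually resolves at run time. As far as I know, on Debian/Ubuntu the OpenBLAS packages register themselves with update-alternatives as providers of libblas.so.3 and liblapack.so.3, so a numpy built against the generic /usr/lib libblas can end up running OpenBLAS, which would be consistent with the timings above. A Linux-only way to check is to list the BLAS/LAPACK libraries mapped into the running process via /proc/self/maps. The sketch below does that; the helper names are hypothetical, and matching "blas"/"lapack" in the file name is a heuristic.

```python
def blas_libraries(maps_text):
    """Return sorted, de-duplicated paths from a /proc/<pid>/maps dump whose
    file name looks like a BLAS or LAPACK shared library."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; the path may
        # contain spaces, so split at most five times.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5]
        name = path.rsplit("/", 1)[-1].lower()
        if ("blas" in name or "lapack" in name) and ".so" in name:
            found.add(path)
    return sorted(found)


def loaded_blas_libraries():
    """Linux only: BLAS/LAPACK libraries mapped into this process right now."""
    with open("/proc/self/maps") as fh:
        return blas_libraries(fh.read())
```

Usage: run `import numpy as np` first, then `print(loaded_blas_libraries())`. Because the kernel reports the resolved file rather than the symlink, a path containing "openblas" there would indicate that OpenBLAS is doing the work even though np.__config__.show() lists only blas.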