Numpy: find the first index of a value fast

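How can I find the index of the first occurrence of a number in a Numpy array? Speed is important to me. I am not interested in the following answers, because they scan the whole array and do not stop when they find the first occurrence:

    itemindex = numpy.where(array==item)[0][0]
    nonzero(array == item)[0][0]

Note 1: none of the answers to the question "Is there a Numpy function to return the first index of something in an array?" seem relevant.

Note 2: a C-compiled method is preferred to a Python loop.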

There is a feature request for this targeted at Numpy 2.0.0: https://github.com/numpy/numpy/issues/2269

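It is probably too late for you, but for future reference: using numba is the easiest way until numpy implements this. If you use the anaconda python distribution, it should already be installed. The code is compiled, so it will be fast.

    from numba import jit

    @jit(nopython=True)
    def find_first(item, vec):
        """Return the index of the first occurrence of item in vec, or -1."""
        for i in range(len(vec)):
            if item == vec[i]:
                return i
        return -1

and then:

    >>> from numpy import array
    >>> a = array([1, 7, 8, 32])
    >>> find_first(8, a)
    2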

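You can convert a boolean array to a Python byte string with array.tobytes() (called array.tostring() in older NumPy versions) and then use the find() method:

    (array==item).tobytes().find(b'\x01')

This does involve copying the data, since Python strings need to be immutable. One advantage is that you can also search for, say, a rising edge by finding \x00\x01.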
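
To make the rising-edge idea concrete, here is a minimal sketch. It assumes a boolean array, where NumPy stores one byte per element (0x00 for False, 0x01 for True); the variable names are made up for illustration.

```python
import numpy as np

signal = np.array([0, 0, 1, 1, 0, 1], dtype=bool)
raw = signal.tobytes()  # one byte per element; this copies the data

first_true = raw.find(b"\x01")       # position of the first True, or -1
rising_edge = raw.find(b"\x00\x01")  # position of the False just before a True

print(first_true, rising_edge)  # 2 1
```

For the two-byte pattern, find() returns the index of the False element, so the True that follows it sits at rising_edge + 1.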

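I think you have hit a problem where a different method and some a priori knowledge of the array would really help: the kind of case where you have probability X of finding your answer in the first Y percent of the data. Split the problem up in the hope of getting lucky, then do it in Python with a nested list comprehension or something similar.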
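
One way to read this suggestion is to search the array block by block and stop at the first block that contains a match, so that nothing after that block is scanned. The sketch below is an assumption about what such a split could look like, not the answerer's code; find_first_chunked and the default chunk size of 4096 are illustrative choices, and the array is assumed to be one-dimensional.

```python
import numpy as np

def find_first_chunked(arr, item, chunk=4096):
    """Return the first index of item in 1-D arr, or -1 (hypothetical helper)."""
    for start in range(0, len(arr), chunk):
        # Vectorised comparison on one block only
        hits = np.flatnonzero(arr[start:start + chunk] == item)
        if hits.size:
            return start + int(hits[0])  # stop: later blocks are never touched
    return -1
```

If the item usually appears early, most calls only compare the first block; if it is missing, the whole array is still scanned once.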

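Writing a C function that does this by brute force and calling it through ctypes is not too hard either.

The C code I hacked together (index.c):

    long index(long val, long *data, long length) {
        long i;
        /* Linear scan; stop at the first match. */
        for (i = 0; i < length; i++) {
            if (data[i] == val)
                return i;
        }
        return -999;  /* sentinel: value not found */
    }

and the python: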

    # to compile (mac)
    # gcc -shared index.c -o index.dylib
    import ctypes
    import numpy as np

    lib = ctypes.CDLL('index.dylib')
    lib.index.restype = ctypes.c_long
    lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long)

    np.random.seed(8675309)
    # randint's upper bound is exclusive, so 101 matches random_integers(0, 100)
    a = np.random.randint(0, 101, 10000)
    print(lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a)))

I get 92.

Wrap the python up into a proper function and there you go.

The C version is a lot faster (~20x) for this seed (warning: I am not very good with timeit):

    import timeit

    t = timeit.Timer('np.where(a==57)[0][0]',
                     'import numpy as np; np.random.seed(1); '
                     'a = np.random.randint(0, 1000001, 10000000)')
    t.timeit(100)/100
    # 0.09761879920959472

    t2 = timeit.Timer('lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a))',
                      'import numpy as np; np.random.seed(1); '
                      'a = np.random.randint(0, 1000001, 10000000); '
                      'import ctypes; lib = ctypes.CDLL("index.dylib"); '
                      'lib.index.restype = ctypes.c_long; '
                      'lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long)')
    t2.timeit(100)/100
    # 0.005288000106811523

If the array is sorted, np.searchsorted works.
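
np.searchsorted returns an insertion point rather than a hit-or-miss answer, so a lookup needs an extra check that the element at that position is really the item. A minimal sketch, assuming a one-dimensional array sorted in ascending order (find_first_sorted is a hypothetical name):

```python
import numpy as np

def find_first_sorted(sorted_arr, item):
    """Return the first index of item in an ascending 1-D array, or -1."""
    # side="left" gives the leftmost position where item could be inserted,
    # which is the first occurrence if item is present
    i = int(np.searchsorted(sorted_arr, item, side="left"))
    if i < len(sorted_arr) and sorted_arr[i] == item:
        return i
    return -1
```

This is a binary search, so it takes O(log n) comparisons instead of a linear scan.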

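I have benchmarked several methods:

argwhere
nonzero, as in the question
.tostring(), as in @Rob Reilink's answer
python loop
Fortran loop

The Python and Fortran code are both available. I skipped the unpromising ones, such as converting to a list.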

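The results are on a log scale. The X axis is the position of the needle (it takes longer to find when it is further down the array); the last value is a needle that is not in the array. The Y axis is the time to find it.

(Figure: benchmark results)

The array had 1 million elements and the tests were run 100 times. The results still fluctuate a bit, but the qualitative trend is clear: Python and f2py quit at the first match, so they scale differently. Python gets too slow if the needle is not in the first 1%, whereas f2py is fast (but you need to compile it).

To summarize, f2py is the fastest solution, especially if the needle appears fairly early.

It is annoying that this is not built in, but it is really just 2 minutes of work. Add this to a file called search.f90: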

    subroutine find_first(needle, haystack, haystack_length, index)
        implicit none
        integer, intent(in) :: needle
        integer, intent(in) :: haystack_length
        integer, intent(in), dimension(haystack_length) :: haystack
    !f2py intent(inplace) haystack
        integer, intent(out) :: index
        integer :: k
        index = -1
        do k = 1, haystack_length
            if (haystack(k)==needle) then
                index = k - 1
                exit
            endif
        enddo
    end

If you are looking for something other than integer, just change the type. Then compile using:

    f2py -c -m search search.f90

after which you can do (from Python):

    import search
    print(search.find_first.__doc__)
    a = search.find_first(your_int_needle, your_int_array)

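If your list is sorted, you can search for an index very quickly with the 'bisect' package. It is O(log(n)) instead of O(n).

    bisect.bisect(a, x)

locates x in the sorted array a (it returns the insertion point just after any entries equal to x). In the sorted case this is definitely quicker than any C routine that walks through all the preceding elements, for long enough lists.

It is good to know sometimes.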
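
Since bisect.bisect is the same as bisect.bisect_right, it neither returns the first occurrence nor tells you whether x is present at all. A hedged sketch of an index lookup built on bisect_left plus an explicit check (index_sorted is a hypothetical helper name; the input is assumed to be sorted in ascending order):

```python
import bisect

def index_sorted(a, x):
    """Return the index of the first element equal to x in sorted a, or -1."""
    i = bisect.bisect_left(a, x)  # leftmost insertion point for x
    if i != len(a) and a[i] == x:
        return i
    return -1

a = [1, 2, 4, 4, 8]
print(index_sorted(a, 4))   # 2
print(bisect.bisect(a, 4))  # 4, one past the last 4
```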

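As far as I know, np.any and np.all on boolean arrays are both short-circuited.

In your case, numpy has to go through the entire array twice: once to create the boolean condition and a second time to find the indices.

My recommendation in this case would be to use cython. I think it should be easy to adapt an example for this case, especially if you do not need much flexibility for different dtypes and shapes.

I needed this for my job, so I taught myself Python's and Numpy's C interface and wrote my own: http://pastebin.com/GtcXuLyd It only works for 1-D arrays, but it works for most data types (int, float, or strings), and testing has shown it is about 20 times faster than the expected approach in pure Python-numpy.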

@tal already presented a numba function to find the first index, but it only works for 1D arrays. With np.ndenumerate you can also find the first index in an array of arbitrary dimension:

    from numba import njit
    import numpy as np

    @njit
    def index(array, item):
        # ndenumerate yields (index tuple, value) pairs in row-major order
        for idx, val in np.ndenumerate(array):
            if val == item:
                return idx
        return None

Sample case:

    >>> arr = np.arange(9).reshape(3,3)
    >>> index(arr, 3)
    (1, 0)

Timings show that it is similar in performance to tal's solution:

    arr = np.arange(100000)

    %timeit index(arr, 5)           # 1000000 loops, best of 3: 1.88 µs per loop
    %timeit find_first(5, arr)      # 1000000 loops, best of 3: 1.7 µs per loop

    %timeit index(arr, 99999)       # 10000 loops, best of 3: 118 µs per loop
    %timeit find_first(99999, arr)  # 10000 loops, best of 3: 96 µs per loop

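You can get a read-write buffer over a numpy array using its .data attribute. Iterate over that, but you will need to know whether your data is row-major or column-major (use ndarray.shape and numpy.unravel_index to convert the flat index back into an index tuple).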
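
A rough sketch of that idea, assuming a row-major (C-ordered) traversal and a native numeric dtype; find_first_buffer is a hypothetical name, not part of NumPy. The loop itself runs in pure Python, so it only pays off when the match is close to the start.

```python
import numpy as np

def find_first_buffer(arr, item):
    """Return the index tuple of the first match in row-major order, or None."""
    # Make sure the memory is C-contiguous, then view it as one dimension
    flat = np.ascontiguousarray(arr).reshape(-1)
    buf = flat.data  # memoryview over the array's memory
    for flat_idx, val in enumerate(buf):
        if val == item:
            # Convert the flat position back into a per-axis index tuple
            return tuple(int(i) for i in np.unravel_index(flat_idx, arr.shape))
    return None
```

For a column-major array, np.ascontiguousarray makes a copy here; searching the original memory order instead would need order='F' in np.unravel_index.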

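Just a note: if you are performing a sequence of searches, the performance gain from doing something clever like converting to a string can be lost in the outer loop when the search dimension is not big enough. Compare the performance of find1, which applies the string conversion trick proposed above row by row, with find2, which uses argmax along the inner axis (plus an adjustment so that a non-match is returned as -1):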

    import numpy, time

    def find1(arr, value):
        # byte-string trick: each bool is one byte, and True is b'\x01'
        return (arr == value).tobytes().find(b'\x01')

    def find2(arr, value):
        # find value over the innermost axis, and return an array of indices to the match
        b = arr == value
        # argmax is 0 for rows without a match; subtracting ~any turns that into -1
        return b.argmax(axis=-1) - ~b.any(axis=-1)

    for size in [(1,100000000),(10000,10000),(1000000,100),(10000000,10)]:
        print(size)
        values = numpy.random.choice([0,0,0,0,0,0,0,1], size=size)
        v = values > 0

        t = time.time()
        numpy.apply_along_axis(find1, -1, v, 1)
        print('find1', time.time()-t)

        t = time.time()
        find2(v, 1)
        print('find2', time.time()-t)

Output:

    (1, 100000000)
    ('find1', 0.25300002098083496)
    ('find2', 0.2780001163482666)
    (10000, 10000)
    ('find1', 0.46200013160705566)
    ('find2', 0.27300000190734863)
    (1000000, 100)
    ('find1', 20.98099994659424)
    ('find2', 0.3040001392364502)
    (10000000, 10)
    ('find1', 206.7590000629425)
    ('find2', 0.4830000400543213)

That said, a find written in C would be at least a little faster than either of these approaches.
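
The argmax-along-the-inner-axis approach is easier to follow on a tiny input. This sketch is a variant, not the answerer's exact code: it uses np.where to substitute -1 for rows without a match, and first_match_per_row is a hypothetical name.

```python
import numpy as np

def first_match_per_row(arr, value):
    """For each row, return the index of the first match, or -1 if there is none."""
    hits = arr == value
    idx = hits.argmax(axis=-1)  # first True per row; 0 when the row has no True
    return np.where(hits.any(axis=-1), idx, -1)

rows = np.array([[0, 1, 1],
                 [0, 0, 0],
                 [1, 0, 1]])
print(first_match_per_row(rows, 1).tolist())  # [1, -1, 0]
```

The any() check is needed because argmax alone cannot tell "match at position 0" apart from "no match".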

You can convert your array into a list and use its index() method:

    i = list(array).index(item)

As far as I know, this is a C-compiled method.
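
Two caveats apply: list.index raises ValueError when the item is absent, and the conversion copies every element before the search even starts. A small sketch for one-dimensional arrays (find_first_list is a hypothetical name) that uses arr.tolist(), which produces plain Python scalars rather than NumPy scalar objects, and maps a miss to -1:

```python
import numpy as np

def find_first_list(arr, item):
    """Return the first index of item in 1-D arr via list.index, or -1."""
    try:
        return arr.tolist().index(item)
    except ValueError:  # list.index signals "not found" with an exception
        return -1

print(find_first_list(np.array([1, 7, 8, 32]), 8))  # 2
```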

How about this:

    import numpy as np
    np.amin(np.where(array==item))
