在numpy数组中find最接近的值

是否有一个numpy-thonic的方式，例如函数，在数组中find最接近的值？

例：

np.find_nearest( array, value )

 import numpy as np def find_nearest(array,value): idx = (np.abs(array-value)).argmin() return array[idx] array = np.random.random(10) print(array) # [ 0.21069679 0.61290182 0.63425412 0.84635244 0.91599191 0.00213826 # 0.17104965 0.56874386 0.57319379 0.28719469] value = 0.5 print(find_nearest(array, value)) # 0.568743859261

如果你的数组被sorting并且非常大，这是一个更快的解决scheme：

 def find_nearest(array,value): idx = np.searchsorted(array, value, side="left") if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])): return array[idx-1] else: return array[idx]

这扩大到非常大的数组。如果您不能假定数组已经sorting，则可以轻松修改上面的方法以在方法中进行sorting。这对于小型arrays来说太过分了，但是一旦变大了，速度就会快得多。

上面的答案稍作修改，可用于任意维数组（1d，2d，3d，…）：

 def find_nearest(a, a0): "Element in nd array `a` closest to the scalar value `a0`" idx = np.abs(a - a0).argmin() return a.flat[idx]

或者，写成一行：

 a.flat[np.abs(a - a0).argmin()]

这是一个扩展，用于在向量数组中find最接近的向量。

 import numpy as np def find_nearest_vector(array, value): idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin() return array[idx] A = np.random.random((10,2))*100 """ A = array([[ 34.19762933, 43.14534123], [ 48.79558706, 47.79243283], [ 38.42774411, 84.87155478], [ 63.64371943, 50.7722317 ], [ 73.56362857, 27.87895698], [ 96.67790593, 77.76150486], [ 68.86202147, 21.38735169], [ 5.21796467, 59.17051276], [ 82.92389467, 99.90387851], [ 6.76626539, 30.50661753]])""" pt = [6, 30] print find_nearest_vector(A,pt) # array([ 6.76626539, 30.50661753])

这里是@Ari Onasafari的scipy版本，回答“ 在向量数组中寻找最近的向量 ”

 In [1]: from scipy import spatial In [2]: import numpy as np In [3]: A = np.random.random((10,2))*100 In [4]: A Out[4]: array([[ 68.83402637, 38.07632221], [ 76.84704074, 24.9395109 ], [ 16.26715795, 98.52763827], [ 70.99411985, 67.31740151], [ 71.72452181, 24.13516764], [ 17.22707611, 20.65425362], [ 43.85122458, 21.50624882], [ 76.71987125, 44.95031274], [ 63.77341073, 78.87417774], [ 8.45828909, 30.18426696]]) In [5]: pt = [6, 30] # <-- the point to find In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point Out[6]: array([ 8.45828909, 30.18426696]) #how it works! In [7]: distance,index = spatial.KDTree(A).query(pt) In [8]: distance # <-- The distances to the nearest neighbors Out[8]: 2.4651855048258393 In [9]: index # <-- The locations of the neighbors Out[9]: 9 #then In [10]: A[index] Out[10]: array([ 8.45828909, 30.18426696])

这是一个处理非标量“值”数组的版本：

 import numpy as np def find_nearest(array, values): indices = np.abs(np.subtract.outer(array, values)).argmin(0) return array[indices]

或者，如果input是标量，则返回数字types（例如int，float）的版本：

 def find_nearest(array, values): values = np.atleast_1d(values) indices = np.abs(np.subtract.outer(array, values)).argmin(0) out = array[indices] return out if len(out) > 1 else out[0]

如果你不想使用numpy这将做到这一点：

 def find_nearest(array, value): n = [abs(i-value) for i in array] idx = n.index(min(n)) return array[idx]

回答总结 ：如果有一个sorting的array那么平分代码（下面给出）执行速度最快。对于大型arrays，速度要快100-1000倍，对于小型arrays要快200-100倍。它也不需要numpy。如果你有一个未sorting的array那么如果array很大，那么应该首先考虑使用O（n logn）sorting然后二分，如果array很小，那么方法2似乎是最快的。

首先，你应该澄清你的意思是最接近的价值 。通常人们想要横坐标的间隔，例如array = [0,0.7,2.1]，value = 1.95，答案将是idx = 1。这是我怀疑你需要的情况（否则，一旦find间隔，随后的条件语句可以很容易地修改下面的内容）。我会注意到，最好的方式来执行这是与平分（我将提供第一 – 注意它不需要numpy和比使用numpy函数，因为他们执行冗余操作）。然后我会提供一个与其他用户在这里介绍的其他人的时间比较。

二分法：

 def bisection(array,value): '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j] and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned to indicate that ``value`` is out of range below and above respectively.''' n = len(array) if (value < array[0]): return -1 elif (value > array[n-1]): return n jl = 0# Initialize lower ju = n-1# and upper limits. while (ju-jl > 1):# If we are not yet done, jm=(ju+jl) >> 1# compute a midpoint with a bitshift if (value >= array[jm]): jl=jm# and replace either the lower limit else: ju=jm# or the upper limit, as appropriate. # Repeat until the test condition is satisfied. if (value == array[0]):# edge cases at bottom return 0 elif (value == array[n-1]):# and top return n-1 else: return jl

现在我将定义来自其他答案的代码，它们都返回一个索引：

 import math import numpy as np def find_nearest1(array,value): idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value)) return idx def find_nearest2(array, values): indices = np.abs(np.subtract.outer(array, values)).argmin(0) return indices def find_nearest3(array, values): values = np.atleast_1d(values) indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0) out = array[indices] return indices def find_nearest4(array,value): idx = (np.abs(array-value)).argmin() return idx def find_nearest5(array, value): idx_sorted = np.argsort(array) sorted_array = np.array(array[idx_sorted]) idx = np.searchsorted(sorted_array, value, side="left") if idx >= len(array): idx_nearest = idx_sorted[len(array)-1] elif idx == 0: idx_nearest = idx_sorted[0] else: if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]): idx_nearest = idx_sorted[idx-1] else: idx_nearest = idx_sorted[idx] return idx_nearest def find_nearest6(array,value): xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0) return xi

现在我将计时代码：注意方法1,2,4,5不正确给出时间间隔。方法1,2,4到arrays中的最近点（例如> = 1.5 – > 2），方法5总是四舍五入（例如1.45 – > 2）。只有方法3和6，当然是平分给出了正确的时间间隔。

 array = np.arange(100000) val = array[50000]+0.55 print( bisection(array,val)) %timeit bisection(array,val) print( find_nearest1(array,val)) %timeit find_nearest1(array,val) print( find_nearest2(array,val)) %timeit find_nearest2(array,val) print( find_nearest3(array,val)) %timeit find_nearest3(array,val) print( find_nearest4(array,val)) %timeit find_nearest4(array,val) print( find_nearest5(array,val)) %timeit find_nearest5(array,val) print( find_nearest6(array,val)) %timeit find_nearest6(array,val) (50000, 50000) 100000 loops, best of 3: 4.4 µs per loop 50001 1 loop, best of 3: 180 ms per loop 50001 1000 loops, best of 3: 267 µs per loop [50000] 1000 loops, best of 3: 390 µs per loop 50001 1000 loops, best of 3: 259 µs per loop 50001 1000 loops, best of 3: 1.21 ms per loop [50000] 1000 loops, best of 3: 746 µs per loop

对于一个大的arrays，二分法给出了4us，相比之下最好的180us和最长的1.21ms（〜100-1000倍）。对于较小的arrays，速度要快2-100倍。

对于大型数组，由@Demitri给出的（优秀）答案比当前标记为最好的答案快得多。我已经用以下两种方法调整了他的确切algorithm：

下面的函数是否对input数组进行sorting。
下面的函数返回与最接近的值相对应的input数组的索引，这是比较一般的。

请注意，下面的函数还处理了一个特定的边界情况，这将导致@Demitri写入的原始函数中的错误。否则，我的algorithm与他的相同。

 def find_idx_nearest_val(array, value): idx_sorted = np.argsort(array) sorted_array = np.array(array[idx_sorted]) idx = np.searchsorted(sorted_array, value, side="left") if idx >= len(array): idx_nearest = idx_sorted[len(array)-1] elif idx == 0: idx_nearest = idx_sorted[0] else: if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]): idx_nearest = idx_sorted[idx-1] else: idx_nearest = idx_sorted[idx] return idx_nearest

我认为最pythonic的方式将是：

  num = 65 # Input number array = n.random.random((10))*100 # Given array nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num) nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

这是基本的代码。如果你愿意，你可以使用它作为一个function

这里是@ Dimitri解决scheme的一个快速vector化版本，如果你有很多values要search（ values可以是multidimensional array）：

 #`values` should be sorted def get_closest(array, values): #make sure array is a numpy array array = np.array(array) # get insert positions idxs = np.searchsorted(array, values, side="left") # find indexes where previous index is closer prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)]))) idxs[prev_idx_is_less] -= 1 return array[idxs]

基准

比使用@ Demitri解决scheme的循环快100倍以上

 >>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000))) 139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) >>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)] took 21.4 seconds

在numpy数组中find最接近的值

如何检查我正在使用哪个版本的Numpy？

使用LAPACK分配基于Cython的扩展

与副本的Numpy数组赋值

为什么在Numpy的0D数组不考虑标量？

欧几里德距离怎么能用numpy来计算？

Python中的主成分分析

如何在NumPy中规范化数组？

如何将numpy.linalg.norm应用到matrix的每一行？

查找最接近的值并返回Python中的数组索引

numpy – 数组vs asarray