vba:从数组中获取唯一值

有没有build立在VBAfunction从一维数组中获取唯一值? 那刚刚摆脱重复呢?

如果不是,那么我将如何从数组中获得唯一的值?

这篇文章包含2个例子。 我喜欢第二个:

Sub unique() Dim arr As New Collection, a Dim aFirstArray() As Variant Dim i As Long aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _ "Lemon", "Lime", "Lime", "Apple") On Error Resume Next For Each a In aFirstArray arr.Add a, a Next For i = 1 To arr.Count Cells(i, 1) = arr(i) Next End Sub 

没有内置function可以从arrays中删除重复项。 拉杰的答案看起来很优雅,但我更喜欢使用字典。

 Dim d As Object Set d = CreateObject("Scripting.Dictionary") 'Set d = New Scripting.Dictionary Dim i As Long For i = LBound(myArray) To UBound(myArray) d(myArray(i)) = 1 Next i Dim v As Variant For Each v In d.Keys() 'd.Keys() is a Variant array of the unique values in myArray. 'v will iterate through each of them. Next v 

编辑:我改变了循环使用LBoundUBound Tomalak的build议答案。 编辑: d.Keys()是一个变种数组,而不是一个集合。

更新(6/15/16)

我创build了更彻底的基准。 首先,正如@ChaimG所指出的那样,早期绑定会产生很大的不同(我最初使用的是@ eksortso的代码,它使用了较晚的绑定)。 其次,我的原始基准只包括创build独特对象的时间,但是,它没有testing使用对象的效率。 我的观点是,如果我创build的对象很笨重,并且让我放慢脚步,那么创build一个对象真的不是很重要。

旧备注: 事实certificate,循环集合对象的效率非常低

事实certificate,循环一个集合可以是非常有效的,如果你知道该怎么做(我没有)。 至于@ChaimG(再一次),在评论中指出,使用For Each构造比简单地使用For循环好得多。 为了给你一个想法,在更改循环构造之前, Test Case Size = 10^6 Collection2的时间超过了1400s(即〜23分钟)。 现在只有0.195s(超过7000倍)。

Collection方法有两次。 第一个(我的原始基准Collection1 )显示创build唯一对象的时间。 第二部分( Collection2 )显示了循环对象的时间(这是非常自然的),以创build一个可返回的数组,与其他函数一样。

在下面的图表中,黄色背景表示它是该testing用例中最快的,红色表示最慢(“未testing”algorithm被排除)。 Collection方法的总时间是Collection1Collection2的总和。 绿松石表明,无论原来的顺序是最快的。

Benchmarks5

下面是我创build的原始algorithm(我稍微修改它,例如我不再实例化我自己的数据types)。 它在一个非常可观的时间以原始顺序返回一个数组的唯一值,并且可以修改它以接受任何数据types。 在IndexMethod ,它是超大型数组最快的algorithm。

这里是这个algorithm的主要思想:

  1. 索引数组
  2. 按值sorting
  3. 将相同的值放在数组的末尾,然后将它们“砍”掉。
  4. 最后,按索引sorting。

下面是一个例子:

 Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19) 1. (86, 100, 33, 19, 33, 703, 19, 100, 703, 19) (1 , 2, 3, 4, 5, 6, 7, 8, 9, 10) <<-- Indexing 2. (19, 19, 19, 33, 33, 86, 100, 100, 703, 703) <<-- sort by values (4, 7, 10, 3, 5, 1, 2, 8, 6, 9) 3. (19, 33, 86, 100, 703) <<-- remove duplicates (4, 3, 1, 2, 6) 4. (86, 100, 33, 19, 703) ( 1, 2, 3, 4, 6) <<-- sort by index 

这里是代码:

 Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant Dim MyUniqueArr() As Long, i As Long, intInd As Integer Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long LowB = LBound(myArray): HighB = UBound(myArray) ReDim MyUniqueArr(1 To 2, LowB To HighB) intInd = 1 - LowB 'Guarantees the indices span 1 to Lim For i = LowB To HighB MyUniqueArr(1, i) = myArray(i) MyUniqueArr(2, i) = i + intInd Next i QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2 Call UniqueArray2D(MyUniqueArr) If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2 SortingUniqueTest = MyUniqueArr() End Function Public Sub UniqueArray2D(ByRef myArray() As Long) Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long Dim lngTemp As Long, HighB As Long, LowB As Long LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2) Do While i < HighB j = i + 1 If myArray(1, i) = myArray(1, j) Then Do While myArray(1, i) = myArray(1, j) ReDim Preserve DuplicateArr(1 To Count) DuplicateArr(Count) = j Count = Count + 1 j = j + 1 If j > HighB Then Exit Do Loop QSLong2D myArray, 2, i, j - 1, 2 End If i = j Loop Count1 = HighB If Count > 1 Then For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1 myArray(1, DuplicateArr(i)) = myArray(1, Count1) myArray(2, DuplicateArr(i)) = myArray(2, Count1) Count1 = Count1 - 1 ReDim Preserve myArray(1 To 2, LowB To Count1) Next i End If End Sub 

这里是我使用的sortingalgorithm(更多关于这里的algorithm)。

 Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte) Dim lLow2 As Long, lHigh2 As Long Dim sKey As Long, sSwap As Long, i As Byte On Error GoTo ErrorExit If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim) If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim) lLow2 = lLow1 lHigh2 = lHigh1 sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2) Do While lLow2 < lHigh2 Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop If lLow2 < lHigh2 Then For i = 1 To bytNum sSwap = saArray(i, lLow2) saArray(i, lLow2) = saArray(i, lHigh2) saArray(i, lHigh2) = sSwap Next i End If If lLow2 <= lHigh2 Then lLow2 = lLow2 + 1 lHigh2 = lHigh2 - 1 End If Loop If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum ErrorExit: End Sub 

如果你的数据包含整数,下面是一个特别的algorithm。 它利用索引和布尔数据types。

 Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant '' Modified to take both positive and negative integers Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long Dim LowB As Long, myIndex As Long, count As Long, myRange As Long HighB = UBound(myArray) LowB = LBound(myArray) For i = LowB To HighB If myArray(i) > myMax Then myMax = myArray(i) If myArray(i) < myMin Then myMin = myArray(i) Next i OffSet = Abs(myMin) '' Number that will be added to every element '' to guarantee every index is non-negative If myMax > 0 Then myRange = myMax + OffSet '' Eg if myMax = 10 & myMin = -2, then myRange = 12 Else myRange = OffSet End If If bOrigIndex Then ReDim arrSort(1 To 2, 1 To HighB) ReDim arrVals(1 To 2, 0 To myRange) ReDim arrBool(0 To myRange) For i = LowB To HighB myIndex = myArray(i) + OffSet arrBool(myIndex) = True arrVals(1, myIndex) = myArray(i) If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i Next i For i = 0 To myRange If arrBool(i) Then count = count + 1 arrSort(1, count) = arrVals(1, i) arrSort(2, count) = arrVals(2, i) End If Next i QSLong2D arrSort, 2, 1, count, 2 ReDim Preserve arrSort(1 To 2, 1 To count) Else ReDim arrSort(1 To HighB) ReDim arrVals(0 To myRange) ReDim arrBool(0 To myRange) For i = LowB To HighB myIndex = myArray(i) + OffSet arrBool(myIndex) = True arrVals(myIndex) = myArray(i) Next i For i = 0 To myRange If arrBool(i) Then count = count + 1 arrSort(count) = arrVals(i) End If Next i ReDim Preserve arrSort(1 To count) End If ReDim arrVals(0) ReDim arrBool(0) IndexSort = arrSort End Function 

这里是集合(由@DocBrown)和字典(由@eksortso)函数。

 Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long On Error Resume Next ReDim arrOut(1 To UBound(arrIn)) ReDim aFirstArray(1 To UBound(arrIn)) StrtTime = Timer For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string For Each a In aFirstArray ''' This part is actually creating the unique set arr.Add a, a Next EndTime1 = Timer - StrtTime StrtTime = Timer ''' This part is writing back to an array for return For Each a In arr: count = count + 1: arrOut(count) = a: Next a EndTime2 = Timer - StrtTime CollectionTest = Array(arrOut, EndTime1, EndTime2) End Function Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant Dim StrtTime As Double, Endtime As Double Dim d As Scripting.Dictionary, i As Long '' Early Binding Set d = New Scripting.Dictionary For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i DictionaryTest = d.Keys() End Function 

这是@IsraelHoletz提供的直接方法。

 Function ArrayUnique(ByRef aArrayIn() As Long) As Variant Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant Dim i As Long, j As Long, k As Long ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn)) i = LBound(aArrayIn) j = i For Each vIn In aArrayIn For k = j To i - 1 If vIn = aArrayOut(k) Then bFlag = True: Exit For Next If Not bFlag Then aArrayOut(i) = vIn: i = i + 1 bFlag = False Next If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1) ArrayUnique = aArrayOut End Function Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant Dim aReturn() As Variant Dim StrtTime As Long, Endtime As Long, i As Long aReturn = ArrayUnique(aArray) DirectTest = aReturn End Function 

这是比较所有function的基准函数。 你应该注意到,由于内存问题,最后两种情况处理有点不同。 另外请注意,我没有testingTest Case Size = 10,000,000Collection方法。 出于某种原因,它返回了不正确的结果,performance不寻常(我猜集合对象有多less东西可以放在它的限制,我search,我找不到任何文学)。

 Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2 ReDim myArray(1 To Lim): Rnd (-2) '' If you want to test negative numbers, '' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) * For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i arrTest = myArray If bytCase = 1 Then If bTestDictionary Then StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime Else EndTime1 = "Not Tested" End If arrTest = myArray collectTest = CollectionTest(arrTest, Lim) arrTest = myArray StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime SizeUnique = UBound(sortingTest1, 2) If bTestDirect Then arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime Else EndTime3 = "Not Tested" End If arrTest = myArray StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime arrTest = myArray StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime arrTest = myArray StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime bEquality = True For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2) If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then bEquality = False Exit For End If Next i For i = LBound(dictionTest) To UBound(dictionTest) If Not dictionTest(i) = sortingTest1(1, i + 1) Then bEquality = False Exit For End If Next i For i = LBound(dictionTest) To UBound(dictionTest) If Not dictionTest(i) = indexTest1(1, i + 1) Then bEquality = False Exit For End If Next i If bTestDirect Then For i = LBound(dictionTest) To UBound(dictionTest) If Not dictionTest(i) = directT(i + 1) Then bEquality = False Exit For End If Next i End If UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _ EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique) ElseIf bytCase = 2 Then arrTest = myArray collectTest = CollectionTest(arrTest, Lim) UltimateTest = Array(collectTest(1), collectTest(2)) ElseIf bytCase = 3 Then arrTest = myArray StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime SizeUnique = UBound(sortingTest1, 2) UltimateTest = Array(EndTime2, SizeUnique) ElseIf bytCase = 4 Then arrTest = myArray StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime UltimateTest = EndTime4 ElseIf bytCase = 5 Then arrTest = myArray StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime UltimateTest = EndTime5 ElseIf bytCase = 6 Then arrTest = myArray StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime UltimateTest = EndTime6 End If End Function 

最后,这里是生成上面的表的子。

 Sub GetBenchmarks() Dim myVar, i As Long, TestCases As Variant, j As Long, temp TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000) For j = 0 To 11 If j < 6 Then myVar = UltimateTest(CLng(TestCases(j)), True, True, 1) ElseIf j < 10 Then myVar = UltimateTest(CLng(TestCases(j)), False, True, 1) ElseIf j < 11 Then myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0) temp = UltimateTest(CLng(TestCases(j)), False, False, 2) myVar(7) = temp(0): myVar(8) = temp(1) temp = UltimateTest(CLng(TestCases(j)), False, False, 3) myVar(2) = temp(0): myVar(9) = temp(1) myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4) myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5) myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6) Else myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0) temp = UltimateTest(CLng(TestCases(j)), False, False, 3) myVar(2) = temp(0): myVar(9) = temp(1) myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4) myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5) myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6) End If Cells(4 + j, 6) = TestCases(j) For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i Cells(4 + j, 17) = myVar(9) Next j End Sub 

概要
从结果表中可以看出, Dictionary方法在小于50万的情况下效果很好,然而,在这之后, IndexMethod真的开始占据主导地位。 你会注意到,当顺序不重要,你的数据是由正整数组成时,没有与IndexMethodalgorithm进行比较(它在不到1秒的时间内从包含1000万个元素的数组中返回唯一值!!! Incredible !)。 下面我会详细介绍哪种algorithm在各种情况下都是首选。

情况1
您的数据包含整数(即整数,包括正数和负数): IndexMethod

案例2
您的数据包含less于200000个元素的非整数(即变体,双精度,string等): Dictionary Method

案例3
您的数据包含超过200000个元素的非整数(即变体,双精度,string等): Collection Method

如果你不得不select一种algorithm,在我看来, Collection方法仍然是最好的,因为它只需要几行代码,它是超级通用的,而且速度足够快。

不,没有内置的。 自己做:

  • 实例化一个Scripting.Dictionary对象
  • 在你的数组上写一个For循环(确保使用LBound()UBound()而不是从0循环到x!)
  • 在每次迭代中,检查字典上的Exists() 。 添加每个数组值(不存在的)作为字典的键( 使用CStr()因为键必须是string,因为我刚学过,键可以是Scripting.Dictionary的任何types),也可以存储数组值本身放入字典中。
  • 完成后,使用Keys() (或Items() )将字典的所有值作为新的,现在唯一的数组返回。
  • 在我的testing中,字典保持所有附加值的原始顺序,所以输出将像input一样被sorting。 不过,我不确定这是否是有logging的,可靠的行为。

我不知道VBA中的任何内置function。 最好的办法是使用一个集合作为键的值,只有当一个值不存在时才添加。

不,VBA没有这个function。 您可以使用将项目作为关键字将每个项目添加到集合的技术。 由于集合不允许重复键,所以如果需要,结果是可以复制到数组的不同值。

你可能还想要更强大的东西。 请参阅http://www.cpearson.com/excel/distinctvalues.aspx上的; Distinct Values Function

不同的值函数

一个VBA函数,它将返回input值范围或数组中不同值的数组。

Excel有一些手动方法,如高级filter,用于从input范围获取不同项目的列表。 使用这种方法的缺点是,当input数据改变时,你必须手动刷新结果。 此外,这些方法仅适用于范围,而不是数组的数组,而不是函数,不能从工作表单元中调用或者合并到数组公式中。 这个页面描述了一个名为DistinctValues的VBA函数,该函数接受一个范围或一个数据数组作为input,并返回一个包含input列表中不同项的数组。 也就是说,删除了所有重复的元素。 input元素的顺序被保留。 输出数组中元素的顺序与input值中的顺序相同。 可以从工作表上input的数组中调用函数(有关数组公式的信息,请参见此页),也可以从单个工作表单元格中的数组公式中或从另一个VB函数中调用该函数。

收集和字典解决scheme都很好,闪耀的一个简短的方法,但如果你想速度尝试使用更直接的方法:

 Function ArrayUnique(ByVal aArrayIn As Variant) As Variant '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' ' ArrayUnique ' This function removes duplicated values from a single dimension array '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Dim aArrayOut() As Variant Dim bFlag As Boolean Dim vIn As Variant Dim vOut As Variant Dim i%, j%, k% ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn)) i = LBound(aArrayIn) j = i For Each vIn In aArrayIn For k = j To i - 1 If vIn = aArrayOut(k) Then bFlag = True: Exit For Next If Not bFlag Then aArrayOut(i) = vIn: i = i + 1 bFlag = False Next If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1) ArrayUnique = aArrayOut End Function 

调用它:

 Sub Test() Dim aReturn As Variant Dim aArray As Variant aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test") aReturn = ArrayUnique(aArray) End Sub 

对于速度比较来说,字典解决scheme的速度要快100倍到130倍,比收集速度快8000倍到13000倍。

我对VBA很新。 但是,当我正在寻找完全相同的解决scheme时,我想要一种循环方式,而不必在另一个数组中指定关键元素。 所以我写了下面的代码,它的工作原理和简短。 希望这可以帮助!

标题是我的代码中的一维数组

 For i = UBound(titles) To LBound(titles) + 1 Step -1 'Looping backwards through the array If titles(i) = titles(i - 1) Then 'If the last element is the same as the one before it ReDim Preserve titles(i - 1) 'Then trim it down by one. Essentially, delete it from the array End If Next i