什么是最有效的方法来删除重复和sorting向量?

我需要带有潜在的很多元素的C ++向量,擦除重复项,并对其进行分类。

我目前有下面的代码,但它不起作用。

vec.erase( std::unique(vec.begin(), vec.end()), vec.end()); std::sort(vec.begin(), vec.end()); 

我怎样才能正确地做到这一点?

另外,首先删除重复项(类似于上面的代码)还是先执行sorting? 如果我首先执行sorting,它是否保证在执行std::unique之后保持sorting?

还是有另一种(也许更有效)的方式来做到这一点?

我同意R. Pate和Todd Gardner ; 一个std::set在这里可能是个好主意。 即使你使用vector卡,如果你有足够的重复,你可能会更好地创build一个集来做肮脏的工作。

我们来比较三种方法:

只要使用vector,sorting+独特

 sort( vec.begin(), vec.end() ); vec.erase( unique( vec.begin(), vec.end() ), vec.end() ); 

转换为设置(手动)

 set<int> s; unsigned size = vec.size(); for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] ); vec.assign( s.begin(), s.end() ); 

转换为set(使用构造函数)

 set<int> s( vec.begin(), vec.end() ); vec.assign( s.begin(), s.end() ); 

以下是重复次数变化的情况:

矢量和集合方法的比较

总结 :当重复次数足够大时, 转换为一个集合然后将数据转储回向量实际上更快

出于某种原因,手动进行设置转换似乎比使用set构造函数要快 – 至less在我使用的玩具随机数据上。

std::unique只消除重复的元素,如果它们是邻居的:你必须先sortingvector,然后才能按照你的意图工作。

std::unique被定义为稳定的,所以在运行unique之后,vector仍然会被sorting。

我不知道你在用什么,所以我不能100%肯定地说,但通常当我想“sorting,唯一”的容器,我想一个std ::集 。 这可能会更适合您的用例:

 std::set<Foo> foos(vec.begin(), vec.end()); // both sorted & unique already 

否则,在调用独特之前的sorting(正如其他答案指出的)是要走的路。

我重新Nate Kohl的分析,并得到不同的结果。 对于我的testing用例,直接对vector进行sorting总是比使用一个集合更高效。 我添加了一个新的更高效的方法,使用unordered_set

请记住, unordered_set方法只适用于需要分类和sorting的types的散列函数。 对于整数,这很容易! (标准的库提供了一个默认的散列,它只是标识函数。)另外,不要忘记sorting在最后,因为unordered_set是,无序:)

我在setunordered_set实现中进行了一些挖掘,发现构造函数实际上为每个元素构造一个新的节点,然后检查它的值以确定它是否应该实际插入(至less在Visual Studio实现中)。

以下是5种方法:

f1:只使用vectorsort + unique

 sort( vec.begin(), vec.end() ); vec.erase( unique( vec.begin(), vec.end() ), vec.end() ); 

f2:转换为set (使用构造函数)

 set<int> s( vec.begin(), vec.end() ); vec.assign( s.begin(), s.end() ); 

f3:转换为set (手动)

 set<int> s; for (int i : vec) s.insert(i); vec.assign( s.begin(), s.end() ); 

f4:转换为unordered_set (使用构造函数)

 unordered_set<int> s( vec.begin(), vec.end() ); vec.assign( s.begin(), s.end() ); sort( vec.begin(), vec.end() ); 

f5:转换为unordered_set (手动)

 unordered_set<int> s; for (int i : vec) s.insert(i); vec.assign( s.begin(), s.end() ); sort( vec.begin(), vec.end() ); 

我做了一个随机select的范围[1,10],[1,1000]和[1,100000]

结果(以秒为单位,越小越好):

 range f1 f2 f3 f4 f5 [1,10] 1.6821 7.6804 2.8232 6.2634 0.7980 [1,1000] 5.0773 13.3658 8.2235 7.6884 1.9861 [1,100000] 8.7955 32.1148 26.5485 13.3278 3.9822 

std::unique只适用于重复元素的连续运行,所以你最好先sorting。 但是,它是稳定的,所以你的向量将保持sorting。

这里有一个模板可以帮你:

 template<typename T> void removeDuplicates(std::vector<T>& vec) { std::sort(vec.begin(), vec.end()); vec.erase(std::unique(vec.begin(), vec.end()), vec.end()); } 

这样称呼:

 removeDuplicates<int>(vectorname); 

您需要在调用unique之前对其进行sorting,因为unique只会删除彼此相邻的重复项。

编辑:38秒…

unique只是删除连续的重复元素(这是线性时间运行所必需的),所以您应该首先执行sorting。 它会保持sorting后的unique

效率是一个复杂的概念。 有时间和空间的考虑,以及一般的测量(只有O(n)模糊的答案)与特定的测量(例如气泡sorting比快速sorting快得多,取决于input特性)。

如果你有相对较less的重复,然后sorting然后擦除似乎是唯一的方式去。 如果你有相对较多的副本,从vector创build一个集合,让它做繁重的工作很容易打败它。

不要只专注于时间效率。 sorting+唯一性+擦除在O(1)空间中运行,而设置结构在O(n)空间中运行。 既不直接适用于映射 – 减less并行(对于真正庞大的数据集)。

如果你不想改变元素的顺序,那么你可以试试这个解决scheme:

 template <class T> void RemoveDuplicatesInVector(std::vector<T> & vec) { set<T> values; vec.erase(std::remove_if(vec.begin(), vec.end(), [&](const T & value) { return !values.insert(value).second; }), vec.end()); } 

Nate Kohlbuild议的标准方法,只使用vector,sorting+独特:

 sort( vec.begin(), vec.end() ); vec.erase( unique( vec.begin(), vec.end() ), vec.end() ); 

不适用于指针向量。

请仔细看看cplusplus.com上的这个例子 。

在他们的例子中,“所谓的重复”移动到最后实际上显示为? (未定义的值),因为那些“所谓的重复”是有时候“额外的元素”,而有时则是在原始向量中有“缺失的元素”。

在指向对象的指针向量(内存泄漏,HEAP数据读取错误,重复释放,导致段错误等)上使用std::unique()时会发生问题。

这是我的问题的解决scheme:用ptgi::unique()replacestd::unique() ptgi::unique()

请参阅下面的文件ptgi_unique.hpp:

 // ptgi::unique() // // Fix a problem in std::unique(), such that none of the original elts in the collection are lost or duplicate. // ptgi::unique() has the same interface as std::unique() // // There is the 2 argument version which calls the default operator== to compare elements. // // There is the 3 argument version, which you can pass a user defined functor for specialized comparison. // // ptgi::unique() is an improved version of std::unique() which doesn't looose any of the original data // in the collection, nor does it create duplicates. // // After ptgi::unique(), every old element in the original collection is still present in the re-ordered collection, // except that duplicates have been moved to a contiguous range [dupPosition, last) at the end. // // Thus on output: // [begin, dupPosition) range are unique elements. // [dupPosition, last) range are duplicates which can be removed. // where: // [] means inclusive, and // () means exclusive. // // In the original std::unique() non-duplicates at end are moved downward toward beginning. // In the improved ptgi:unique(), non-duplicates at end are swapped with duplicates near beginning. // // In addition if you have a collection of ptrs to objects, the regular std::unique() will loose memory, // and can possibly delete the same pointer multiple times (leading to SEGMENTATION VIOLATION on Linux machines) // but ptgi::unique() won't. Use valgrind(1) to find such memory leak problems!!! // // NOTE: IF you have a vector of pointers, that is, std::vector<Object*>, then upon return from ptgi::unique() // you would normally do the following to get rid of the duplicate objects in the HEAP: // // // delete objects from HEAP // std::vector<Object*> objects; // for (iter = dupPosition; iter != objects.end(); ++iter) // { // delete (*iter); // } // // // shrink the vector. But Object * pointers are NOT followed for duplicate deletes, this shrinks the vector.size()) // objects.erase(dupPosition, objects.end)); // // NOTE: But if you have a vector of objects, that is: std::vector<Object>, then upon return from ptgi::unique(), it // suffices to just call vector:erase(, as erase will automatically call delete on each object in the // [dupPosition, end) range for you: // // std::vector<Object> objects; // objects.erase(dupPosition, last); // //========================================================================================================== // Example of differences between std::unique() vs ptgi::unique(). // // Given: // int data[] = {10, 11, 21}; // // Given this functor: ArrayOfIntegersEqualByTen: // A functor which compares two integers a[i] and a[j] in an int a[] array, after division by 10: // // // given an int data[] array, remove consecutive duplicates from it. // // functor used for std::unique (BUGGY) or ptgi::unique(IMPROVED) // // // Two numbers equal if, when divided by 10 (integer division), the quotients are the same. // // Hence 50..59 are equal, 60..69 are equal, etc. // struct ArrayOfIntegersEqualByTen: public std::equal_to<int> // { // bool operator() (const int& arg1, const int& arg2) const // { // return ((arg1/10) == (arg2/10)); // } // }; // // Now, if we call (problematic) std::unique( data, data+3, ArrayOfIntegersEqualByTen() ); // // TEST1: BEFORE UNIQ: 10,11,21 // TEST1: AFTER UNIQ: 10,21,21 // DUP_INX=2 // // PROBLEM: 11 is lost, and extra 21 has been added. // // More complicated example: // // TEST2: BEFORE UNIQ: 10,20,21,22,30,31,23,24,11 // TEST2: AFTER UNIQ: 10,20,30,23,11,31,23,24,11 // DUP_INX=5 // // Problem: 21 and 22 are deleted. // Problem: 11 and 23 are duplicated. // // // NOW if ptgi::unique is called instead of std::unique, both problems go away: // // DEBUG: TEST1: NEW_WAY=1 // TEST1: BEFORE UNIQ: 10,11,21 // TEST1: AFTER UNIQ: 10,21,11 // DUP_INX=2 // // DEBUG: TEST2: NEW_WAY=1 // TEST2: BEFORE UNIQ: 10,20,21,22,30,31,23,24,11 // TEST2: AFTER UNIQ: 10,20,30,23,11,31,22,24,21 // DUP_INX=5 // // @SEE: look at the "case study" below to understand which the last "AFTER UNIQ" results with that order: // TEST2: AFTER UNIQ: 10,20,30,23,11,31,22,24,21 // //========================================================================================================== // Case Study: how ptgi::unique() works: // Remember we "remove adjacent duplicates". // In this example, the input is NOT fully sorted when ptgi:unique() is called. // // I put | separatators, BEFORE UNIQ to illustrate this // 10 | 20,21,22 | 30,31 | 23,24 | 11 // // In example above, 20, 21, 22 are "same" since dividing by 10 gives 2 quotient. // And 30,31 are "same", since /10 quotient is 3. // And 23, 24 are same, since /10 quotient is 2. // And 11 is "group of one" by itself. // So there are 5 groups, but the 4th group (23, 24) happens to be equal to group 2 (20, 21, 22) // So there are 5 groups, and the 5th group (11) is equal to group 1 (10) // // R = result // F = first // // 10, 20, 21, 22, 30, 31, 23, 24, 11 // RF // // 10 is result, and first points to 20, and R != F (10 != 20) so bump R: // R // F // // Now we hits the "optimized out swap logic". // (avoid swap because R == F) // // // now bump F until R != F (integer division by 10) // 10, 20, 21, 22, 30, 31, 23, 24, 11 // RF // 20 == 21 in 10x // RF // 20 == 22 in 10x // RF // 20 != 30, so we do a swap of ++R and F // (Now first hits 21, 22, then finally 30, which is different than R, so we swap bump R to 21 and swap with 30) // 10, 20, 30, 22, 21, 31, 23, 24, 11 // after R & F swap (21 and 30) // RF // // 10, 20, 30, 22, 21, 31, 23, 24, 11 // RF // bump F to 31, but R and F are same (30 vs 31) // RF // bump F to 23, R != F, so swap ++R with F // 10, 20, 30, 22, 21, 31, 23, 24, 11 // RF // bump R to 22 // 10, 20, 30, 23, 21, 31, 22, 24, 11 // after the R & F swap (22 & 23 swap) // RF // will swap 22 and 23 // RF // bump F to 24, but R and F are same in 10x // RF // bump F, R != F, so swap ++R with F // RF // R and F are diff, so swap ++R with F (21 and 11) // 10, 20, 30, 23, 11, 31, 22, 24, 21 // RF // aftter swap of old 21 and 11 // RF // F now at last(), so loop terminates // RF // bump R by 1 to point to dupPostion (first duplicate in range) // // return R which now points to 31 //========================================================================================================== // NOTES: // 1) the #ifdef IMPROVED_STD_UNIQUE_ALGORITHM documents how we have modified the original std::unique(). // 2) I've heavily unit tested this code, including using valgrind(1), and it is *believed* to be 100% defect-free. // //========================================================================================================== // History: // 130201 dpb dbednar@ptgi.com created //========================================================================================================== #ifndef PTGI_UNIQUE_HPP #define PTGI_UNIQUE_HPP // Created to solve memory leak problems when calling std::unique() on a vector<Route*>. // Memory leaks discovered with valgrind and unitTesting. #include <algorithm> // std::swap // instead of std::myUnique, call this instead, where arg3 is a function ptr // // like std::unique, it puts the dups at the end, but it uses swapping to preserve original // vector contents, to avoid memory leaks and duplicate pointers in vector<Object*>. #ifdef IMPROVED_STD_UNIQUE_ALGORITHM #error the #ifdef for IMPROVED_STD_UNIQUE_ALGORITHM was defined previously.. Something is wrong. #endif #undef IMPROVED_STD_UNIQUE_ALGORITHM #define IMPROVED_STD_UNIQUE_ALGORITHM // similar to std::unique, except that this version swaps elements, to avoid // memory leaks, when vector contains pointers. // // Normally the input is sorted. // Normal std::unique: // 10 20 20 20 30 30 20 20 10 // abcdefghi // // 10 20 30 20 10 | 30 20 20 10 // abegifghi // // Now GONE: c, d. // Now DUPS: g, i. // This causes memory leaks and segmenation faults due to duplicate deletes of same pointer! namespace ptgi { // Return the position of the first in range of duplicates moved to end of vector. // // uses operator== of class for comparison // // @param [first, last) is a range to find duplicates within. // // @return the dupPosition position, such that [dupPosition, end) are contiguous // duplicate elements. // IF all items are unique, then it would return last. // template <class ForwardIterator> ForwardIterator unique( ForwardIterator first, ForwardIterator last) { // compare iterators, not values if (first == last) return last; // remember the current item that we are looking at for uniqueness ForwardIterator result = first; // result is slow ptr where to store next unique item // first is fast ptr which is looking at all elts // the first iterator moves over all elements [begin+1, end). // while the current item (result) is the same as all elts // to the right, (first) keeps going, until you find a different // element pointed to by *first. At that time, we swap them. while (++first != last) { if (!(*result == *first)) { #ifdef IMPROVED_STD_UNIQUE_ALGORITHM // inc result, then swap *result and *first // THIS IS WHAT WE WANT TO DO. // BUT THIS COULD SWAP AN ELEMENT WITH ITSELF, UNCECESSARILY!!! // std::swap( *first, *(++result)); // BUT avoid swapping with itself when both iterators are the same ++result; if (result != first) std::swap( *first, *result); #else // original code found in std::unique() // copies unique down *(++result) = *first; #endif } } return ++result; } template <class ForwardIterator, class BinaryPredicate> ForwardIterator unique( ForwardIterator first, ForwardIterator last, BinaryPredicate pred) { if (first == last) return last; // remember the current item that we are looking at for uniqueness ForwardIterator result = first; while (++first != last) { if (!pred(*result,*first)) { #ifdef IMPROVED_STD_UNIQUE_ALGORITHM // inc result, then swap *result and *first // THIS COULD SWAP WITH ITSELF UNCECESSARILY // std::swap( *first, *(++result)); // // BUT avoid swapping with itself when both iterators are the same ++result; if (result != first) std::swap( *first, *result); #else // original code found in std::unique() // copies unique down // causes memory leaks, and duplicate ptrs // and uncessarily moves in place! *(++result) = *first; #endif } } return ++result; } // from now on, the #define is no longer needed, so get rid of it #undef IMPROVED_STD_UNIQUE_ALGORITHM } // end ptgi:: namespace #endif 

下面是我用来testing它的UNITtesting程序:

 // QUESTION: in test2, I had trouble getting one line to compile,which was caused by the declaration of operator() // in the equal_to Predicate. I'm not sure how to correctly resolve that issue. // Look for //OUT lines // // Make sure that NOTES in ptgi_unique.hpp are correct, in how we should "cleanup" duplicates // from both a vector<Integer> (test1()) and vector<Integer*> (test2). // Run this with valgrind(1). // // In test2(), IF we use the call to std::unique(), we get this problem: // // [dbednar@ipeng8 TestSortRoutes]$ ./Main7 // TEST2: ORIG nums before UNIQUE: 10, 20, 21, 22, 30, 31, 23, 24, 11 // TEST2: modified nums AFTER UNIQUE: 10, 20, 30, 23, 11, 31, 23, 24, 11 // INFO: dupInx=5 // TEST2: uniq = 10 // TEST2: uniq = 20 // TEST2: uniq = 30 // TEST2: uniq = 33427744 // TEST2: uniq = 33427808 // Segmentation fault (core dumped) // // And if we run valgrind we seen various error about "read errors", "mismatched free", "definitely lost", etc. // // valgrind --leak-check=full ./Main7 // ==359== Memcheck, a memory error detector // ==359== Command: ./Main7 // ==359== Invalid read of size 4 // ==359== Invalid free() / delete / delete[] // ==359== HEAP SUMMARY: // ==359== in use at exit: 8 bytes in 2 blocks // ==359== LEAK SUMMARY: // ==359== definitely lost: 8 bytes in 2 blocks // But once we replace the call in test2() to use ptgi::unique(), all valgrind() error messages disappear. // // 130212 dpb dbednar@ptgi.com created // ========================================================================================================= #include <iostream> // std::cout, std::cerr #include <string> #include <vector> // std::vector #include <sstream> // std::ostringstream #include <algorithm> // std::unique() #include <functional> // std::equal_to(), std::binary_function() #include <cassert> // assert() MACRO #include "ptgi_unique.hpp" // ptgi::unique() // Integer is small "wrapper class" around a primitive int. // There is no SETTER, so Integer's are IMMUTABLE, just like in JAVA. class Integer { private: int num; public: // default CTOR: "Integer zero;" // COMPRENSIVE CTOR: "Integer five(5);" Integer( int num = 0 ) : num(num) { } // COPY CTOR Integer( const Integer& rhs) : num(rhs.num) { } // assignment, operator=, needs nothing special... since all data members are primitives // GETTER for 'num' data member // GETTER' are *always* const int getNum() const { return num; } // NO SETTER, because IMMUTABLE (similar to Java's Integer class) // @return "num" // NB: toString() should *always* be a const method // // NOTE: it is probably more efficient to call getNum() intead // of toString() when printing a number: // // BETTER to do this: // Integer five(5); // std::cout << five.getNum() << "\n" // than this: // std::cout << five.toString() << "\n" std::string toString() const { std::ostringstream oss; oss << num; return oss.str(); } }; // convenience typedef's for iterating over std::vector<Integer> typedef std::vector<Integer>::iterator IntegerVectorIterator; typedef std::vector<Integer>::const_iterator ConstIntegerVectorIterator; // convenience typedef's for iterating over std::vector<Integer*> typedef std::vector<Integer*>::iterator IntegerStarVectorIterator; typedef std::vector<Integer*>::const_iterator ConstIntegerStarVectorIterator; // functor used for std::unique or ptgi::unique() on a std::vector<Integer> // Two numbers equal if, when divided by 10 (integer division), the quotients are the same. // Hence 50..59 are equal, 60..69 are equal, etc. struct IntegerEqualByTen: public std::equal_to<Integer> { bool operator() (const Integer& arg1, const Integer& arg2) const { return ((arg1.getNum()/10) == (arg2.getNum()/10)); } }; // functor used for std::unique or ptgi::unique on a std::vector<Integer*> // Two numbers equal if, when divided by 10 (integer division), the quotients are the same. // Hence 50..59 are equal, 60..69 are equal, etc. struct IntegerEqualByTenPointer: public std::equal_to<Integer*> { // NB: the Integer*& looks funny to me! // TECHNICAL PROBLEM ELSEWHERE so had to remove the & from *& //OUT bool operator() (const Integer*& arg1, const Integer*& arg2) const // bool operator() (const Integer* arg1, const Integer* arg2) const { return ((arg1->getNum()/10) == (arg2->getNum()/10)); } }; void test1(); void test2(); void printIntegerStarVector( const std::string& msg, const std::vector<Integer*>& nums ); int main() { test1(); test2(); return 0; } // test1() uses a vector<Object> (namely vector<Integer>), so there is no problem with memory loss void test1() { int data[] = { 10, 20, 21, 22, 30, 31, 23, 24, 11}; // turn C array into C++ vector std::vector<Integer> nums(data, data+9); // arg3 is a functor IntegerVectorIterator dupPosition = ptgi::unique( nums.begin(), nums.end(), IntegerEqualByTen() ); nums.erase(dupPosition, nums.end()); nums.erase(nums.begin(), dupPosition); } //================================================================================== // test2() uses a vector<Integer*>, so after ptgi:unique(), we have to be careful in // how we eliminate the duplicate Integer objects stored in the heap. //================================================================================== void test2() { int data[] = { 10, 20, 21, 22, 30, 31, 23, 24, 11}; // turn C array into C++ vector of Integer* pointers std::vector<Integer*> nums; // put data[] integers into equivalent Integer* objects in HEAP for (int inx = 0; inx < 9; ++inx) { nums.push_back( new Integer(data[inx]) ); } // print the vector<Integer*> to stdout printIntegerStarVector( "TEST2: ORIG nums before UNIQUE", nums ); // arg3 is a functor #if 1 // corrected version which fixes SEGMENTATION FAULT and all memory leaks reported by valgrind(1) // I THINK we want to use new C++11 cbegin() and cend(),since the equal_to predicate is passed "Integer *&" // DID NOT COMPILE //OUT IntegerStarVectorIterator dupPosition = ptgi::unique( const_cast<ConstIntegerStarVectorIterator>(nums.begin()), const_cast<ConstIntegerStarVectorIterator>(nums.end()), IntegerEqualByTenPointer() ); // DID NOT COMPILE when equal_to predicate declared "Integer*& arg1, Integer*& arg2" //OUT IntegerStarVectorIterator dupPosition = ptgi::unique( const_cast<nums::const_iterator>(nums.begin()), const_cast<nums::const_iterator>(nums.end()), IntegerEqualByTenPointer() ); // okay when equal_to predicate declared "Integer* arg1, Integer* arg2" IntegerStarVectorIterator dupPosition = ptgi::unique(nums.begin(), nums.end(), IntegerEqualByTenPointer() ); #else // BUGGY version that causes SEGMENTATION FAULT and valgrind(1) errors IntegerStarVectorIterator dupPosition = std::unique( nums.begin(), nums.end(), IntegerEqualByTenPointer() ); #endif printIntegerStarVector( "TEST2: modified nums AFTER UNIQUE", nums ); int dupInx = dupPosition - nums.begin(); std::cout << "INFO: dupInx=" << dupInx <<"\n"; // delete the dup Integer* objects in the [dupPosition, end] range for (IntegerStarVectorIterator iter = dupPosition; iter != nums.end(); ++iter) { delete (*iter); } // shrink the vector // NB: the Integer* ptrs are NOT followed by vector::erase() nums.erase(dupPosition, nums.end()); // print the uniques, by following the iter to the Integer* pointer for (IntegerStarVectorIterator iter = nums.begin(); iter != nums.end(); ++iter) { std::cout << "TEST2: uniq = " << (*iter)->getNum() << "\n"; } // remove the unique objects from heap for (IntegerStarVectorIterator iter = nums.begin(); iter != nums.end(); ++iter) { delete (*iter); } // shrink the vector nums.erase(nums.begin(), nums.end()); // the vector should now be completely empty assert( nums.size() == 0); } //@ print to stdout the string: "info_msg: num1, num2, .... numN\n" void printIntegerStarVector( const std::string& msg, const std::vector<Integer*>& nums ) { std::cout << msg << ": "; int inx = 0; ConstIntegerStarVectorIterator iter; // use const iterator and const range! // NB: cbegin() and cend() not supported until LATER (c++11) for (iter = nums.begin(), inx = 0; iter != nums.end(); ++iter, ++inx) { // output a comma seperator *AFTER* first if (inx > 0) std::cout << ", "; // call Integer::toString() std::cout << (*iter)->getNum(); // send int to stdout // std::cout << (*iter)->toString(); // also works, but is probably slower } // in conclusion, add newline std::cout << "\n"; } 

As already stated, unique requires a sorted container. Additionally, unique doesn't actually remove elements from the container. Instead, they are copied to the end, unique returns an iterator pointing to the first such duplicate element, and you are expected to call erase to actually remove the elements.

Here's the example of the duplicate delete problem that occurs with std::unique(). On a LINUX machine, the program crashes. Read the comments for details.

 // Main10.cpp // // Illustration of duplicate delete and memory leak in a vector<int*> after calling std::unique. // On a LINUX machine, it crashes the progam because of the duplicate delete. // // INPUT : {1, 2, 2, 3} // OUTPUT: {1, 2, 3, 3} // // The two 3's are actually pointers to the same 3 integer in the HEAP, which is BAD // because if you delete both int* pointers, you are deleting the same memory // location twice. // // // Never mind the fact that we ignore the "dupPosition" returned by std::unique(), // but in any sensible program that "cleans up after istelf" you want to call deletex // on all int* poitners to avoid memory leaks. // // // NOW IF you replace std::unique() with ptgi::unique(), all of the the problems disappear. // Why? Because ptgi:unique merely reshuffles the data: // OUTPUT: {1, 2, 3, 2} // The ptgi:unique has swapped the last two elements, so all of the original elements in // the INPUT are STILL in the OUTPUT. // // 130215 dbednar@ptgi.com //============================================================================ #include <iostream> #include <vector> #include <algorithm> #include <functional> #include "ptgi_unique.hpp" // functor used by std::unique to remove adjacent elts from vector<int*> struct EqualToVectorOfIntegerStar: public std::equal_to<int *> { bool operator() (const int* arg1, const int* arg2) const { return (*arg1 == *arg2); } }; void printVector( const std::string& msg, const std::vector<int*>& vnums); int main() { int inums [] = { 1, 2, 2, 3 }; std::vector<int*> vnums; // convert C array into vector of pointers to integers for (size_t inx = 0; inx < 4; ++ inx) vnums.push_back( new int(inums[inx]) ); printVector("BEFORE UNIQ", vnums); // INPUT : 1, 2A, 2B, 3 std::unique( vnums.begin(), vnums.end(), EqualToVectorOfIntegerStar() ); // OUTPUT: 1, 2A, 3, 3 } printVector("AFTER UNIQ", vnums); // now we delete 3 twice, and we have a memory leak because 2B is not deleted. for (size_t inx = 0; inx < vnums.size(); ++inx) { delete(vnums[inx]); } } // print a line of the form "msg: 1,2,3,..,5,6,7\n", where 1..7 are the numbers in vnums vector // PS: you may pass "hello world" (const char *) because of implicit (automatic) conversion // from "const char *" to std::string conversion. void printVector( const std::string& msg, const std::vector<int*>& vnums) { std::cout << msg << ": "; for (size_t inx = 0; inx < vnums.size(); ++inx) { // insert comma separator before current elt, but ONLY after first elt if (inx > 0) std::cout << ","; std::cout << *vnums[inx]; } std::cout << "\n"; } 

About alexK7 benchmarks. I tried them and got similar results, but when the range of values is 1 million the cases using std::sort (f1) and using std::unordered_set (f5) produce similar time. When the range of values is 10 million f1 is faster than f5.

If the range of values is limited and the values are unsigned int, it is possible to use std::vector, the size of which corresponds to the given range. 这里是代码:

 void DeleteDuplicates_vector_bool(std::vector<unsigned>& v, unsigned range_size) { std::vector<bool> v1(range_size); for (auto& x: v) { v1[x] = true; } v.clear(); unsigned count = 0; for (auto& x: v1) { if (x) { v.push_back(count); } ++count; } } 
 std::set<int> s; std::for_each(v.cbegin(), v.cend(), [&s](int val){s.insert(val);}); v.clear(); std::copy(s.cbegin(), s.cend(), v.cbegin()); 

sort(v.begin(), v.end()), v.erase(unique(v.begin(), v,end()), v.end());