Java – Remove duplicates from an ArrayList

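I'm using an ArrayList to store Strings. The program prompts the user with a menu and lets the user select an operation to perform. Such operations include adding a string to the list, printing the entries, and so on. What I want to do is create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicated values. I want to leave one instance of each duplicated value in the list. I'd also like this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this, but I keep running into trouble because when entries are deleted, the indexing of the ArrayList changes and things don't work as they should. I know conceptually what I need to do, but I'm having trouble implementing the idea in code.

Here is some pseudocode:

Start with the first entry; check each subsequent entry in the list to see whether it matches the first entry; remove each subsequent entry that matches the first entry;

After all entries have been examined, move on to the second entry; check each entry in the list to see whether it matches the second entry; remove each entry that matches the second entry;

Repeat for every entry in the list

Here is the code I have so far: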

    public int removeDuplicates() {
        int duplicates = 0;
        for ( int i = 0; i < strings.size(); i++ ) {
            for ( int j = 0; j < strings.size(); j++ ) {
                if ( i == j ) {
                    // i & j refer to same entry so do nothing
                } else if ( strings.get( j ).equals( strings.get( i ) ) ) {
                    strings.remove( j );
                    duplicates++;
                }
            }
        }
        return duplicates;
    }

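Update: It seems that Will is looking for a homework solution that involves developing an algorithm to remove duplicates, rather than a practical solution that uses Sets. See his comment:

Thanks for the suggestions. This is part of an assignment, and I believe the teacher intended for the solution not to include sets. In other words, I am to come up with a solution that searches for and removes duplicates without using a HashSet. The teacher suggested using nested loops, which is what I'm trying to do, but I've been having some problems with the indexing of the ArrayList after certain entries are removed.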

Why not use a collection such as a Set (and an implementation such as HashSet), which naturally prevents duplicates?
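To illustrate what "naturally prevents duplicates" means, here is a minimal sketch. The class and method names (SetRejectsDuplicates, countRejected) are made up for this example; it relies only on the fact that Set.add returns false when the element is already present.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetRejectsDuplicates {
    /** Adds every input string to the target set and counts how many the set rejected. */
    static int countRejected(List<String> input, Set<String> target) {
        int rejected = 0;
        for (String s : input) {
            // Set.add returns false when the element is already in the set
            if (!target.add(s)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("one", "one", "two", "three", "three");
        Set<String> unique = new LinkedHashSet<String>(); // LinkedHashSet keeps insertion order
        int rejected = countRejected(input, unique);
        System.out.println(unique + " (" + rejected + " duplicates rejected)");
        // prints: [one, two, three] (2 duplicates rejected)
    }
}
```

If the entries were stored in a set like this from the start, there would be nothing to remove later, although that may not be allowed for the assignment.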

You can use nested loops without any problem:

    public static int removeDuplicates(ArrayList<String> strings) {
        int size = strings.size();
        int duplicates = 0;

        // not using a method in the check also speeds up the execution
        // also i must be less than size-1 so that j doesn't
        // throw IndexOutOfBoundsException
        for (int i = 0; i < size - 1; i++) {
            // start from the next item after strings[i]
            // since the ones before are checked
            for (int j = i + 1; j < size; j++) {
                // no need for if ( i == j ) here
                if (!strings.get(j).equals(strings.get(i)))
                    continue;
                duplicates++;
                strings.remove(j);
                // decrease j because the array got re-indexed
                j--;
                // decrease the size of the array
                size--;
            } // for j
        } // for i

        return duplicates;
    }

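You can try this one-liner to get a copy of the Strings that preserves their order.

    List<String> list = new ArrayList<String>(); // populated elsewhere
    List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also amortized O(n) rather than O(n^2).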

Just to clarify my comment on matt b's answer: if you really want to count the number of duplicates removed, use this code:

    List<String> list = new ArrayList<String>();
    // list gets populated from user input...
    Set<String> set = new HashSet<String>(list);
    int numDuplicates = list.size() - set.size();

    List<String> lst = new ArrayList<String>();
    lst.add("one");
    lst.add("one");
    lst.add("two");
    lst.add("three");
    lst.add("three");
    lst.add("three");
    Set<String> se = new HashSet<String>(lst);
    lst.clear();
    lst = new ArrayList<String>(se);
    for (String ls : lst) {
        System.out.println("Resulting output---------" + ls);
    }

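I've been trying to use nested loops to accomplish this, but I keep running into trouble because when entries are deleted, the indexing of the ArrayList changes and things don't work as they should

Why don't you just decrement the counter each time you delete an entry?

When you delete an entry, the remaining elements move as well:

For example:

    String[] a = {"a", "a", "b", "c"};

Positions:

    a[0] = "a";
    a[1] = "a";
    a[2] = "b";
    a[3] = "c";

After you remove the first "a", the indexes are:

    a[0] = "a";
    a[1] = "b";
    a[2] = "c";

So you should take this into account and decrement the value of j (j--) to avoid "jumping" over a value.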

[Screenshot: it works]

    public <T> Collection<T> removeDuplicates(Collection<T> c) {
        // Returns a new collection with duplicates removed from passed collection.
        Collection<T> result = new ArrayList<T>();
        for (T o : c) {
            if (!result.contains(o)) {
                result.add(o);
            }
        }
        return result;
    }

Or

    public <T extends Comparable<? super T>> void removeDuplicates(List<T> l) {
        // Removes duplicates in place from an existing list (the list is sorted first)
        T last = null;
        Collections.sort(l);
        Iterator<T> i = l.iterator();
        while (i.hasNext()) {
            T o = i.next();
            if (o.equals(last)) {
                i.remove();
            } else {
                last = o;
            }
        }
    }

Both are untested.

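A very simple way to remove duplicate strings from an ArrayList

    ArrayList<String> al = new ArrayList<String>();
    // add elements to al, including duplicates
    HashSet<String> hs = new HashSet<String>();
    hs.addAll(al);
    al.clear();
    al.addAll(hs); // note: HashSet does not preserve the original order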

Assuming you can't use a Set, as you said, the simplest way to solve this is to use a temporary list rather than trying to remove the duplicates in place:

    import java.util.ArrayList;
    import java.util.List;

    public class Duplicates {

        public static void main(String[] args) {
            List<String> list = new ArrayList<String>();
            list.add("one");
            list.add("one");
            list.add("two");
            list.add("three");
            list.add("three");
            list.add("three");

            System.out.println("Prior to removal: " + list);
            System.out.println("There were " + removeDuplicates(list) + " duplicates.");
            System.out.println("After removal: " + list);
        }

        public static int removeDuplicates(List<String> list) {
            int removed = 0;
            List<String> temp = new ArrayList<String>();

            for (String s : list) {
                if (!temp.contains(s)) {
                    temp.add(s);
                } else {
                    // if the string is already in the list, then ignore it and increment the removed counter
                    removed++;
                }
            }

            // put the contents of temp back in the main list
            list.clear();
            list.addAll(temp);

            return removed;
        }
    }

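Using a set is the best option for removing the duplicates:

If you have an array list, you can remove the duplicates and still keep the array list's features:

    List<String> strings = new ArrayList<String>();
    //populate the array
    ...
    List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
    int numdups = strings.size() - dedupped.size();

If you can't use a set, sort the list (Collections.sort()) and iterate over it, checking whether the current element is equal to the previous one; if it is, remove it.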

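Using a set is the best option (as others have suggested).

If you want to compare every element in a list with every other element, you should adapt your for loops slightly:

    for(int i = 0; i < max; i++)
        for(int j = i+1; j < max; j++)

This way you compare each pair of elements only once instead of twice. That is because the second loop starts at the element after the one the first loop is currently on.

Also, when removing from a list while iterating over it (even when you use a for loop rather than an iterator), keep in mind that you are reducing the size of the list. A common solution is to keep a second list of the items you want to delete and then, once you have decided which ones to delete, remove them from the original list.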
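The "second list" idea can be sketched roughly as follows. This is only one way to read the suggestion: the names DeferredRemoval and toRemove are invented for the example, and it records indexes rather than the items themselves so that exactly the repeated occurrences get deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeferredRemoval {
    /** Removes repeated strings in place and returns how many were removed. */
    static int removeDuplicates(List<String> strings) {
        // First pass: only record which indexes hold a repeat; the list is not modified here.
        List<Integer> toRemove = new ArrayList<Integer>();
        for (int j = 1; j < strings.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (strings.get(i).equals(strings.get(j))) {
                    toRemove.add(j); // element j repeats an earlier element
                    break;
                }
            }
        }
        // Second pass: delete from the highest index down so the lower indexes stay valid.
        for (int k = toRemove.size() - 1; k >= 0; k--) {
            // intValue() makes sure remove(int index) is called, not remove(Object)
            strings.remove(toRemove.get(k).intValue());
        }
        return toRemove.size();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(list);
        System.out.println(removed + " removed, list is now " + list);
        // prints: 2 removed, list is now [a, b, c, d]
    }
}
```

Because nothing is deleted while the comparison loops run, the loop indexes never go stale, at the cost of an extra list of indexes.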

    public ArrayList<String> removeDuplicates(ArrayList<String> inArray) {
        ArrayList<String> outArray = new ArrayList<String>();
        boolean doAdd = true;
        for (int i = 0; i < inArray.size(); i++) {
            String testString = inArray.get(i);
            // only the entries before i need to be checked
            for (int j = 0; j < inArray.size(); j++) {
                if (i == j) {
                    break;
                } else if (inArray.get(j).equals(testString)) {
                    doAdd = false;
                    break;
                }
            }
            if (doAdd) {
                outArray.add(testString);
            } else {
                doAdd = true;
            }
        }
        return outArray;
    }

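You could replace the duplicates with an empty string*, thus keeping the indexing intact. Then, after you have finished, you can strip out the empty strings.

*But only if an empty string is not a valid value in your implementation.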
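A possible sketch of the placeholder idea is shown below. It assumes that "" never occurs as a real entry and that the list contains no nulls; the names SentinelRemoval and MARKER are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SentinelRemoval {
    // Assumption: the empty string is never a legitimate entry in the list.
    static final String MARKER = "";

    /** Overwrites repeats with MARKER, then strips the markers; returns the number removed. */
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            if (strings.get(i).equals(MARKER)) {
                continue; // this slot was already marked as a duplicate
            }
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.set(j, MARKER); // overwrite in place, so no index shifts
                    duplicates++;
                }
            }
        }
        // Strip every marker in one go once the comparisons are finished.
        strings.removeAll(Collections.singleton(MARKER));
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        System.out.println(removeDuplicates(list) + " removed, list is now " + list);
        // prints: 2 removed, list is now [a, b, c, d]
    }
}
```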

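The problem you are seeing in your code is that you remove an entry during iteration, which invalidates the iteration position.

For example:

    {"a", "b", "c", "b", "b", "d"}
           i         j

Now you remove strings[j].

    {"a", "b", "c", "b", "d"}
           i         j

The inner loop body ends and j is incremented.

    {"a", "b", "c", "b", "d"}
           i              j

Only one duplicate "b" was detected... oops.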
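The walkthrough above can be reproduced with a small program. This is a simplified variant of the loop in the question, not the asker's exact code: the inner loop is assumed to start at i + 1, and the names SkippedDuplicateDemo and naiveRemove are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkippedDuplicateDemo {
    /** Deliberately flawed: j is always incremented, even right after a removal. */
    static int naiveRemove(List<String> strings) {
        int removed = 0;
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j); // the next element slides into slot j ...
                    removed++;
                }
                // ... and the j++ of the for loop then steps over it without checking it
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = naiveRemove(list);
        System.out.println(removed + " removed, list is now " + list);
        // prints: 1 removed, list is now [a, b, c, b, d]
    }
}
```

One "b" survives because it moved into the slot that had just been examined.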

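The best practice in these cases is to store the positions that have to be removed, and to remove them after you have finished iterating over the array list. (As a bonus, the strings.size() call can then be hoisted out of the loops, by you or by the compiler.)

Hint: you can start iterating j at i + 1, since you have already checked 0 through i!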

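The inner for loop is invalid. If you delete an element, you must not increment j, because j now points to the element after the one you deleted, and you still need to check it.

In other words, you should use a while loop instead of a for loop, and only increment j if the elements at i and j do not match. If they do match, remove the element at j. size() will decrease by 1 and j will now point to the following element, so there is no need to increment j.

Also, there is no reason to check all the elements in the inner loop, only the ones following i, since the duplicates before i have already been removed by earlier iterations.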
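Put into code, the description above might look like the following sketch. It assumes the list is passed in as a parameter rather than held in a field, and the class name WhileLoopRemoval is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhileLoopRemoval {
    /** Removes repeated strings in place and returns how many were removed. */
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            int j = i + 1; // entries before i were de-duplicated by earlier iterations
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    // The following element slides into slot j, so j must not advance.
                    strings.remove(j);
                    duplicates++;
                } else {
                    j++; // no match: move on to the next candidate
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        System.out.println(removeDuplicates(list) + " removed, list is now " + list);
        // prints: 2 removed, list is now [a, b, c, d]
    }
}
```

The loop conditions call strings.size() every time, so the shrinking list needs no separate size bookkeeping.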

    public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
        List<Foo> result = new ArrayList<Foo>(); //...might want to pre-size here, if you have reliable info about the number of dupes
        Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
        for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
        return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
    }

And then have some entryFactory(Integer key, List<Foo> value) method. If you want to mutate the original list (possibly not a good idea, but whatever) instead:

    public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
        int original = listWithPossibleDuplicates.size();
        Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
        Set<Foo> found = new HashSet<Foo>();
        while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
        return original - found.size();
    }
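The entryFactory helper mentioned above is not shown in the answer. One way it might be written, purely as an assumption, is on top of the JDK's AbstractMap.SimpleEntry; the class name UniqueElements and the type parameter T are also choices made for this sketch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;

class UniqueElements {
    // Hypothetical helper: pairs the duplicate count with the de-duplicated list.
    static <T> Entry<Integer, List<T>> entryFactory(Integer key, List<T> value) {
        return new AbstractMap.SimpleEntry<Integer, List<T>>(key, value);
    }

    /** Returns (number of duplicates, list of first occurrences in original order). */
    static <T> Entry<Integer, List<T>> uniqueElementList(List<T> input) {
        List<T> result = new ArrayList<T>();
        Set<T> found = new HashSet<T>();
        for (T item : input) {
            if (found.add(item)) { // add returns true only for the first occurrence
                result.add(item);
            }
        }
        return entryFactory(input.size() - found.size(), result);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("one", "one", "two", "three", "three", "three");
        Entry<Integer, List<String>> e = uniqueElementList(input);
        System.out.println(e.getKey() + " duplicates, unique elements: " + e.getValue());
        // prints: 3 duplicates, unique elements: [one, two, three]
    }
}
```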

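For the particular case of strings, you may need to deal with some additional equality constraints (e.g., are upper- and lowercase versions the same or different?).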
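If upper- and lowercase versions should count as the same entry, one option (an illustration only, not something the answer prescribes) is to track seen values in a TreeSet ordered by String.CASE_INSENSITIVE_ORDER. The class and method names below are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveDedup {
    /** Removes entries that differ from an earlier entry only by case; returns the count removed. */
    static int removeDuplicatesIgnoreCase(List<String> strings) {
        // With this comparator the set treats "Foo" and "foo" as the same element.
        Set<String> seen = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        int removed = 0;
        Iterator<String> it = strings.iterator();
        while (it.hasNext()) {
            if (!seen.add(it.next())) {
                it.remove(); // safe removal through the iterator
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("Foo", "foo", "Bar", "BAR", "baz"));
        System.out.println(removeDuplicatesIgnoreCase(list) + " removed, list is now " + list);
        // prints: 2 removed, list is now [Foo, Bar, baz]
    }
}
```

The first spelling encountered is the one that is kept.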

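Edit: Ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see if you come to the same conclusion I offered. The generics part is just gravy.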

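I'm a bit late to this question, but I have come up with a better solution for it using a GENERIC type. All of the solutions provided above are just one kind of solution, and they add to the complexity of the whole runtime thread.

RemoveDuplicacy.java

We can minimize that at load time by using a technique that does what is required.

For example, suppose you are using an array list of a class type: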

    ArrayList<User> usersList = new ArrayList<User>();
    usersList.clear();

    User user = new User();
    user.setName("A");
    user.setId("1"); // duplicate
    usersList.add(user);

    user = new User();
    user.setName("A");
    user.setId("1"); // duplicate
    usersList.add(user);

    user = new User();
    user.setName("AB");
    user.setId("2"); // duplicate
    usersList.add(user);

    user = new User();
    user.setName("C");
    user.setId("4");
    usersList.add(user);

    user = new User();
    user.setName("A");
    user.setId("1"); // duplicate
    usersList.add(user);

    user = new User();
    user.setName("A");
    user.setId("2"); // duplicate
    usersList.add(user);

The base class for the array list used above: the User class

    class User {
        private String name;
        private String id;

        /**
         * @param name
         *            the name to set
         */
        public void setName(String name) {
            this.name = name;
        }

        /**
         * @return the name
         */
        public String getName() {
            return name;
        }

        /**
         * @param id
         *            the id to set
         */
        public void setId(String id) {
            this.id = id;
        }

        /**
         * @return the id
         */
        public String getId() {
            return id;
        }
    }

Now, in Java, the Object (parent) class has two methods that can be overridden and that serve our purpose well. They are:

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        User other = (User) obj;
        if (id == null) {
            if (other.id != null)
                return false;
        } else if (!id.equals(other.id))
            return false;
        return true;
    }

You have to override these methods in the User class.
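As a rough sketch of how the overridden methods might then be used, the example below passes the list through a LinkedHashSet. It is not taken from the linked code: the condensed User class (constructor instead of setters, non-null ids assumed) and the names UserDedupDemo and dedupe are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class UserDedupDemo {
    /** Condensed stand-in for the User class; equality is based on the id only. */
    static final class User {
        private final String id;   // assumed to be non-null in this sketch
        private final String name;

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }

        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof User && id.equals(((User) obj).id);
        }
    }

    /** Returns a new list keeping the first User seen for each id, in the original order. */
    static List<User> dedupe(List<User> users) {
        // The set uses hashCode/equals, so users with the same id collapse into one.
        return new ArrayList<User>(new LinkedHashSet<User>(users));
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<User>();
        users.add(new User("1", "A"));
        users.add(new User("1", "A"));
        users.add(new User("2", "AB"));
        users.add(new User("4", "C"));
        users.add(new User("1", "A"));
        users.add(new User("2", "A"));
        for (User u : dedupe(users)) {
            System.out.println(u.getId() + " " + u.getName());
        }
        // prints three lines: "1 A", "2 AB", "4 C"
    }
}
```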

Here is the complete code:

https://gist.github.com/4584310

Let me know if you have any questions.

You can add the list to a HashSet and then convert that hash set back into a list to remove the duplicates.

    public static int removeDuplicates(List<String> duplicateList) {
        List<String> correctedList = new ArrayList<String>();
        Set<String> a = new HashSet<String>();
        a.addAll(duplicateList);
        correctedList.addAll(a);
        return (duplicateList.size() - correctedList.size());
    }

Here it returns the number of duplicates. You can also use correctedList, which holds all the unique values.

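Below is code that removes duplicate elements from a list without changing the order of the list, without using a temporary list, and without using any Set variables. This code saves memory and improves performance.

It is a generic method that works with any type of list.

This was a question asked in an interview. I searched many forums for a solution but couldn't find one, so I thought this would be the right forum to post the code.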

    public List<?> removeDuplicate(List<?> listWithDuplicates) {
        int size = listWithDuplicates.size();
        // isDuplicate[j] is set when element j equals some earlier element
        boolean[] isDuplicate = new boolean[size];

        for (int i = 0; i < size; i++) {
            if (isDuplicate[i])
                continue; // already known to repeat an earlier element
            for (int j = i + 1; j < size; j++) {
                if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i)))
                    isDuplicate[j] = true; // Saving duplicate indexes to an array
            }
        }

        // Remove from the highest index down so the remaining indexes stay valid
        for (int k = size - 1; k >= 0; k--) {
            if (isDuplicate[k])
                listWithDuplicates.remove(k);
        }

        return listWithDuplicates;
    }

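You could do something like this. What people have answered above is one option, but this is another.

    for (int i = 0; i < strings.size(); i++) {
        for (int j = i + 1; j < strings.size(); j++) {
            // compare contents with equals(), not references with ==
            if (strings.get(i).equals(strings.get(j))) {
                strings.remove(j);
                j--; // re-check the element that slid into position j
            }
        }
    }
    return strings;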

    List<String> list = new ArrayList<>();
    list.add("foo");
    list.add("foo");
    list.add("bar");
    list.add("foo");
    list.add("bar");

    int index = 0;
    int count = 0;
    while (index < list.size() - 1) {
        String item = list.get(index);
        List<String> tail = list.subList(index + 1, list.size());
        while (tail.contains(item)) {
            tail.remove(item);
            count++;
        }
        index++;
    }
    System.out.println(count);
    System.out.println(list);
