你会使用哪个数据结构：TreeMap或者HashMap？（JAVA）

说明| 一个Java程序，用于读取文本文件，并按字母顺序打印每个唯一字以及该字在文本中出现的次数。

程序应该声明一个Map<String, Integer>types的variables来存储单词和相应的出现频率。哪个具体types呢？ TreeMap<String, Number>或HashMap<String, Number> ？

input应该被转换为小写。

一个单词不包含任何这些字符： \t\t\n]f.,!?:;\"()'

示例输出|

  Word Frequency a 1 and 5 appearances 1 as 1 . . .

备注| 我知道，我已经看到了用Perl几乎两行代码的优雅的解决scheme。不过，我想看看它在Java中。

编辑：噢，这是有帮助的显示使用这些结构之一（在Java中）的实现。

TreeMap对我来说似乎是一件简单的事情 – 仅仅是因为“按字母顺序”的要求。 HashMap在迭代时没有sorting; TreeMap以自然键顺序迭代。

编辑：我想Konrad的评论可能一直在暗示“使用HashMap，然后sorting”。这很好，因为尽pipe我们最初有N次迭代，但是由于重复，最后会有K <= N个键。我们不妨把这个昂贵的位（sorting）保存到最后，当我们有更less的关键字，而不是像我们走的时候那样保持sorting的小而非不变的命中。

话虽如此，我暂时坚持我的回答：因为这是实现目标的最简单的方法。我们并不是真的知道OP对演出特别担心，但这个问题暗示着他对优雅和简洁的关注。使用TreeMap使这个令人难以置信的简短，这吸引我。我怀疑，如果性能确实是一个问题，可能有更好的方法来攻击它比TreeMap或HashMap 🙂

因为TreeMap已经为你sorting，所以TreeMap跳动HashMap。

但是，您可能要考虑使用更合适的数据结构，一个包。请参阅Commons Collections – 和TreeBag类：

这有一个很好的内部结构和API：

 bag.add("big") bag.add("small") bag.add("big") int count = bag.getCount("big")

编辑：HashMap vs TreeMap性能的问题由Jon – HashMap回答，sorting可能会更快（尝试！），但TreeBag更容易。袋子也是如此。有一个HashBag以及一个TreeBag。基于实现（使用可变整数），一个包应该优于Integer的等效平面图。唯一可以肯定的方法就是testing，就像任何性能问题一样。

我看到不less人说“TreeMap查找需要O(n log n) ”!! 怎么来的？

我不知道它是如何实现的，但在我的脑海中，它需要O(log n) 。

这是因为在树中查找可以在O(log n) 。每次插入项目时，都不要对整个树进行sorting。这是使用树的整个想法！

因此，回到原来的问题，比较的数字是：

HashMap方法： O(n + k log k)平均情况下，最差情况可能会大得多

TreeMap方法： O(k + n log k)最坏的情况

其中n =文本中单词的数量，k =文本中不同单词的数量。

哈希映射应该快得多。您不应该根据您希望物品最终如何排列来select一个容器; 只要在最后列出（单词，频率）对。通常比文件中的单词lesssorting，因此散列图的渐近（和真实）性能会更好。

您不能将一个TreeMap<String,Number>赋值给一个types为Map<String,Integer>的variables。 Double ， Long等可以“放入”到TreeMap<String,Number> 。当我从Map<String,Integer>获得一个值时，它必须是一个Integer 。

完全忽略了任何i18n问题，内存限制和error handling，这里是：

 class Counter { public static void main(String... argv) throws Exception { FileChannel fc = new FileInputStream(argv[0]).getChannel(); ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size()); CharBuffer cb = Charset.defaultCharset().decode(bb); Pattern p = Pattern.compile("[^ \t\r\n\f.,!?:;\"()']+"); Map<String, Integer> counts = new TreeMap<String, Integer>(); Matcher m = p.matcher(cb); while (m.find()) { String word = m.group(); Integer count = counts.get(word); count = (count == null) ? 1 : count + 1; counts.put(word, count); } fc.close(); for (Map.Entry<String, Integer> e : counts.entrySet()) { System.out.printf("%s: %d%n", e.getKey(), e.getValue()); } } }

“当一个密钥已经存在时，它和HashMap具有相同的性能。” – 这显然是错误的。 HashMap具有O（1）插入和TreeMap O（n log n）。至less需要n次检查才能知道是否在桌面上！

 import java.io.BufferedReader; import java.io.DataInputStream; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import java.io.InputStreamReader; import java.io.ObjectInputStream.GetField; import java.util.Iterator; import java.util.Map; import java.util.StringTokenizer; import java.util.TreeMap; public class TreeMapExample { public static void main (String args[]){ Map<String,Integer> tm = new TreeMap<String,Integer>(); try { FileInputStream fis = new FileInputStream("Test.txt"); DataInputStream in = new DataInputStream(fis); BufferedReader br = new BufferedReader(new InputStreamReader(in)); String line; int countValue = 1; while((line = br.readLine())!= null ){ line = line.replaceAll("[-+.^:;,()\"\\[\\]]",""); StringTokenizer st = new StringTokenizer(line, " "); while(st.hasMoreTokens()){ String nextElement = (String) st.nextElement(); if(tm.size()>0 && tm.containsKey(nextElement)){ int val = 0; if(tm.get(nextElement)!= null){ val = (Integer) tm.get(nextElement); val = val+1; } tm.put(nextElement, val); }else{ tm.put(nextElement, 1); } } } for(Map.Entry<String,Integer> entry : tm.entrySet()) { System.out.println(entry.getKey() + " : " + entry.getValue()); } } catch (FileNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } }

这样，在我看来，更好地使用来自Apache Commons Collections的 HashBag或Guava的 HashMultiset或Eclipse Collections （forms上的GS Collections ）或任何下面的类来使用HashBag：

  Order | Guava | Apache | Eclipse(GS) | JDK analog ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Not define | HashMultiset | HashBag | HashBag | HashMap<String, Integer> ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Sorted | TreeMultiset | TreeBag | TreeBag | TreeMap<String, Integer> ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Linked |LinkedHashMultiset| - | - | LinkedHashMap<String, Integere> ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Concurrent & | ConcurrentHash- |Synchroniz-|Synchroniz- | Collections.synchronizedMap( not define | Multiset | edBag | edBag | HashMap<String, Integer>) ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Concurrent | - |Synchroniz-|Synchroniz- | Collections.synchronizedSorted- and sorted | |edSortedBag| edSortedBag | Map(TreeMap<>)) ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Immutable and| ImmutableMultiset|Unmodifiab-|Unmodifiab- | Collections.unmodifiableMap( not define | | leBag | leBag | HashMap<String, Integer>) ─────────────┼──────────────────┼───────────┼─────────────┼───────────── Immutable and| ImmutableSorted- |Unmodifiab-|Unmodifiab- | Collections.unmodifiableSorted- sorted | Multiset |leSortedBag| leSortedBag | Map(TreeMap<String, Integer>)) ────────────────────────────────────────────────────────────────────────

例子：

1. 使用Apache的SynchronizedSortedBag ：

  // Parse text to separate words String INPUT_TEXT = "Hello World! Hello All! Hi World!"; // Create Multiset Bag bag = SynchronizedSortedBag.synchronizedBag(new TreeBag(Arrays.asList(INPUT_TEXT.split(" ")))); // Print count words System.out.println(bag); // print [1:All!,2:Hello,1:Hi,2:World!]- in natural (alphabet) order // Print all unique words System.out.println(bag.uniqueSet()); // print [All!, Hello, Hi, World!]- in natural (alphabet) order // Print count occurrences of words System.out.println("Hello = " + bag.getCount("Hello")); // print 2 System.out.println("World = " + bag.getCount("World!")); // print 2 System.out.println("All = " + bag.getCount("All!")); // print 1 System.out.println("Hi = " + bag.getCount("Hi")); // print 1 System.out.println("Empty = " + bag.getCount("Empty")); // print 0 // Print count all words System.out.println(bag.size()); //print 6 // Print count unique words System.out.println(bag.uniqueSet().size()); //print 4

2.从Eclipse（GC）使用TreeBag ：

  // Parse text to separate words String INPUT_TEXT = "Hello World! Hello All! Hi World!"; // Create Multiset MutableSortedBag<String> bag = TreeBag.newBag(Arrays.asList(INPUT_TEXT.split(" "))); // Print count words System.out.println(bag); // print [All!, Hello, Hello, Hi, World!, World!]- in natural order // Print all unique words System.out.println(bag.toSortedSet()); // print [All!, Hello, Hi, World!]- in natural order // Print count occurrences of words System.out.println("Hello = " + bag.occurrencesOf("Hello")); // print 2 System.out.println("World = " + bag.occurrencesOf("World!")); // print 2 System.out.println("All = " + bag.occurrencesOf("All!")); // print 1 System.out.println("Hi = " + bag.occurrencesOf("Hi")); // print 1 System.out.println("Empty = " + bag.occurrencesOf("Empty")); // print 0 // Print count all words System.out.println(bag.size()); //print 6 // Print count unique words System.out.println(bag.toSet().size()); //print 4

3. 使用Guava的LinkedHashMultiset ：

  // Parse text to separate words String INPUT_TEXT = "Hello World! Hello All! Hi World!"; // Create Multiset Multiset<String> multiset = LinkedHashMultiset.create(Arrays.asList(INPUT_TEXT.split(" "))); // Print count words System.out.println(multiset); // print [Hello x 2, World! x 2, All!, Hi]- in predictable iteration order // Print all unique words System.out.println(multiset.elementSet()); // print [Hello, World!, All!, Hi] - in predictable iteration order // Print count occurrences of words System.out.println("Hello = " + multiset.count("Hello")); // print 2 System.out.println("World = " + multiset.count("World!")); // print 2 System.out.println("All = " + multiset.count("All!")); // print 1 System.out.println("Hi = " + multiset.count("Hi")); // print 1 System.out.println("Empty = " + multiset.count("Empty")); // print 0 // Print count all words System.out.println(multiset.size()); //print 6 // Print count unique words System.out.println(multiset.elementSet().size()); //print 4

你会使用哪个数据结构：TreeMap或者HashMap？（JAVA）

1. 使用Apache的SynchronizedSortedBag ：

2.从Eclipse（GC）使用TreeBag ：

3. 使用Guava的LinkedHashMultiset ：

更多的例子，你可以在我的github项目中find

Visual C ++相当于GCC的attribute（（packed））

为什么在链表中find循环的时候为什么不用3,4,5？

匹配string数组中的值

什么时候使用Preorder，Postorder和Inorder二叉search树遍历策略

用于实现字典的最佳数据结构？

设置数据结构的JavaScript实现

用LINQ将列表拆分成子列表

在IGrouping中获取“Value”属性

TMP：如何推广vector的笛卡尔乘积？

Java中contains（）的最快数据结构？

你会使用哪个数据结构：TreeMap或者HashMap？ （JAVA）

1. 使用Apache的SynchronizedSortedBag ：

2.从Eclipse（GC）使用TreeBag ：

3. 使用Guava的LinkedHashMultiset ：

更多的例子，你可以在我的github项目中find

Visual C ++相当于GCC的__attribute__（（__packed__））

为什么在链表中find循环的时候为什么不用3,4,5？

匹配string数组中的值

什么时候使用Preorder，Postorder和Inorder二叉search树遍历策略

用于实现字典的最佳数据结构？

设置数据结构的JavaScript实现

用LINQ将列表拆分成子列表

在IGrouping中获取“Value”属性

TMP：如何推广vector的笛卡尔乘积？

Java中contains（）的最快数据结构？

你会使用哪个数据结构：TreeMap或者HashMap？（JAVA）

Visual C ++相当于GCC的attribute（（packed））