How to count duplicate elements in a Ruby array

I have a sorted array:

    [
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient system memory to run this query.">'
    ]

I would like to get something like this, although it does not have to be a hash:

    [
      {:error => 'FATAL <error title="Request timed out.">', :count => 2},
      {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
    ]

The following code prints what you asked for. I'll leave it to you to decide how to build the hash you are looking for:

    # sample array
    a = ["aa", "bb", "cc", "bb", "bb", "cc"]

    # make the hash default to 0 so that += will work correctly
    b = Hash.new(0)

    # iterate over the array, counting duplicate entries
    a.each do |v|
      b[v] += 1
    end

    b.each do |k, v|
      puts "#{k} appears #{v} times"
    end

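Note: I just noticed you said the array is already sorted. The code above does not require sorted input. Taking advantage of that property might yield faster code.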
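Since equal entries sit next to each other in a sorted array, one way to use that property is to group consecutive runs. A minimal sketch, assuming Ruby 1.9 or later for Enumerable#chunk; the variable names `errors` and `counts` are illustrative and not from the question:

```ruby
# Sorted input from the question: equal entries are adjacent.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk groups consecutive elements for which the block returns the same
# value and yields [value, run] pairs; the size of each run is the count.
counts = errors.chunk { |e| e }.map do |error, run|
  { :error => error, :count => run.size }
end

p counts
```

This relies on duplicates being adjacent. On unsorted input the same value would be reported once per run, so one of the hash-based approaches is the safer choice there.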

You can do this very concisely (in one line) using inject:

    a = ['FATAL <error title="Request timed out.">',
         'FATAL <error title="Request timed out.">',
         'FATAL <error title="There is insufficient ...">']
    b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
    b.to_a.each { |error, count| puts "#{count}: #{error}" }

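This produces the following (the line order can differ: since Ruby 1.9 hashes iterate in insertion order, so the "2:" line is printed first there):

    1: FATAL <error title="There is insufficient ...">
    2: FATAL <error title="Request timed out.">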

If you have an array like this:

    words = ["aa", "bb", "cc", "bb", "bb", "cc"]

and you need to count the duplicate elements, a one-line solution is:

    result = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
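The counting one-liners give a hash of value to count, while the question asked for an array of `{:error, :count}` entries. A small sketch of one way to bridge the two; the name `report` is made up for illustration, and the ordering assumes Ruby 1.9 or later, where hashes keep insertion order:

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Count occurrences; Hash.new(0) makes missing keys start at 0.
counts = words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }

# Reshape each [value, count] pair into the structure from the question.
report = counts.map { |word, count| { :error => word, :count => count } }

p report
```

The same `map` step works on the result of any of the other counting approaches in this thread, since they all end up with a value-to-count hash.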

Personally, I would do it this way:

    # myprogram.rb
    a = ['FATAL <error title="Request timed out.">',
         'FATAL <error title="Request timed out.">',
         'FATAL <error title="There is insufficient system memory to run this query.">']
    puts a

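Then run the program and pipe its output to uniq -c (which only counts adjacent duplicate lines; that is fine here because the array is already sorted):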

    ruby myprogram.rb | uniq -c

Output:

      2 FATAL <error title="Request timed out.">
      1 FATAL <error title="There is insufficient system memory to run this query.">

Use Enumerable#group_by to solve the problem above.

    [1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k, v| [k, v.count] }.to_h
    # {1=>1, 2=>2, 3=>3, 4=>1}

Breaking it down into its separate method calls:

    a = [1, 2, 2, 3, 3, 3, 4]
    a = a.group_by(&:itself)           # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
    a = a.map { |k, v| [k, v.count] }  # [[1, 1], [2, 2], [3, 3], [4, 1]]
    a = a.to_h                         # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7. Note that Object#itself requires Ruby 2.2 and Array#to_h requires Ruby 2.1.
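On newer Rubies the map-then-to_h step can be folded into a single call. A sketch, assuming Ruby 2.4 or later for Hash#transform_values (this is a variation, not part of the answer above):

```ruby
a = [1, 2, 2, 3, 3, 3, 4]

# group_by builds value => [occurrences]; transform_values keeps the keys
# and replaces each array of occurrences with its size.
counts = a.group_by(&:itself).transform_values(&:size)

p counts
```

It still builds the intermediate arrays of occurrences, so it is mainly a readability improvement over the map/to_h chain.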

    a = [1, 1, 1, 2, 2, 3]
    a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.select { |b| b == i }.size } }
    # => [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

How about the following:

    things = [1, 2, 2, 3, 3, 3, 4]
    things.uniq.map { |t| [t, things.count(t)] }.to_h

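This feels cleaner and more descriptive of what we are actually trying to do.

I suspect it will also perform better on large collections than the approaches that iterate over every value.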

Benchmark:

    a = (1...1000000).map { rand(100) }

                          user     system      total        real
    inject            7.670000   0.010000   7.680000 (  7.985289)
    array count       0.040000   0.000000   0.040000 (  0.036650)
    each_with_object  0.210000   0.000000   0.210000 (  0.214731)
    group_by          0.220000   0.000000   0.220000 (  0.218581)

So it is quite a bit faster, at least on this data, which contains only 100 distinct values.
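The answer does not show the script behind these numbers, so the following is only a sketch of how such a comparison could be set up with the standard `benchmark` library. Which implementation each label measured is a guess based on the snippets in this thread, and the sample size `N` is deliberately smaller than the original so the script finishes quickly:

```ruby
require 'benchmark'

# Assumption: smaller than the original 1,000,000 elements, for a quick run.
N = 100_000
data = Array.new(N) { rand(100) }

# Assumption: guesses at what each label in the table above measured.
approaches = {
  'inject'           => lambda { |a| a.uniq.inject({}) { |h, x| h.merge({ x => a.select { |y| y == x }.size }) } },
  'array count'      => lambda { |a| a.uniq.map { |x| [x, a.count(x)] }.to_h },
  'each_with_object' => lambda { |a| a.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 } },
  'group_by'         => lambda { |a| a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}

# Print user/system/total/real timings for each approach.
Benchmark.bm(18) do |bm|
  approaches.each do |label, fn|
    bm.report(label) { fn.call(data) }
  end
end
```

Timings depend heavily on the Ruby version and on how many distinct values the data contains: the uniq-based variants rescan the whole array once per distinct value, so their cost grows with that number.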

A simple implementation:

    (errors_hash = {}).default = 0
    array_of_errors.each { |error| errors_hash[error] += 1 }
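For comparison, Ruby 2.7 and later ship this counting loop as Enumerable#tally. A sketch with made-up sample data (the contents of `array_of_errors` are hypothetical), falling back to a manual count on older versions:

```ruby
# Hypothetical sample data; the answer above does not define array_of_errors.
array_of_errors = ['timeout', 'timeout', 'out of memory']

errors_hash =
  if array_of_errors.respond_to?(:tally)
    # Ruby 2.7+: returns a hash of element => number of occurrences.
    array_of_errors.tally
  else
    # Older Rubies: same result with an explicit default-0 hash.
    array_of_errors.each_with_object(Hash.new(0)) { |error, h| h[error] += 1 }
  end

p errors_hash
```

One difference from the default-0 version: the hash returned by tally has no default value, so looking up an error that never occurred gives nil rather than 0.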

Here is a sample array:

    a = ["aa", "bb", "cc", "bb", "bb", "cc"]

  1. Select all the unique keys.
  2. For each key, accumulate the matching elements into a hash, to get something like {'bb' => ['bb', 'bb']}:

     res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
     # {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you can do things like:

    res['aa'].size