获取直方图的数据

有没有一种方法可以在MySQL中指定bin大小？现在，我正在尝试以下SQL查询：

select total, count(total) from faults GROUP BY total;

正在生成的数据足够好，但行数太多。我需要的是将数据分组到预定义的箱中的方法。我可以从脚本语言中做到这一点，但有没有办法直接在SQL中做到这一点？

例：

 +-------+--------------+ | total | count(total) | +-------+--------------+ | 30 | 1 | | 31 | 2 | | 33 | 1 | | 34 | 3 | | 35 | 2 | | 36 | 6 | | 37 | 3 | | 38 | 2 | | 41 | 1 | | 42 | 5 | | 43 | 1 | | 44 | 7 | | 45 | 4 | | 46 | 3 | | 47 | 2 | | 49 | 3 | | 50 | 2 | | 51 | 3 | | 52 | 4 | | 53 | 2 | | 54 | 1 | | 55 | 3 | | 56 | 4 | | 57 | 4 | | 58 | 2 | | 59 | 2 | | 60 | 4 | | 61 | 1 | | 63 | 2 | | 64 | 5 | | 65 | 2 | | 66 | 3 | | 67 | 5 | | 68 | 5 | ------------------------

我在找什么：

 +------------+---------------+ | total | count(total) | +------------+---------------+ | 30 - 40 | 23 | | 40 - 50 | 15 | | 50 - 60 | 51 | | 60 - 70 | 45 | ------------------------------

我想这不能直接实现，但是对任何相关的存储过程的引用也可以。

这是一个关于在MySQL中为数值创build直方图的超级快捷方式的文章。

使用CASE语句和其他types的复杂逻辑，还有其他多种方法可以创build更好更灵活的柱状图。这个方法赢得了我一次又一次，因为它是很容易修改每个用例，如此简短。这是你如何做到的：
 SELECT ROUND(numeric_value, -2) AS bucket, COUNT(*) AS COUNT, RPAD('', LN(COUNT(*)), '*') AS bar FROM my_table GROUP BY bucket; 
只要将numeric_value更改为任何列，更改舍入增量，就是这样。我已经把酒吧做成了对数规模，所以当你有很大的价值时，他们不会变得太多。

numeric_value应该在ROUNDing操作中根据舍入增量进行偏移，以确保第一个桶包含与以下桶一样多的元素。

例如ROUND（numeric_value，-1），范围[0,4]（5个元素）中的numeric_value将被放置在第一个桶中，而[5,14]（10个元素）在第二个中，[15,24]除非numeric_value通过ROUND（numeric_value – 5，-1）正确偏移。

这是对一些看起来很漂亮的随机数据进行查询的一个例子。足够好的数据快速评估。
 +--------+----------+-----------------+ | bucket | count | bar | +--------+----------+-----------------+ | -500 | 1 | | | -400 | 2 | * | | -300 | 2 | * | | -200 | 9 | ** | | -100 | 52 | **** | | 0 | 5310766 | *************** | | 100 | 20779 | ********** | | 200 | 1865 | ******** | | 300 | 527 | ****** | | 400 | 170 | ***** | | 500 | 79 | **** | | 600 | 63 | **** | | 700 | 35 | **** | | 800 | 14 | *** | | 900 | 15 | *** | | 1000 | 6 | ** | | 1100 | 7 | ** | | 1200 | 8 | ** | | 1300 | 5 | ** | | 1400 | 2 | * | | 1500 | 4 | * | +--------+----------+-----------------+ 
一些注意事项：不匹配的范围不会出现在计数中 – 计数列中不会有零。另外，我在这里使用ROUND函数。如果您觉得这对您更有意义，您可以轻松地用TRUNCATEreplace它。

我在这里findhttp://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html

麦克·德尔高迪奥的答案就是我这样做的方式，但稍作改动：

 select floor(mycol/10)*10 as bin_floor, count(*) from mytable group by 1 order by 1

优势？你可以把箱子做得很大或者很小。大小100的箱子？ floor(mycol/100)*100 。大小5的垃圾箱？ floor(mycol/5)*5 。

伯纳多。

 SELECT b.*,count(*) as total FROM bins b left outer join table1 a on a.value between b.min_value and b.max_value group by b.min_value

表格箱包含定义箱的列min_value和max_value。请注意，运算符“连接…在x BETWEEN y和z”是包容性的。

table1是数据表的名称

Ofri Raviv的回答非常接近但不正确。即使在直方图间隔中有零结果， count(*)也将为1 。查询需要修改为使用条件sum ：

 SELECT b.*, SUM(a.value IS NOT NULL) AS total FROM bins b LEFT JOIN a ON a.value BETWEEN b.min_value AND b.max_value GROUP BY b.min_value;

 select "30-34" as TotalRange,count(total) as Count from table_name where total between 30 and 34 union ( select "35-39" as TotalRange,count(total) as Count from table_name where total between 35 and 39) union ( select "40-44" as TotalRange,count(total) as Count from table_name where total between 40 and 44) union ( select "45-49" as TotalRange,count(total) as Count from table_name where total between 45 and 49) etc ....

只要没有太多的间隔，这是一个很好的解决scheme。

我制作了一个程序，可以根据指定的数量或大小自动生成一个临时表格，供以后与Ofri Raviv的解决scheme一起使用。

 CREATE PROCEDURE makebins(numbins INT, binsize FLOAT) # binsize may be NULL for auto-size BEGIN SELECT FLOOR(MIN(colval)) INTO @binmin FROM yourtable; SELECT CEIL(MAX(colval)) INTO @binmax FROM yourtable; IF binsize IS NULL THEN SET binsize = CEIL((@binmax-@binmin)/numbins); # CEIL here may prevent the potential creation a very small extra bin due to rounding errors, but no good where floats are needed. END IF; SET @currlim = @binmin; WHILE @currlim + binsize < @binmax DO INSERT INTO bins VALUES (@currlim, @currlim+binsize); SET @currlim = @currlim + binsize; END WHILE; INSERT INTO bins VALUES (@currlim, @maxbin); END; DROP TABLE IF EXISTS bins; # be careful if you have a bins table of your own. CREATE TEMPORARY TABLE bins ( minval INT, maxval INT, # or FLOAT, if needed KEY (minval), KEY (maxval) );# keys could perhaps help if using a lot of bins; normally negligible CALL makebins(20, NULL); # Using 20 bins of automatic size here. SELECT bins.*, count(*) AS total FROM bins LEFT JOIN yourtable ON yourtable.value BETWEEN bins.minval AND bins.maxval GROUP BY bins.minval

这将仅为填充的分箱生成直方图计数。大卫·韦斯特应该是正确的，但由于某种原因，没有人口的垃圾箱没有出现在我的结果中（尽pipe使用了左连接 – 我不明白为什么）。

这应该工作。不那么优雅，但仍然：

 select count(mycol - (mycol mod 10)) as freq, mycol - (mycol mod 10) as label from mytable group by mycol - (mycol mod 10) order by mycol - (mycol mod 10) ASC

通过Mike DelGaudio

 select case when total >= 30 and total <= 40 THEN "30-40" else when total >= 40 and total <= 50 then "40-50" else "50-60" END as Total , count(total) group by Total

除了很好的回答https://stackoverflow.com/a/10363145/916682 ，你可以使用phpmyadmin图表工具获得一个不错的结果：

在这里输入图像说明

在给定数量的垃圾箱中等宽分箱：

 WITH bins AS( SELECT min(col) AS min_value , ((max(col)-min(col)) / 10.0) + 0.0000001 AS bin_width FROM cars ) SELECT tab.*, floor((col-bins.min_value) / bins.bin_width ) AS bin FROM tab, bins;

请注意，0.0000001用于确保值等于max（col）的logging不会自己创build它自己的bin。此外，加法常量用于确保在列中的所有值都相同时查询不会因零除而失败。

还要注意的是，箱子的数量（例子中的10）应该用十进制标记来避免整数除法（未调整的bin_width可以是十进制）。

获取直方图的数据

Matplotlib /pandas错误使用直方图

在Matplotlib中的Bin大小（直方图）

如何在R中绘制两个直方图？

用于在时间轴上绘制graphics的Javascript库（可缩放和可选）

在R中用ggplot2理解date和绘制直方图

使用gnuplot直方图？

在R中将正态曲线叠加到直方图上

用matplotlib同时绘制两个直方图

numpy.histogram（）如何工作？

保存一个pandas.Series直方图的图档