Using LIMIT within GROUP BY to get N results per group?

The following query:

    SELECT year, id, rate
    FROM h
    WHERE year BETWEEN 2000 AND 2009
      AND id IN (SELECT rid FROM table2)
    GROUP BY id, year
    ORDER BY id, rate DESC

yields:

    year    id     rate
    2006    p01    8
    2003    p01    7.4
    2008    p01    6.8
    2001    p01    5.9
    2007    p01    5.3
    2009    p01    4.4
    2002    p01    3.9
    2004    p01    3.5
    2005    p01    2.1
    2000    p01    0.8
    2001    p02    12.5
    2004    p02    12.4
    2002    p02    12.2
    2003    p02    10.3
    2000    p02    8.7
    2006    p02    4.6
    2007    p02    3.3

What I'd like is only the top 5 results for each id:

    2006    p01    8
    2003    p01    7.4
    2008    p01    6.8
    2001    p01    5.9
    2007    p01    5.3
    2001    p02    12.5
    2004    p02    12.4
    2002    p02    12.2
    2003    p02    10.3
    2000    p02    8.7

Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?

You could use the GROUP_CONCAT aggregate function to get all years into a single column, grouped by id and ordered by rate:

    SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM yourtable
    GROUP BY id

Result:

    -----------------------------------------------------------
    | ID  | GROUPED_YEAR                                      |
    -----------------------------------------------------------
    | p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
    | p02 | 2001,2004,2002,2003,2000,2006,2007                |
    -----------------------------------------------------------

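And then you could use FIND_IN_SET, which returns the position of the first argument inside the comma-separated list given as the second argument, e.g.

    SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
    -- returns 1

    SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
    -- returns 6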

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by FIND_IN_SET, you could then use this query, which returns only the first 5 years for every id:

    SELECT yourtable.*
    FROM yourtable
    INNER JOIN (
        SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
        FROM yourtable
        GROUP BY id
    ) group_max
        ON yourtable.id = group_max.id
       AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
    ORDER BY yourtable.id, yourtable.year DESC;


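Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.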

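The maximum length of the string returned by GROUP_CONCAT is limited (by the group_concat_max_len system variable, 1024 bytes by default), so this works well if you only need to select a few records for every group.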
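If the groups are large, the concatenated list may be cut off before it reaches the rows you care about. A minimal sketch of how to inspect and raise the limit for the current session; the value 100000 is only an illustrative assumption, so pick one that fits your data:

```sql
-- Show the current limit in bytes (the MySQL default is 1024).
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise the limit for this session only; 100000 is an arbitrary example value.
SET SESSION group_concat_max_len = 100000;
```

When a result is truncated, MySQL reports a warning rather than an error, so it is worth running SHOW WARNINGS after the query while testing.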

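One trick is to use user variables to number the rows within each group. MySQL does not guarantee the order in which expressions that assign user variables are evaluated, so treat this as a trick that works in practice rather than documented behavior. Start with this query:

    SET @currcount = NULL, @currvalue = NULL;
    SELECT
        id, year, rate,
        -- `rank` is quoted because RANK is a reserved word in MySQL 8.0
        @currcount := IF(@currvalue = id, @currcount + 1, 1) AS `rank`,
        @currvalue := id AS whatever
    FROM test
    ORDER BY id, rate DESC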

This gives you the following result:

    +------+------+-------+------+----------+
    | id   | year | rate  | rank | whatever |
    +------+------+-------+------+----------+
    | p01  | 2006 |  8.00 |    1 | p01      |
    | p01  | 2003 |  7.40 |    2 | p01      |
    | p01  | 2008 |  6.80 |    3 | p01      |
    | p01  | 2001 |  5.90 |    4 | p01      |
    | p01  | 2007 |  5.30 |    5 | p01      |
    | p01  | 2009 |  4.40 |    6 | p01      |
    | p01  | 2002 |  3.90 |    7 | p01      |
    | p01  | 2004 |  3.50 |    8 | p01      |
    | p01  | 2005 |  2.10 |    9 | p01      |
    | p01  | 2000 |  0.80 |   10 | p01      |
    | p02  | 2001 | 12.50 |    1 | p02      |
    | p02  | 2004 | 12.40 |    2 | p02      |
    | p02  | 2002 | 12.20 |    3 | p02      |
    | p02  | 2003 | 10.30 |    4 | p02      |
    | p02  | 2000 |  8.70 |    5 | p02      |
    | p02  | 2006 |  4.60 |    6 | p02      |
    | p02  | 2007 |  3.30 |    7 | p02      |
    +------+------+-------+------+----------+

Now wrap the query in another query and keep only the rows where rank <= 5:

    SET @currcount = NULL, @currvalue = NULL;
    SELECT id, year, rate
    FROM (
        SELECT
            id, year, rate,
            @currcount := IF(@currvalue = id, @currcount + 1, 1) AS `rank`,
            @currvalue := id AS whatever
        FROM test
        ORDER BY id, rate DESC
    ) AS whatever
    WHERE `rank` <= 5

You get:

    +------+------+-------+
    | id   | year | rate  |
    +------+------+-------+
    | p01  | 2006 |  8.00 |
    | p01  | 2003 |  7.40 |
    | p01  | 2008 |  6.80 |
    | p01  | 2001 |  5.90 |
    | p01  | 2007 |  5.30 |
    | p02  | 2001 | 12.50 |
    | p02  | 2004 | 12.40 |
    | p02  | 2002 | 12.20 |
    | p02  | 2003 | 10.30 |
    | p02  | 2000 |  8.70 |
    +------+------+-------+

For me, something like

    SUBSTRING_INDEX(GROUP_CONCAT(col_name ORDER BY desired_col_order_name), ',', N)

works perfectly. No complicated query.


For example, to get the top 1 for each group:

    SELECT *
    FROM yourtable
    WHERE id IN (
        SELECT SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY rate DESC), ',', 1) id
        FROM yourtable
        GROUP BY year
    )
    ORDER BY rate DESC;
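Applied to the question's data, the same pattern can return the five best years per id as one comma-separated string. This is a sketch that assumes the table is called yourtable, as in the earlier answer, and that the year lists are short enough not to be truncated by GROUP_CONCAT:

```sql
-- One row per id; top5_years holds the years with the five highest rates.
SELECT id,
       SUBSTRING_INDEX(GROUP_CONCAT(year ORDER BY rate DESC), ',', 5) AS top5_years
FROM yourtable
GROUP BY id;
```

With the sample rows from the question this should give 2006,2003,2008,2001,2007 for p01 and 2001,2004,2002,2003,2000 for p02. The result is a string per group, not one row per year, so it suits display purposes more than further joins.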

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer versions of MySQL, but not for 5 results per group).
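That was true when this answer was written. Later releases changed the picture: MySQL 8.0.14 and newer support LATERAL derived tables, which allow a LIMIT inside a subquery that refers to the current row of an outer table. A sketch under that version assumption, using the question's table h (the aliases ids and top5 are made up for illustration):

```sql
-- Requires MySQL 8.0.14 or later (LATERAL derived tables).
SELECT ids.id, top5.year, top5.rate
FROM (SELECT DISTINCT id FROM h) AS ids,
     LATERAL (
         -- Runs once per id and keeps at most five rows for it.
         SELECT year, rate
         FROM h
         WHERE h.id = ids.id
         ORDER BY rate DESC
         LIMIT 5
     ) AS top5
ORDER BY ids.id, top5.rate DESC;
```

On older servers this syntax is not available, and the workarounds in the other answers still apply.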

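This is a groupwise-maximum type of query, which is not trivial to do in SQL. There are various ways to tackle it that can be more efficient in some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.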

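As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need some post-processing to check for that.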
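One cheap way to find out whether ties can affect you at all is to look for duplicate rates inside a group before choosing a solution. A minimal sketch against the question's table h (tied_rows is just an illustrative alias):

```sql
-- Lists every (id, rate) pair that occurs more than once in the year range.
SELECT id, rate, COUNT(*) AS tied_rows
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id, rate
HAVING COUNT(*) > 1;
```

If this returns no rows, the top-5 result is unambiguous; otherwise you need a tie-breaker, for example a secondary sort on year.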

Try this:

    SELECT h.year, h.id, h.rate
    FROM (
        SELECT h.year, h.id, h.rate,
               IF(@lastid = (@lastid := h.id), @index := @index + 1, @index := 0) indx
        FROM (
            SELECT h.year, h.id, h.rate
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009
              AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
        ) h,
        (SELECT @lastid := '', @index := 0) AS a
    ) h
    -- indx starts at 0 within each id, so indx < 5 keeps five rows
    WHERE h.indx < 5;

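This requires a series of subqueries to rank the values, limit them, and then perform the sum while grouping:

    SET @Rnk := 0, @last_id := NULL, @N := 2;
    SELECT c.id, SUM(c.val)
    FROM (
        SELECT b.id, b.val
        FROM (
            -- number the rows within each id, highest val first
            SELECT @Rnk := IF(@last_id = a.id, @Rnk + 1, 1) AS Rnk,
                   a.id,
                   a.val,
                   @last_id := a.id AS last_id
            FROM (SELECT id, val FROM list ORDER BY id, val DESC) AS a
        ) AS b
        -- keep the top @N rows of each id
        WHERE b.Rnk <= @N
    ) AS c
    GROUP BY c.id;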

Build a virtual column (like ROWNUM in Oracle).

Table:

    CREATE TABLE `stack` (
        `year` int(11) DEFAULT NULL,
        `id` varchar(10) DEFAULT NULL,
        `rate` float DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Data:

    insert into stack values(2006,'p01',8);
    insert into stack values(2001,'p01',5.9);
    insert into stack values(2007,'p01',5.3);
    insert into stack values(2009,'p01',4.4);
    insert into stack values(2001,'p02',12.5);
    insert into stack values(2004,'p02',12.4);
    insert into stack values(2005,'p01',2.1);
    insert into stack values(2000,'p01',0.8);
    insert into stack values(2002,'p02',12.2);
    insert into stack values(2002,'p01',3.9);
    insert into stack values(2004,'p01',3.5);
    insert into stack values(2003,'p02',10.3);
    insert into stack values(2000,'p02',8.7);
    insert into stack values(2006,'p02',4.6);
    insert into stack values(2007,'p02',3.3);
    insert into stack values(2003,'p01',7.4);
    insert into stack values(2008,'p01',6.8);

SQL like this:

    select t3.year, t3.id, t3.rate
    from (
        select t1.*,
               (select count(*)
                from stack t2
                where t1.rate <= t2.rate and t1.id = t2.id) as rownum
        from stack t1
    ) t3
    where rownum <= 3
    order by id, rate DESC;

If you remove the WHERE clause on t3, the query shows every row together with its computed rownum.

To get the "top N records", add "rownum <= 3" to the WHERE clause (the WHERE clause on t3).

To choose the years, add "BETWEEN 2000 AND 2009" to the WHERE clause (the WHERE clause on t3).
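Putting both conditions together for the question's case (top 5, years 2000 to 2009) could look like the sketch below. It makes one assumption beyond the answer's text: the year filter is also applied inside the counting subquery, so that rows outside the range do not push the ranks of in-range rows down.

```sql
SELECT t3.year, t3.id, t3.rate
FROM (
    SELECT t1.*,
           -- rownum = number of in-range rows of the same id with an equal or higher rate
           (SELECT COUNT(*)
            FROM stack t2
            WHERE t2.id = t1.id
              AND t2.rate >= t1.rate
              AND t2.year BETWEEN 2000 AND 2009) AS rownum
    FROM stack t1
    WHERE t1.year BETWEEN 2000 AND 2009
) t3
WHERE t3.rownum <= 5
ORDER BY t3.id, t3.rate DESC;
```

Because tied rows all receive the highest count among them, a tie that straddles the cut-off drops every tied row, so a group can come back with fewer than five rows.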

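The following post: SQL: selecting top N records per group describes the complicated way of achieving this without subqueries.

It improves on other solutions offered here by:

  • Doing everything in a single query
  • Being able to properly utilize indexes
  • Avoiding subqueries, which are notorious for producing bad execution plans in MySQL

It is, however, not pretty. A good solution would be achievable if window functions (also known as analytic functions) were available in MySQL, but at the time of writing they were not (they were added in MySQL 8.0). The trick used in that post relies on GROUP_CONCAT, which is sometimes described as "poor man's window functions for MySQL".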

This took some work, but I thought my solution was worth sharing, as it seems elegant as well as quite fast.

    SELECT h.year, h.id, h.rate
    FROM (
        SELECT id,
               SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',', 5) AS l
        FROM h
        WHERE year BETWEEN 2000 AND 2009
        GROUP BY id
        ORDER BY id
    ) AS h_temp
    LEFT JOIN h
           ON h.id = h_temp.id
          AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l

Note that this example is written for the purpose of the question and can be modified quite easily for other similar purposes.

    SELECT year, id, rate
    FROM (
        SELECT year, id, rate,
               row_number() over (partition by id order by rate DESC) AS row_num
        FROM h
        WHERE year BETWEEN 2000 AND 2009
          AND id IN (SELECT rid FROM table2)
        GROUP BY id, year
    ) AS subquery
    WHERE row_num <= 5
    ORDER BY id, rate DESC

The subquery is almost identical to your query. The only change is adding

 row_number() over (partition by id order by rate DESC) 
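Which ranking function to use depends on how ties should be treated. The sketch below, assuming a server with window function support (MySQL 8.0 or later, or PostgreSQL) and the question's table h, puts the three common choices side by side so the difference is visible on your own data; the aliases row_num, rnk and dense_rnk are illustrative only:

```sql
SELECT id, year, rate,
       ROW_NUMBER() OVER w AS row_num,    -- always 1, 2, 3, ...; ties are ordered arbitrarily
       RANK()       OVER w AS rnk,        -- ties share a rank; the following rank is skipped
       DENSE_RANK() OVER w AS dense_rnk   -- ties share a rank; no gaps afterwards
FROM h
WINDOW w AS (PARTITION BY id ORDER BY rate DESC);
```

Filtering on row_num <= 5 returns exactly five rows per id (or fewer if the group is smaller), while filtering on rnk <= 5 can return more when rows tie at the cut-off.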

This is for those who, like me, had their queries time out. I wrote the procedure below to apply a LIMIT, and anything else needed, to one specific group at a time.

    DELIMITER $$
    CREATE PROCEDURE count_limit200()
    BEGIN
        DECLARE a INT DEFAULT 0;
        DECLARE stop_loop INT DEFAULT 0;
        DECLARE domain_val VARCHAR(250);
        DECLARE domain_list CURSOR FOR
            SELECT DISTINCT domain FROM db.one;

        OPEN domain_list;

        -- number of distinct domains = number of loop iterations
        SELECT COUNT(DISTINCT(domain)) INTO stop_loop FROM db.one;

        -- BEGIN LOOP
        loop_thru_domains: LOOP
            FETCH domain_list INTO domain_val;
            SET a = a + 1;

            INSERT INTO db.two (book, artist, title, title_count, last_updated)
            SELECT * FROM (
                SELECT book, artist, title, COUNT(ObjectKey) AS titleCount, NOW()
                FROM db.one
                WHERE book = domain_val
                GROUP BY artist, title
                ORDER BY book, titleCount DESC
                LIMIT 200
            ) a
            ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

            IF a = stop_loop THEN
                LEAVE loop_thru_domains;
            END IF;
        END LOOP loop_thru_domains;

        CLOSE domain_list;
    END $$
    DELIMITER ;

It loops through the list of domains and inserts at most 200 rows for each one.

Try this:

    SET @num := 0, @type := '';
    SELECT `year`, `id`, `rate`,
           @num := IF(@type = `id`, @num + 1, 1) AS `row_number`,
           @type := `id` AS `dummy`
    FROM (
        SELECT *
        FROM `h`
        WHERE `year` BETWEEN 2000 AND 2009
          AND `id` IN (SELECT `rid` FROM `table2`)
        ORDER BY `id`, `rate` DESC
    ) AS `temph`
    GROUP BY `year`, `id`, `rate`
    HAVING `row_number` <= 5
    ORDER BY `id`, `rate` DESC;

Please try the stored procedure below. I have verified it. I get the correct result, but without using GROUP BY.

    CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
    BEGIN
        DECLARE query_string text;
        DECLARE datasource1 varchar(24);
        DECLARE done INT DEFAULT 0;
        DECLARE tenants varchar(50);
        DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        SET @query_string = '';
        OPEN cur1;

        -- append one "top 5 by rate" query per id
        read_loop: LOOP
            FETCH cur1 INTO tenants;
            IF done THEN
                LEAVE read_loop;
            END IF;
            SET @datasource1 = tenants;
            SET @query_string = concat(@query_string,
                '(select * from demo where `id` = ''', @datasource1,
                ''' order by rate desc LIMIT 5) UNION ALL ');
        END LOOP;
        close cur1;

        -- drop the trailing UNION ALL, then run the generated statement
        SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
        select @query_string;
        PREPARE stmt FROM @query_string;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
    END
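To make the procedure easier to follow, here is roughly what the generated statement looks like when the cursor returns two ids. The values p01 and p02 are taken from the question's sample data and are assumptions about what demo1 contains:

```sql
-- Each parenthesized SELECT applies its own ORDER BY and LIMIT,
-- and UNION ALL stitches the per-id results together.
(SELECT * FROM demo WHERE `id` = 'p01' ORDER BY rate DESC LIMIT 5)
UNION ALL
(SELECT * FROM demo WHERE `id` = 'p02' ORDER BY rate DESC LIMIT 5);
```

The parentheses matter: without them, ORDER BY and LIMIT would apply to the whole union instead of to each id.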