# GROUP BY and aggregating sequential numeric values

Sample data:

```
+---------+------------+------+
| company | profession | year |
+---------+------------+------+
| Google  | Programmer | 2000 |
| Google  | Sales      | 2000 |
| Google  | Sales      | 2001 |
| Google  | Sales      | 2002 |
| Google  | Sales      | 2004 |
| Mozilla | Sales      | 2002 |
+---------+------------+------+
```

Desired result, with each run of consecutive years per company and profession aggregated into one array:

```
+---------+------------+------------------+
| company | profession | year             |
+---------+------------+------------------+
| Google  | Programmer | [2000]           |
| Google  | Sales      | [2000,2001,2002] |
| Google  | Sales      | [2004]           |
| Mozilla | Sales      | [2002]           |
+---------+------------+------------------+
```

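Within each company/profession partition, `year - ROW_NUMBER()` stays constant along a run of consecutive years and changes at every gap. That difference can therefore serve as an extra grouping key:

```sql
WITH MarkedForGrouping AS (
    SELECT company, profession, year,
           -- constant along a run of consecutive years, jumps at each gap
           year - ROW_NUMBER() OVER (
               PARTITION BY company, profession
               ORDER BY year
           ) AS seqID
    FROM atable
)
SELECT company, profession,
       array_agg(year ORDER BY year) AS years  -- guarantee ascending order inside each array
FROM MarkedForGrouping
GROUP BY company, profession, seqID
ORDER BY company, profession, seqID;
```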
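The following plain-Python trace shows why `seqID` works. It is a sketch, not part of the original answer. Python is used only so the intermediate values can be checked without a database. `ROWS`, `mark_seq_ids` and `group_by_seq_id` are hypothetical names. The code assumes there are no duplicate years within one company/profession pair.

```python
from collections import defaultdict
from itertools import groupby

# Sample data from the question: (company, profession, year).
# ROWS and the function names below are illustrative, not from the answer.
ROWS = [
    ("Google", "Programmer", 2000),
    ("Google", "Sales", 2000),
    ("Google", "Sales", 2001),
    ("Google", "Sales", 2002),
    ("Google", "Sales", 2004),
    ("Mozilla", "Sales", 2002),
]


def mark_seq_ids(rows):
    """Attach seq_id = year - ROW_NUMBER() per (company, profession)."""
    marked = []
    # sorted() + groupby stand in for PARTITION BY company, profession ORDER BY year
    for _, part in groupby(sorted(rows), key=lambda r: r[:2]):
        # enumerate(..., start=1) plays the role of ROW_NUMBER()
        for row_number, (company, profession, year) in enumerate(part, start=1):
            marked.append((company, profession, year, year - row_number))
    return marked


def group_by_seq_id(rows):
    """GROUP BY company, profession, seq_id, collecting years in order."""
    groups = defaultdict(list)  # dicts keep insertion order, i.e. sorted order here
    for company, profession, year, seq_id in mark_seq_ids(rows):
        groups[(company, profession, seq_id)].append(year)
    return [(c, p, years) for (c, p, _), years in groups.items()]


for row in group_by_seq_id(ROWS):
    print(row)
```

The Programmer row and the first three Sales rows all get `seq_id` 1999. `seqID` on its own is therefore not a group key. It only works together with `company` and `profession` in the `GROUP BY`.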

### Step 1) Identify non-consecutive values

```sql
select company,
       profession,
       year,
       case
         when row_number() over (partition by company, profession order by year) = 1
           or year - lag(year,1,year) over (partition by company, profession order by year) > 1
         then 1
         else 0
       end as group_cnt
from qualification
```

```
 company | profession | year | group_cnt
---------+------------+------+-----------
 Google  | Programmer | 2000 |         1
 Google  | Sales      | 2000 |         1
 Google  | Sales      | 2001 |         0
 Google  | Sales      | 2002 |         0
 Google  | Sales      | 2004 |         1
 Mozilla | Sales      | 2002 |         1
```
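Here is a rough plain-Python equivalent of the `group_cnt` flag. It is a sketch for checking the logic, not the answer's code. `mark_group_starts` is a hypothetical helper, and `prev_year` emulates `lag(year, 1, year)` within each company/profession partition.

```python
from itertools import groupby

# Sample data from the question; ROWS is an illustrative name.
ROWS = [
    ("Google", "Programmer", 2000),
    ("Google", "Sales", 2000),
    ("Google", "Sales", 2001),
    ("Google", "Sales", 2002),
    ("Google", "Sales", 2004),
    ("Mozilla", "Sales", 2002),
]


def mark_group_starts(rows):
    """Return (company, profession, year, group_cnt) rows as in Step 1."""
    out = []
    # sorted() + groupby stand in for PARTITION BY company, profession ORDER BY year
    for _, part in groupby(sorted(rows), key=lambda r: r[:2]):
        prev_year = None  # nothing precedes the first row of a partition
        for row_number, (company, profession, year) in enumerate(part, start=1):
            # lag(year, 1, year): default to the current year when there is no previous row
            lag_year = year if prev_year is None else prev_year
            starts_group = row_number == 1 or year - lag_year > 1
            out.append((company, profession, year, 1 if starts_group else 0))
            prev_year = year
    return out


for row in mark_group_starts(ROWS):
    print(row)
```

The `lag` default is the current year, so the difference is 0 on each partition's first row. The `row_number() = 1` test is what flags that row as a group start.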

### Step 2) Define group IDs

```sql
select company,
       profession,
       year,
       sum(group_cnt) over (order by company, profession, year) as group_nr
from (
  select company,
         profession,
         year,
         case
           when row_number() over (partition by company, profession order by year) = 1
             or year - lag(year,1,year) over (partition by company, profession order by year) > 1
           then 1
           else 0
         end as group_cnt
  from qualification
) t1
```

```
 company | profession | year | group_nr
---------+------------+------+----------
 Google  | Programmer | 2000 |        1
 Google  | Sales      | 2000 |        2
 Google  | Sales      | 2001 |        2
 Google  | Sales      | 2002 |        2
 Google  | Sales      | 2004 |        3
 Mozilla | Sales      | 2002 |        4
(6 rows)
```
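The group numbers are just a running total of the Step 1 flags. The plain-Python sketch below shows this with `itertools.accumulate`. `FLAGGED` and `number_groups` are hypothetical names. The sketch assumes there are no duplicate (company, profession, year) rows. With duplicates, SQL's default window frame (`RANGE` up to the current row) gives peer rows the same total, which `accumulate` does not.

```python
from itertools import accumulate

# Step 1 output: (company, profession, year, group_cnt). FLAGGED is an illustrative name.
FLAGGED = [
    ("Google", "Programmer", 2000, 1),
    ("Google", "Sales", 2000, 1),
    ("Google", "Sales", 2001, 0),
    ("Google", "Sales", 2002, 0),
    ("Google", "Sales", 2004, 1),
    ("Mozilla", "Sales", 2002, 1),
]


def number_groups(flagged):
    """Running total of group_cnt, like sum(group_cnt) over (order by company, profession, year)."""
    ordered = sorted(flagged)  # no PARTITION BY: one running total across all rows
    running = accumulate(row[3] for row in ordered)
    return [(c, p, y, group_nr) for (c, p, y, _), group_nr in zip(ordered, running)]


for row in number_groups(FLAGGED):
    print(row)
```

The sum has no `PARTITION BY`, so the numbers keep increasing across companies. Each partition's first row is flagged 1, so every run ends up with a distinct `group_nr`.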

### Step 3) Final query

```sql
select company,
       profession,
       array_agg(year order by year) as years  -- keep each array in ascending order
from (
  select company,
         profession,
         year,
         sum(group_cnt) over (order by company, profession, year) as group_nr
  from (
    select company,
           profession,
           year,
           case
             when row_number() over (partition by company, profession order by year) = 1
               or year - lag(year,1,year) over (partition by company, profession order by year) > 1
             then 1
             else 0
           end as group_cnt
    from qualification
  ) t1
) t2
group by company, profession, group_nr
order by company, profession, group_nr
```

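```
 company | profession |      years
---------+------------+------------------
 Google  | Programmer | {2000}
 Google  | Sales      | {2000,2001,2002}
 Google  | Sales      | {2004}
 Mozilla | Sales      | {2002}
(4 rows)
```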
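The last step groups by (company, profession, group_nr) and collects the years. The sketch below traces it in plain Python. It is not the answer's code. `NUMBERED`, `aggregate_years` and `pg_array` are hypothetical names. `pg_array` only imitates how psql prints an `int[]` value.

```python
from collections import defaultdict

# Step 2 output: (company, profession, year, group_nr). NUMBERED is an illustrative name.
NUMBERED = [
    ("Google", "Programmer", 2000, 1),
    ("Google", "Sales", 2000, 2),
    ("Google", "Sales", 2001, 2),
    ("Google", "Sales", 2002, 2),
    ("Google", "Sales", 2004, 3),
    ("Mozilla", "Sales", 2002, 4),
]


def aggregate_years(numbered):
    """GROUP BY company, profession, group_nr with years collected in ascending order."""
    groups = defaultdict(list)
    for company, profession, year, group_nr in sorted(numbered):
        groups[(company, profession, group_nr)].append(year)
    # ORDER BY company, profession, group_nr
    return [(c, p, groups[(c, p, nr)]) for (c, p, nr) in sorted(groups)]


def pg_array(values):
    """Render a list roughly the way psql prints an int[] value, e.g. {2000,2001}."""
    return "{" + ",".join(str(v) for v in values) + "}"


for company, profession, years in aggregate_years(NUMBERED):
    print(company, profession, pg_array(years))
```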

### Procedural solution with PL/pgSQL

Test table:

```sql
CREATE TEMP TABLE tbl (company text, profession text, year int);
INSERT INTO tbl VALUES
  ('Google',  'Programmer', 2000)
, ('Google',  'Sales',      2000)
, ('Google',  'Sales',      2001)
, ('Google',  'Sales',      2002)
, ('Google',  'Sales',      2004)
, ('Mozilla', 'Sales',      2002);
```

Function:

```sql
CREATE OR REPLACE FUNCTION f_periods()
  RETURNS TABLE (company text, profession text, years int[]) AS
$func$
DECLARE
   r  tbl;  -- use table type as row variable
   r0 tbl;
BEGIN
   FOR r IN
      SELECT * FROM tbl t ORDER BY t.company, t.profession, t.year
   LOOP
      IF ( r.company,  r.profession,  r.year)
      <> (r0.company, r0.profession, r0.year + 1) THEN  -- NULL (not true) for the first row, r0 is still empty

         RETURN QUERY
         SELECT r0.company, r0.profession, years;  -- output row

         years := ARRAY[r.year];  -- start new array
      ELSE
         years := years || r.year;  -- add to array - year can be NULL, too
      END IF;

      r0 := r;  -- remember last row
   END LOOP;

   RETURN QUERY  -- output last iteration
   SELECT r0.company, r0.profession, years;
END
$func$ LANGUAGE plpgsql;
```

Call:

```sql
SELECT * FROM f_periods();
```
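The same single pass can be sketched in Python to show the control flow. The loop walks the ordered rows, remembers the previous row (`prev` plays the role of `r0`), and emits the current array whenever the next row does not continue the run. This is an illustrative translation, not the answer's function. `ROWS` and `periods` are hypothetical names.

```python
# Sample data from the question; ROWS is an illustrative name.
ROWS = [
    ("Google", "Programmer", 2000),
    ("Google", "Sales", 2000),
    ("Google", "Sales", 2001),
    ("Google", "Sales", 2002),
    ("Google", "Sales", 2004),
    ("Mozilla", "Sales", 2002),
]


def periods(rows):
    """Walk rows in order and start a new array whenever the run breaks (like f_periods)."""
    result = []
    prev = None  # plays the role of r0, the previous row
    years = []
    for company, profession, year in sorted(rows):
        if prev is not None and (company, profession, year) != (prev[0], prev[1], prev[2] + 1):
            result.append((prev[0], prev[1], years))  # emit the finished run
            years = [year]  # start a new array
        else:
            years.append(year)  # first row, or the run continues
        prev = (company, profession, year)
    if prev is not None:  # emit the last run; skipped for empty input
        result.append((prev[0], prev[1], years))
    return result


for row in periods(ROWS):
    print(row)
```

One difference is deliberate. If `tbl` is empty, the PL/pgSQL function as written should still return one all-NULL row from its final `RETURN QUERY`. This sketch returns an empty list instead.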