Calculating a cumulative total in PostgreSQL

I'm using COUNT and GROUP BY to get the number of subscribers registered each day:

  SELECT created_at, COUNT(email)
  FROM subscriptions
  GROUP BY created_at;

Result:

 created_at  count
 -----------------
 04-04-2011  100
 05-04-2011  50
 06-04-2011  50
 07-04-2011  300

Instead, I'd like to get the cumulative total of subscribers for each day. How can I get this?

 created_at  count
 -----------------
 04-04-2011  100
 05-04-2011  150
 06-04-2011  200
 07-04-2011  500

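With larger datasets, window functions are the most efficient way to run this kind of query: the table is scanned only once, instead of once per date as a self-join would do. It also looks a lot simpler. 🙂 PostgreSQL 8.4 and later support window functions.

This is what it looks like: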

 SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
 FROM subscriptions
 GROUP BY created_at;

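Here OVER creates the window; ORDER BY created_at means that the counts are summed in created_at order, so each row gets the total of its own count and all earlier ones.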
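To try the running total without a PostgreSQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. That substitution is an assumption and not part of the original answer; window functions need SQLite 3.25 or later. The helper names and the generated email addresses are hypothetical. The daily counts are aggregated in a subquery first, rather than nested as sum(count(email)), only to stay on conservative SQLite syntax.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory table with the daily counts from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    per_day = {
        "2011-04-04": 100,
        "2011-04-05": 50,
        "2011-04-06": 50,
        "2011-04-07": 300,
    }
    # Hypothetical addresses; only the number of rows per day matters here.
    rows = [
        (day, f"user{day}-{i}@example.com")
        for day, n in per_day.items()
        for i in range(n)
    ]
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    return conn


def running_totals(conn):
    """Return (created_at, running_total) rows in date order."""
    query = """
        SELECT created_at,
               SUM(cnt) OVER (ORDER BY created_at) AS running_total
        FROM (SELECT created_at, COUNT(email) AS cnt
              FROM subscriptions
              GROUP BY created_at) AS daily
        ORDER BY created_at
    """
    return conn.execute(query).fetchall()


for day, total in running_totals(build_sample_db()):
    print(day, total)
```

Each printed row pairs a date with the sum of that day's count and every earlier day's count, which is the shape of result asked for in the question.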


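Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately, this won't remove duplicates that span different dates.

If you want to remove all duplicates, I think the easiest way is to use a subquery with DISTINCT ON. This attributes each email to its earliest date (because I sort by created_at in ascending order, it picks the earliest one):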

 SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
 FROM (
     SELECT DISTINCT ON (email) created_at, email
     FROM subscriptions
     ORDER BY email, created_at
 ) AS subq
 GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.
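The effect of attributing each email to its earliest date can be checked with a small sketch. It again assumes Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later). SQLite has no DISTINCT ON, so this version swaps in MIN(created_at) with GROUP BY email, which picks the same earliest date per email. The function name and sample rows are hypothetical.

```python
import sqlite3


def unique_running_totals(rows):
    """Cumulative count of distinct emails, each counted on its first date.

    rows: iterable of (created_at, email) with ISO dates (YYYY-MM-DD),
    so that text comparison matches date order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    query = """
        SELECT first_seen,
               SUM(cnt) OVER (ORDER BY first_seen) AS running_total
        FROM (SELECT first_seen, COUNT(*) AS cnt
              FROM (SELECT email, MIN(created_at) AS first_seen
                    FROM subscriptions
                    GROUP BY email) AS firsts
              GROUP BY first_seen) AS daily
        ORDER BY first_seen
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-05", "a@example.com"),  # repeat of an earlier email
    ("2011-04-05", "c@example.com"),
    ("2011-04-06", "b@example.com"),  # repeat of an earlier email
    ("2011-04-06", "d@example.com"),
]
for day, total in unique_running_totals(sample):
    print(day, total)
```

One consequence of this approach is that a date on which every email is a repeat produces no row at all, because no email has that date as its earliest one.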


(If you want to test, this is how I created the sample dataset)

 create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          'foofoobar@foobar.com' || (i%700000)::text as email
   from generate_series(1,1000000) i;

 create index on subscriptions (email, created_at);

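Use a correlated subquery:

 -- The derived table keeps one row per date instead of one per subscription.
 SELECT a.created_at,
        (SELECT COUNT(b.email)
         FROM subscriptions b
         WHERE b.created_at <= a.created_at) AS count
 FROM (SELECT DISTINCT created_at FROM subscriptions) a
 ORDER BY a.created_at;

Or a self-join:

 -- Joining from the distinct dates avoids multiplying the count
 -- by the number of rows on each date.
 SELECT s1.created_at, COUNT(s2.email) AS cumul_count
 FROM (SELECT DISTINCT created_at FROM subscriptions) s1
 INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
 GROUP BY s1.created_at
 ORDER BY s1.created_at;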
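The join-based approach counts, for each date, every subscription dated on or before it. A minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the function name, the derived table of distinct dates, and the sample rows are illustrative assumptions. Joining from distinct dates matters: joining the full table to itself would pair every row of a date with every earlier row and inflate the counts.

```python
import sqlite3


def cumulative_by_self_join(rows):
    """Cumulative count per date, computed with a join instead of a window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    query = """
        SELECT d.created_at, COUNT(s.email) AS cumul_count
        FROM (SELECT DISTINCT created_at FROM subscriptions) AS d
        INNER JOIN subscriptions AS s ON s.created_at <= d.created_at
        GROUP BY d.created_at
        ORDER BY d.created_at
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: 3 signups on the first day, then 2, then 4.
sample = (
    [("2011-04-04", f"a{i}@example.com") for i in range(3)]
    + [("2011-04-05", f"b{i}@example.com") for i in range(2)]
    + [("2011-04-06", f"c{i}@example.com") for i in range(4)]
)
for day, total in cumulative_by_self_join(sample):
    print(day, total)
```

This gives the same totals as a window function would, but the join compares each date against all rows, so it tends to scale worse on large tables.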

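I assume you want only one row per day, and that you also want to show days without any subscriptions (suppose nobody subscribes on a certain date: do you want that date shown with the previous day's running total?). If that is the case, you can use a WITH clause (a recursive common table expression) to generate the dates. Note that the subquery counts distinct emails from the start of the month up to each date: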

 with recursive serialdates(adate) as (
     select cast('2011-04-04' as date)
     union all
     select adate + 1
     from serialdates
     where adate < cast('2011-04-07' as date)
 )
 select D.adate,
        (
            select count(distinct email)
            from subscriptions
            where created_at between date_trunc('month', D.adate) and D.adate
        )
 from serialdates D
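A minimal sketch of the same pattern, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite's date(x, '+1 day') and date(x, 'start of month') are swapped in for adate + 1 and date_trunc('month', ...); the function name, parameters, and sample rows are hypothetical. It shows that a day with no subscriptions still gets a row carrying the previous total.

```python
import sqlite3


def cumulative_with_gaps(rows, start, end):
    """One row per calendar day from start to end (ISO date strings),
    with the distinct-email count from the start of the month to that day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE serialdates(adate) AS (
            SELECT ?
            UNION ALL
            SELECT date(adate, '+1 day')
            FROM serialdates
            WHERE adate < ?
        )
        SELECT d.adate,
               (SELECT COUNT(DISTINCT email)
                FROM subscriptions
                WHERE created_at BETWEEN date(d.adate, 'start of month')
                                     AND d.adate) AS cumul_count
        FROM serialdates AS d
        ORDER BY d.adate
    """
    result = conn.execute(query, (start, end)).fetchall()
    conn.close()
    return result


sample = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    # nobody subscribes on 2011-04-05
    ("2011-04-06", "a@example.com"),  # repeat, not counted again
    ("2011-04-06", "c@example.com"),
    ("2011-04-07", "d@example.com"),
]
for day, total in cumulative_with_gaps(sample, "2011-04-04", "2011-04-07"):
    print(day, total)
```

Because the count restarts at the first day of each month, a range that crosses a month boundary would need the lower bound changed if an all-time total is wanted.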

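The best way is to have a calendar table: calendar (date date, month int, quarter int, half int, week int, year int)

You can then join this table to summarize by whichever field you need.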
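As a rough illustration of joining against a calendar table, the sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column name cal_date, the reduced set of calendar columns, the function name, and the sample rows are all hypothetical. It summarizes signups per month, and the LEFT JOIN keeps months without any subscriptions.

```python
import sqlite3
from datetime import date, timedelta


def monthly_counts(rows, start, end):
    """Signups per (year, month) for every month touched by start..end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE calendar ("
        "cal_date TEXT PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER)"
    )
    # Fill the calendar with one row per day in the requested range.
    cal_rows = []
    day = start
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        cal_rows.append((day.isoformat(), day.month, quarter, day.year))
        day += timedelta(days=1)
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)", cal_rows)
    query = """
        SELECT c.year, c.month, COUNT(s.email) AS signups
        FROM calendar AS c
        LEFT JOIN subscriptions AS s ON s.created_at = c.cal_date
        GROUP BY c.year, c.month
        ORDER BY c.year, c.month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [
    ("2011-03-31", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-04", "c@example.com"),
]
for year, month, signups in monthly_counts(sample, date(2011, 3, 30), date(2011, 5, 2)):
    print(year, month, signups)
```

Grouping by c.quarter or c.year instead of c.month would summarize at those levels with the same join.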