Find duplicate records in MySQL

I want to pull out the duplicate records in a MySQL database. This can be done with:

SELECT address, count(id) as cnt FROM list GROUP BY address HAVING cnt > 1 

Which results in:

 100 MAIN ST 2 

I would like to pull it so that it shows each row that is a duplicate. Something like:

 JIM   JONES  100 MAIN ST
 JOHN  SMITH  100 MAIN ST

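Any ideas how this can be done? I'm trying to avoid running the first query and then looking up the duplicates with a second query in the code.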

The key is to rewrite this query so that it can be used as a subquery.

 SELECT firstname, lastname, list.address FROM list INNER JOIN (SELECT address FROM list GROUP BY address HAVING COUNT(id) > 1) dup ON list.address = dup.address; 
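A runnable sketch of the derived-table join above, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the JOIN / GROUP BY / HAVING syntax used here is the same in both) and a small hypothetical list table holding the question's JIM JONES / JOHN SMITH rows:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption:
# the JOIN / GROUP BY / HAVING syntax below behaves the same in both).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (
        id        INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname  TEXT,
        address   TEXT
    );
    -- Hypothetical sample rows.
    INSERT INTO list (firstname, lastname, address) VALUES
        ('JIM',  'JONES', '100 MAIN ST'),
        ('JOHN', 'SMITH', '100 MAIN ST'),
        ('MARY', 'BROWN', '42 ELM AVE');
""")

# The GROUP BY query becomes a derived table (dup) holding only the
# addresses that occur more than once; joining it back to list returns
# every row that has one of those addresses.
rows = conn.execute("""
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (SELECT address FROM list
                GROUP BY address HAVING COUNT(id) > 1) dup
        ON list.address = dup.address
    ORDER BY list.id  -- only for a predictable output order
""").fetchall()

for row in rows:
    print(*row)
```

It should print the two rows that share 100 MAIN ST; the ORDER BY is not in the original query and was added only so the output order is predictable.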
 SELECT date FROM logs group by date having count(*) >= 2 

Why not just INNER JOIN the table with itself?

 SELECT a.firstname, a.lastname, a.address FROM list a INNER JOIN list b ON a.address = b.address WHERE a.id <> b.id 

You need a DISTINCT if an address can appear more than twice.
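To see why DISTINCT matters, here is a sketch (again using sqlite3 in place of MySQL, with hypothetical data) in which three people share one address: each of them matches the other two, so the plain self-join returns every such row twice.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT,
                       lastname TEXT, address TEXT);
    INSERT INTO list (firstname, lastname, address) VALUES
        ('JIM',  'JONES', '100 MAIN ST'),
        ('JOHN', 'SMITH', '100 MAIN ST'),
        ('ANN',  'LEE',   '100 MAIN ST'),
        ('MARY', 'BROWN', '42 ELM AVE');
""")

# Plain self-join: each of the three 100 MAIN ST rows matches the
# other two, so 6 rows come back.
plain = conn.execute("""
    SELECT a.firstname, a.lastname, a.address
    FROM list a INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
""").fetchall()

# Same join with DISTINCT: one row per person, 3 rows.
deduped = conn.execute("""
    SELECT DISTINCT a.firstname, a.lastname, a.address
    FROM list a INNER JOIN list b ON a.address = b.address
    WHERE a.id <> b.id
""").fetchall()

print(len(plain), len(deduped))
```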

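I tried to pick the best answer to this question, but it confused me a bit. I actually only needed a single field from my table. The following example from this link worked very well for me: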

 SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1; 
 select `cityname` from `codcities` group by `cityname` having count(*)>=2 

This is a query similar to what you asked for; it works perfectly and it's easy, too. Enjoy!!!

Find duplicate users by email address with this query…

 SELECT users.name, users.uid, users.mail, from_unixtime(created) FROM users INNER JOIN ( SELECT mail FROM users GROUP BY mail HAVING count(mail) > 1 ) dupes ON users.mail = dupes.mail ORDER BY users.mail; 

You can also find duplicates that depend on more than one field. For those cases, use the following format:

 SELECT COUNT(*), column1, column2 FROM tablename GROUP BY column1, column2 HAVING COUNT(*)>1; 
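A small sketch of grouping on two columns, using sqlite3 in place of MySQL and made-up values for the placeholder tablename / column1 / column2 names: only the (column1, column2) combination that repeats is reported, while grouping on column1 alone gives a different answer.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (column1 TEXT, column2 TEXT);
    INSERT INTO tablename VALUES ('a', 'x'), ('a', 'x'), ('a', 'y'), ('b', 'x');
""")

# Duplicates defined by the pair (column1, column2).
by_pair = conn.execute("""
    SELECT COUNT(*), column1, column2 FROM tablename
    GROUP BY column1, column2 HAVING COUNT(*) > 1
""").fetchall()

# For comparison: duplicates defined by column1 alone.
by_column1 = conn.execute("""
    SELECT COUNT(*), column1 FROM tablename
    GROUP BY column1 HAVING COUNT(*) > 1
""").fetchall()

print(by_pair)     # [(2, 'a', 'x')]
print(by_column1)  # [(3, 'a')]
```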

Another solution is to use table aliases, like so:

 SELECT p1.id, p2.id, p1.address
 FROM list AS p1, list AS p2
 WHERE p1.address = p2.address
 AND p1.id != p2.id

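In this case, all you are doing is taking the original list table, creating two aliased copies of it (p1 and p2), and then joining them on the address column (line 3, p1.address = p2.address). The fourth line (p1.id != p2.id) stops a record from being matched with itself, so rows whose address is unique drop out. Note that each duplicate pair still appears twice, once as (p1, p2) and once as (p2, p1).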
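A sketch of the aliased self-join (sqlite3 standing in for MySQL, hypothetical rows) showing each duplicate pair coming back in both orders. Changing != to < is a common variation, not part of the answer above, that keeps one row per pair.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO list (id, address) VALUES
        (1, '100 MAIN ST'), (2, '100 MAIN ST'), (3, '42 ELM AVE');
""")

# The answer's query: p1.id != p2.id only removes self-matches.
both_orders = conn.execute("""
    SELECT p1.id, p2.id, p1.address
    FROM list AS p1, list AS p2
    WHERE p1.address = p2.address
      AND p1.id != p2.id
""").fetchall()

# Variation: p1.id < p2.id keeps each unordered pair only once.
one_order = conn.execute("""
    SELECT p1.id, p2.id, p1.address
    FROM list AS p1, list AS p2
    WHERE p1.address = p2.address
      AND p1.id < p2.id
""").fetchall()

print(sorted(both_orders))  # [(1, 2, '100 MAIN ST'), (2, 1, '100 MAIN ST')]
print(one_order)            # [(1, 2, '100 MAIN ST')]
```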

Finding duplicate addresses is much more complicated than it looks, especially if you need accuracy. A MySQL query isn't enough in that case…

I work at SmartyStreets, where we handle validation, de-duplication, and other things, and I've seen a wide variety of challenges with similar problems.

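There are several third-party services that will flag duplicates in a list. Doing this with just a MySQL subquery won't account for differences in address formats and standards. The USPS (for US addresses) has certain guidelines that make these standard, but only a handful of vendors are certified to perform such operations.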

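So my recommendation is to export the table as a CSV file and submit it to a capable list processor. One of these is LiveAddress, which can do this automatically in anywhere from a few seconds to a few minutes. It flags duplicate rows with a new field called "Duplicate" set to Y.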
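A toy illustration of the formatting problem: plain string equality, which is what GROUP BY address relies on, treats "100 MAIN ST", "100 MAIN STREET", and "100 MAIN ST." as different values. The normalize function and its three-entry suffix map are hypothetical and deliberately crude; this is not USPS standardization and not how LiveAddress works.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny suffix map; real standardization
# (e.g. the USPS rules mentioned above) covers far more than this.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def normalize(address: str) -> str:
    """Crude normalization: uppercase, turn punctuation into spaces,
    collapse whitespace, and abbreviate a few street suffixes."""
    words = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def find_duplicates(addresses, key):
    """Group addresses by key(address) and keep groups with 2+ members."""
    groups = defaultdict(list)
    for address in addresses:
        groups[key(address)].append(address)
    return {k: v for k, v in groups.items() if len(v) > 1}


addresses = ["100 MAIN ST", "100 MAIN STREET", "100 MAIN ST.", "42 ELM AVE"]

exact = find_duplicates(addresses, key=lambda a: a)  # exact string match
fuzzy = find_duplicates(addresses, key=normalize)

print(exact)  # {}
print(fuzzy)  # {'100 MAIN ST': ['100 MAIN ST', '100 MAIN STREET', '100 MAIN ST.']}
```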

This selects the duplicates in a single pass over the table rather than with a subquery.

 SELECT *
 FROM (
   SELECT ao.*, (@r := @r + 1) AS rn
   FROM (SELECT @_address := 'N', @r := 0) vars,
        (SELECT * FROM list a ORDER BY address, id) ao
   WHERE CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
     AND (@_address := address) IS NOT NULL
 ) aoo
 WHERE rn > 1

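This query emulates the ROW_NUMBER() function of Oracle and SQL Server. MySQL 8.0 and later support ROW_NUMBER() natively, and setting user variables inside expressions, as this query does, is deprecated there.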

See my blog post for details:

  • Analytic functions: SUM, AVG, ROW_NUMBER – emulation in MySQL.
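For MySQL 8.0+ (or any engine with window functions), the same "every copy after the first" result as the user-variable query can be sketched with a real ROW_NUMBER(). This example uses sqlite3, which needs SQLite 3.25 or newer for window functions, and hypothetical rows.

```python
import sqlite3

# Window functions need SQLite 3.25+ (MySQL 8.0+ has them too).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT,
                       lastname TEXT, address TEXT);
    INSERT INTO list (id, firstname, lastname, address) VALUES
        (1, 'JIM',  'JONES', '100 MAIN ST'),
        (2, 'JOHN', 'SMITH', '100 MAIN ST'),
        (3, 'MARY', 'BROWN', '42 ELM AVE'),
        (4, 'ANN',  'LEE',   '100 MAIN ST');
""")

# Number the rows within each address (lowest id first) and keep
# everything after the first copy, like the user-variable query's rn > 1.
later_copies = conn.execute("""
    SELECT id, firstname, lastname, address
    FROM (SELECT list.*,
                 ROW_NUMBER() OVER (PARTITION BY address ORDER BY id) AS rn
          FROM list) AS numbered
    WHERE rn > 1
    ORDER BY id
""").fetchall()

print(later_copies)
```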

It won't be very efficient, but it should work:

 SELECT * FROM list AS o WHERE (SELECT COUNT(*) FROM list AS i WHERE i.address = o.address) > 1; /* OUTER and INNER are reserved words in MySQL, so they can't be used as aliases */
  SELECT firstname, lastname, address FROM list WHERE Address in (SELECT address FROM list GROUP BY address HAVING count(*) > 1) 
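A quick check, with sqlite3 standing in for MySQL and hypothetical rows, that the correlated COUNT(*) subquery and the IN (... GROUP BY ... HAVING) subquery pick out the same rows. The aliases o and i are used because OUTER and INNER are reserved words in MySQL.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT,
                       lastname TEXT, address TEXT);
    INSERT INTO list (id, firstname, lastname, address) VALUES
        (1, 'JIM',  'JONES', '100 MAIN ST'),
        (2, 'JOHN', 'SMITH', '100 MAIN ST'),
        (3, 'MARY', 'BROWN', '42 ELM AVE'),
        (4, 'ANN',  'LEE',   '100 MAIN ST');
""")

# Correlated version: conceptually re-counts the address for each outer row.
correlated = conn.execute("""
    SELECT id FROM list AS o
    WHERE (SELECT COUNT(*) FROM list AS i WHERE i.address = o.address) > 1
    ORDER BY id
""").fetchall()

# IN version: the duplicated addresses come from one GROUP BY subquery.
grouped = conn.execute("""
    SELECT id FROM list
    WHERE address IN (SELECT address FROM list
                      GROUP BY address HAVING COUNT(*) > 1)
    ORDER BY id
""").fetchall()

print(correlated == grouped, correlated)  # True [(1,), (2,), (4,)]
```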
 select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name 

For your table, it would be something like:

 select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address 

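This query will give you all the distinct address entries in the list… I'm not sure whether it still works if you have any primary key values, names, etc.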

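Fastest duplicate-removal procedure (each pass deletes the lowest id in every duplicate group, so repeat it until no duplicates remain):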

 /* helper table with one primary-key column, id */
 CREATE TABLE temp (id INT PRIMARY KEY);
 INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*) > 1;
 DELETE FROM list WHERE id IN (SELECT id FROM temp);
 DELETE FROM temp;
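A sketch of that delete procedure run in a loop until nothing is left to delete, using sqlite3 in place of MySQL. The helper table is called dup_ids here (a hypothetical name standing in for the answer's temp table, since TEMP is a keyword in SQLite), and the isbn values are made up. Because each pass removes only the lowest id of every duplicate group, the group with three copies needs a second pass, and the highest id in each group is the one that survives.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list (id INTEGER PRIMARY KEY, isbn TEXT);
    INSERT INTO list (id, isbn) VALUES
        (1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'C'), (6, 'C');
    -- Helper table (the answer's temp), one primary-key column id.
    CREATE TABLE dup_ids (id INTEGER PRIMARY KEY);
""")

passes = 0
while True:
    # Collect the lowest id of every isbn that still has duplicates.
    conn.execute("""
        INSERT INTO dup_ids (id)
        SELECT MIN(id) FROM list GROUP BY isbn HAVING COUNT(*) > 1
    """)
    found = conn.execute("SELECT COUNT(*) FROM dup_ids").fetchone()[0]
    if found == 0:
        break  # no duplicates left
    conn.execute("DELETE FROM list WHERE id IN (SELECT id FROM dup_ids)")
    conn.execute("DELETE FROM dup_ids")
    passes += 1
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM list ORDER BY id")]
print(passes, remaining)  # 2 [3, 4, 6]
```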

Personally, this query solved my problem:

 SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1; 

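What this script does is show every subscriber ID that appears more than once in the table for the same SRV_KW_ID, along with the number of duplicates found.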

These are the table columns:

 | Field         | Type        | Null | Key | Default | Extra          |
 | SUB_SUBSCR_ID | int(11)     | NO   | PRI | NULL    | auto_increment |
 | MSI_ALIAS     | varchar(64) | YES  | UNI | NULL    |                |
 | SUB_ID        | int(11)     | NO   | MUL | NULL    |                |
 | SRV_KW_ID     | int(11)     | NO   | MUL | NULL    |                |

Hope it helps you too!

This also tells you how many duplicates there are, and gets the result without a join:

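 SELECT `Language`, MIN(id) AS id, COUNT(id) AS how_many FROM `languages` GROUP BY `Language` HAVING how_many >= 2 ORDER BY how_many DESC /* MIN(id) instead of a bare id, which ONLY_FULL_GROUP_BY (on by default since MySQL 5.7) rejects */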
 SELECT t.*,(select count(*) from city as tt where tt.name=t.name) as count FROM `city` as t where (select count(*) from city as tt where tt.name=t.name) > 1 order by count desc 

Replace city with your table, and name with your field name.
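A sketch of the correlated-count query with sqlite3 standing in for MySQL and a hypothetical city table. The count column is aliased cnt here, and a tiebreaker on id is added only to make the output order predictable.

```python
import sqlite3

# sqlite3 as a stand-in for MySQL; the city rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO city (name) VALUES
        ('Paris'), ('Paris'), ('Paris'), ('Lyon'), ('Lyon'), ('Nice');
""")

# For every row, count how many rows share its name; keep the names
# seen more than once and list the biggest groups first.
rows = conn.execute("""
    SELECT t.*,
           (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) AS cnt
    FROM city AS t
    WHERE (SELECT COUNT(*) FROM city AS tt WHERE tt.name = t.name) > 1
    ORDER BY cnt DESC, t.id
""").fetchall()

for row in rows:
    print(row)
```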

 SELECT * FROM (SELECT address, COUNT(id) AS cnt FROM list GROUP BY address HAVING COUNT(id) > 1) AS dup /* MySQL requires every derived table to have an alias */

Powerlord's answer is indeed the best one. I'd recommend one change: use LIMIT to make sure the database doesn't get overloaded:

 SELECT firstname, lastname, list.address FROM list INNER JOIN (SELECT address FROM list GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address LIMIT 10 

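It's a good habit to use LIMIT when there is no WHERE clause and you are doing joins. Start with a small value, check how expensive the query is, and then raise the limit.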

select address from list where address = any (select address from (select address, count(id) cnt from list group by address having cnt > 1 ) as t1) order by address

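The inner subquery returns the addresses that occur more than once (with their counts), and the outer subquery returns just the address column from those rows. The outer subquery must return only one column because it is the operand of the = ANY operator.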