Finding duplicate records in a table using SQL Server

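I am validating a table that holds transaction-level data from an e-commerce website, and I want to pinpoint the exact errors.

I would like your help finding duplicate records in a 50-column table in SQL Server.

Suppose my data is:

    OrderNo  shoppername  amountpayed  city  Item
    1        Sam          10           A     Iphone
    1        Sam          10           A     Iphone      --->> Duplication to be detected
    1        Sam          5            A     Ipod
    2        John         20           B     Macbook
    3        John         25           B     Macbookair
    4        Jack         5            A     Ipod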

Suppose I use the query below:

    SELECT shoppername, COUNT(*) AS cnt
    FROM dbo.sales
    GROUP BY shoppername
    HAVING COUNT(*) > 1

It returns:

    Sam   2
    John  2

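But I don't want to find duplicates over just 1 or 2 columns. I want to find rows that are duplicated across all columns in my data. The result I want is:

    1 Sam 10 A Iphone

    with x as (
        select *,
               rn = row_number() over (PARTITION BY OrderNo, item order by OrderNo)
        from #temp1
    )
    select * from x where rn > 1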

You can delete the duplicates by replacing the final select statement with:

    delete x where rn > 1

    SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
    FROM dbo.sales
    GROUP BY OrderNo, shoppername, amountPayed, city, item
    HAVING COUNT(*) > 1

    SQL> SELECT JOB, COUNT(JOB) FROM EMP GROUP BY JOB;

    JOB       COUNT(JOB)
    --------- ----------
    ANALYST            2
    CLERK              4
    MANAGER            3
    PRESIDENT          1
    SALESMAN           4

Just add all the fields to the query, and remember to add them to the GROUP BY as well.

    Select shoppername, a, b, amountpayed, item, count(*) as cnt
    from dbo.sales
    group by shoppername, a, b, amountpayed, item
    having count(*) > 1
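With 50 columns, typing the list twice is tedious and easy to get wrong. As a rough sketch, the column list could be generated from the catalog instead. This assumes SQL Server 2017 or later (for STRING_AGG), that the table is dbo.sales as in the question, and that every column has a type that can be grouped (not text, ntext, image or xml). The variable names @cols and @sql are made up for the illustration.

```sql
-- Sketch: build "GROUP BY <every column>" for dbo.sales dynamically.
-- Assumptions: SQL Server 2017+ (STRING_AGG), only groupable column types.
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Comma-separated, bracket-quoted column list in table order.
SELECT @cols = STRING_AGG(CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)), N', ')
               WITHIN GROUP (ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
  AND TABLE_NAME   = N'sales';

-- Same list in the SELECT and in the GROUP BY.
SET @sql = N'SELECT ' + @cols + N', COUNT(*) AS cnt
FROM dbo.sales
GROUP BY ' + @cols + N'
HAVING COUNT(*) > 1;';

PRINT @sql;                   -- inspect the generated statement first
EXEC sys.sp_executesql @sql;  -- then run it
```

Printing the statement before executing it makes it easy to copy the generated query into a script once it looks right.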

To get a list of the records that occur more than once, use the following command:

    select field1, field2, field3, count(*)
    from table_name
    group by field1, field2, field3
    having count(*) > 1

Try this:

    SELECT MAX(shoppername), COUNT(*) AS cnt
    FROM dbo.sales
    GROUP BY CHECKSUM(*)
    HAVING COUNT(*) > 1

Read up on the CHECKSUM function first, because collisions are possible: two different rows can produce the same checksum.
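Because of those collisions, a matching checksum only nominates candidates. One way to check them, sketched here under the assumption that the table is dbo.sales as in the question, is to return the full rows that share a checksum so they can be compared column by column. The names hashed, row_hash and hash_cnt are invented for the example.

```sql
-- Sketch: list every row whose CHECKSUM(*) is shared with another row.
-- A shared checksum is a hint, not proof; compare the columns to confirm.
WITH hashed AS (
    SELECT *, CHECKSUM(*) AS row_hash
    FROM dbo.sales
),
candidates AS (
    SELECT *, COUNT(*) OVER (PARTITION BY row_hash) AS hash_cnt
    FROM hashed
)
SELECT *
FROM candidates
WHERE hash_cnt > 1
ORDER BY row_hash;
```

Rows that appear together under one row_hash but differ in some column are collisions and not duplicates.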

    with x as (
        select shoppername, count(shoppername) as cnt
        from sales
        group by shoppername
        having count(shoppername) > 1
    )
    select t.*
    from x, win_gp_pin1510 t
    where x.shoppername = t.shoppername
    order by t.shoppername

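First of all, I doubt that the result shown in the question is accurate: there appear to be three "Sam" rows in the original table. But that does not matter for the question itself.

So let's get to the question. Based on your table, the best way to show the duplicated values is count(*) with a GROUP BY clause. The query looks like this:

    SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes
    FROM dbo.sales
    GROUP BY OrderNo, shoppername, amountPayed, city, item
    HAVING COUNT(*) > 1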

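The reason is that all the columns of the table together identify a record, which means a record counts as a duplicate only when every column has exactly the same value. You also want to display every field of the duplicated records, so the GROUP BY must not leave out any column. That fits here, because apart from aggregates you can only select columns that take part in the GROUP BY clause.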
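To try this without touching real data, here is a small sketch that loads the six sample rows from the question into a temporary table and runs the same GROUP BY over all columns. The table name #sales and the column types are assumptions made for the example.

```sql
-- Sketch: reproduce the question's sample data in a temp table.
-- The name #sales and the column types are assumed for illustration.
CREATE TABLE #sales (
    OrderNo     INT,
    shoppername VARCHAR(50),
    amountpayed INT,
    city        VARCHAR(10),
    Item        VARCHAR(50)
);

INSERT INTO #sales (OrderNo, shoppername, amountpayed, city, Item)
VALUES (1, 'Sam',  10, 'A', 'Iphone'),
       (1, 'Sam',  10, 'A', 'Iphone'),   -- the exact duplicate
       (1, 'Sam',   5, 'A', 'Ipod'),
       (2, 'John', 20, 'B', 'Macbook'),
       (3, 'John', 25, 'B', 'Macbookair'),
       (4, 'Jack',  5, 'A', 'Ipod');

-- Group by every column; only fully identical rows end up in one group.
SELECT OrderNo, shoppername, amountpayed, city, Item, COUNT(*) AS RepeatTimes
FROM #sales
GROUP BY OrderNo, shoppername, amountpayed, city, Item
HAVING COUNT(*) > 1;
-- returns one row: 1, Sam, 10, A, Iphone, 2

DROP TABLE #sales;
```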

Now I'd like to give you an example of With...Row_Number()Over(...), which combines a table expression with the Row_Number function.

Suppose you have almost the same table, but with an extra column called Shipping Date, whose value may differ even when all the other columns are the same. Here it is:

    OrderNo  shoppername  amountpayed  city  Item        Shipping Date
    1        Sam          10           A     Iphone      2016-01-01
    1        Sam          10           A     Iphone      2016-02-02
    1        Sam          5            A     Ipod        2016-03-03
    2        John         20           B     Macbook     2016-04-04
    3        John         25           B     Macbookair  2016-05-05
    4        Jack         5            A     Ipod        2016-06-06

Note that row 2 is not a duplicate if you still treat all the columns as one unit. But what if you want to treat such rows as duplicates anyway? Then you should use With...Row_Number()Over(...), and the query looks like this:

    WITH TABLEEXPRESSION AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item
                                  ORDER BY [Shipping Date]) AS Identifier --if you consider the one with the later shipping date as the duplicate
        FROM dbo.sales
    )
    SELECT * FROM TABLEEXPRESSION WHERE Identifier != 1 --or use '> 1'

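The query above returns the duplicate together with its shipping date, for example:

    OrderNo  shoppername  amountpayed  city  Item    Shipping Date  Identifier
    1        Sam          10           A     Iphone  2016-02-02     2

Note that this row is not identical to the one dated 2016-01-01. The 2016-02-02 row is picked out because of ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) AS Identifier: Shipping Date is not one of the columns that decide whether a record is a duplicate, so the 2016-02-02 row is still a perfectly good answer to your question.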
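If you would rather see every copy, including the first one, a windowed COUNT over the same partition is one option. The sketch below assumes the dbo.sales table with the extra [Shipping Date] column described above; the alias copies is made up for the example.

```sql
-- Sketch: return all rows that share the five key columns with another row,
-- keeping columns (such as [Shipping Date]) that are not part of the key.
SELECT t.*
FROM (
    SELECT s.*,
           COUNT(*) OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item) AS copies
    FROM dbo.sales AS s
) AS t
WHERE t.copies > 1
ORDER BY t.OrderNo, t.[Shipping Date];
```

On the sample data this lists both Sam/Iphone rows, the one shipped on 2016-01-01 and the one shipped on 2016-02-02.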

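To sum up: count(*) with a GROUP BY clause is the best choice when the columns in the GROUP BY clause are all you want to show in the result. Otherwise you lose the columns that do not take part in the GROUP BY.

With...Row_Number()Over(...), on the other hand, works in every case where you want to find duplicate records, although the query is slightly more complicated to write.

If your goal is to delete the duplicate records from the table, you have to use the latter: WITH...ROW_NUMBER()OVER(...)...DELETE FROM...WHERE.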
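As a minimal sketch of that delete, assuming the five-column dbo.sales table from the question and that it does not matter which copy survives, it could be wrapped in a transaction so the effect can be checked first. The names numbered and rn are invented for the example.

```sql
-- Sketch: delete all but one copy of each fully duplicated row.
-- Runs inside a transaction that is rolled back, so nothing is lost on a trial run.
BEGIN TRANSACTION;

WITH numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountpayed, city, Item
               ORDER BY (SELECT NULL)      -- copies are identical, so order is irrelevant
           ) AS rn
    FROM dbo.sales
)
DELETE FROM numbered
WHERE rn > 1;

SELECT @@ROWCOUNT AS rows_deleted;  -- how many extra copies were removed

ROLLBACK TRANSACTION;               -- switch to COMMIT once the count looks right
```

When the copies differ in a non-key column such as [Shipping Date], ordering by that column instead of (SELECT NULL) decides which row is kept.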

Hope this helps!

Try this:

    with T1 AS (
        SELECT LASTNAME, COUNT(1) AS 'COUNT'
        FROM Employees
        GROUP BY LastName
        HAVING COUNT(1) > 1
    )
    SELECT E.*, T1.[COUNT]
    FROM Employees E
    INNER JOIN T1 ON T1.LastName = E.LastName

    select shoppername, count(Item) as cnt
    from dbo.sales
    group by shoppername
    having count(Item) > 1

    select EventID, count(*) as cnt
    from dbo.EventInstances
    group by EventID
    having count(*) > 1

Here is the working code:

    SELECT abnno, COUNT(abnno)
    FROM tbl_Name
    GROUP BY abnno
    HAVING COUNT(abnno) > 1