How to list all duplicated rows which may include NULL columns?

Question

Post reply

How to list all duplicated rows which may include NULL columns?

ogrishman

SSC Enthusiast

Points: 144
More actions
June 9, 2010 at 11:52 pm

#387121

Hi guys,

I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first.

USE [tempdb];

GO

IF OBJECT_ID(N'dbo.t') IS NOT NULL

BEGIN

DROP TABLE dbo.t

END

GO

CREATE TABLE dbo.t

(

a NVARCHAR(8),

b NVARCHAR(8)

);

GO

INSERT t VALUES ('a', 'b');

INSERT t VALUES ('a', 'b');

INSERT t VALUES ('a', 'b');

INSERT t VALUES ('c', 'd');

INSERT t VALUES ('c', 'd');

INSERT t VALUES ('c', 'd');

INSERT t VALUES ('c', 'd');

INSERT t VALUES ('e', NULL);

INSERT t VALUES (NULL, NULL);

INSERT t VALUES (NULL, NULL);

INSERT t VALUES (NULL, NULL);

INSERT t VALUES (NULL, NULL);

GO

Now I want to show all rows that have other rows duplicated with them, I use the following query.

SELECT a, b

FROM dbo.t

GROUP

BY a, b

HAVING count(*) > 1

which will give us the result:

a b

-------- --------

NULL NULL

a b

c d

Now if I want to list all rows that make contribution to duplication, I use this query:

WITH

duplicate (a, b) AS

(

SELECT a, b

FROM dbo.t

GROUP

BY a, b

HAVING count(*) > 1

)

SELECT dbo.t.a, dbo.t.b

FROM dbo.t

INNER JOIN duplicate

ON (dbo.t.a = duplicate.a

AND dbo.t.b = duplicate.b)

Which will give me the result:

a b

-------- --------

a b

a b

a b

c d

c d

c d

c d

As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to

WITH

duplicate (a, b) AS

(

SELECT a, b

FROM dbo.t

GROUP

BY a, b

HAVING count(*) > 1

)

SELECT dbo.t.a, dbo.t.b

FROM dbo.t

INNER JOIN duplicate

ON (dbo.t.a = duplicate.a

AND dbo.t.b = duplicate.b)

OR

(dbo.t.a IS NULL

AND duplicate.a IS NULL

AND dbo.t.b = duplicate.b)

OR

(dbo.t.b IS NULL

AND duplicate.b IS NULL

AND dbo.t.a = duplicate.a)

OR

(dbo.t.a IS NULL

AND duplicate.a IS NULL

AND dbo.t.b IS NULL

AND duplicate.b IS NULL)

And this query will give me the answer as I wanted:

a b

-------- --------

NULL NULL

NULL NULL

NULL NULL

NULL NULL

a b

a b

a b

c d

c d

c d

c d

Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem?

Great thanks.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39971 More actions · Answer 1

ColdCoffee

SSC-Dedicated

Points: 39971

June 10, 2010 at 12:43 am

#1178458

Hey there, try this:

;WITH



duplicate (a, b,c) AS

(

	SELECT isnull(a, '*') a  , isnull(b, '*') b , count(*) c

	FROM dbo.t

	GROUP

	BY a, b

)

SELECT t.a, t.b

FROM dbo.t t

left join  duplicate d

on isnull(t.a, '*') = d.a and isnull(t.b, '*') = d.b

   where d.c > 1

Is this what you wanted?

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 2

WITH CTE AS (

SELECT a, b, COUNT(*) OVER(PARTITION BY a,b) AS cn

FROM dbo.t)

SELECT a,b

FROM CTE

WHERE cn>1;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

ogrishman SSC Enthusiast Points: 144 More actions · Answer 3

ogrishman

SSC Enthusiast

Points: 144

June 10, 2010 at 6:56 am

#1178571

You guys are all genius!

Thanks. Your solution both works like a charm.:-D