How to list all duplicated rows which may include NULL columns?

  • Hi guys,

    I have a problem of listing duplicated rows that include NULL columns. Lemme show my problem first.

    USE [tempdb];

    GO

    IF OBJECT_ID(N'dbo.t') IS NOT NULL

    BEGIN

    DROP TABLE dbo.t

    END

    GO

    CREATE TABLE dbo.t

    (

    a NVARCHAR(8),

    b NVARCHAR(8)

    );

    GO

    INSERT t VALUES ('a', 'b');

    INSERT t VALUES ('a', 'b');

    INSERT t VALUES ('a', 'b');

    INSERT t VALUES ('c', 'd');

    INSERT t VALUES ('c', 'd');

    INSERT t VALUES ('c', 'd');

    INSERT t VALUES ('c', 'd');

    INSERT t VALUES ('e', NULL);

    INSERT t VALUES (NULL, NULL);

    INSERT t VALUES (NULL, NULL);

    INSERT t VALUES (NULL, NULL);

    INSERT t VALUES (NULL, NULL);

    GO

    Now I want to show all rows that have other rows duplicated with them, I use the following query.

    SELECT a, b

    FROM dbo.t

    GROUP

    BY a, b

    HAVING count(*) > 1

    which will give us the result:

    a b

    -------- --------

    NULL NULL

    a b

    c d

    Now if I want to list all rows that make contribution to duplication, I use this query:

    WITH

    duplicate (a, b) AS

    (

    SELECT a, b

    FROM dbo.t

    GROUP

    BY a, b

    HAVING count(*) > 1

    )

    SELECT dbo.t.a, dbo.t.b

    FROM dbo.t

    INNER JOIN duplicate

    ON (dbo.t.a = duplicate.a

    AND dbo.t.b = duplicate.b)

    Which will give me the result:

    a b

    -------- --------

    a b

    a b

    a b

    c d

    c d

    c d

    c d

    As you can see, all rows include NULLs are filtered. The reason I thought is that I use equal sign to test the condition(dbo.t.a = duplicate.a AND dbo.t.b = duplicate.b), and NULLs cannot be compared use equal sign. So, in order to include rows that include NULLs in it in the last result, I have change the aforementioned query to

    WITH

    duplicate (a, b) AS

    (

    SELECT a, b

    FROM dbo.t

    GROUP

    BY a, b

    HAVING count(*) > 1

    )

    SELECT dbo.t.a, dbo.t.b

    FROM dbo.t

    INNER JOIN duplicate

    ON (dbo.t.a = duplicate.a

    AND dbo.t.b = duplicate.b)

    OR

    (dbo.t.a IS NULL

    AND duplicate.a IS NULL

    AND dbo.t.b = duplicate.b)

    OR

    (dbo.t.b IS NULL

    AND duplicate.b IS NULL

    AND dbo.t.a = duplicate.a)

    OR

    (dbo.t.a IS NULL

    AND duplicate.a IS NULL

    AND dbo.t.b IS NULL

    AND duplicate.b IS NULL)

    And this query will give me the answer as I wanted:

    a b

    -------- --------

    NULL NULL

    NULL NULL

    NULL NULL

    NULL NULL

    a b

    a b

    a b

    c d

    c d

    c d

    c d

    Now my question is, as you can see, this query just include two columns, in order to include NULLs in the last result, you have to use many condition testing statements in the query. As the column number increasing, the condition testing statements you need in your query is increasing astonishingly. How can I solve this problem?

    Great thanks.

  • Hey there, try this:

    ;WITH

    duplicate (a, b,c) AS

    (

    SELECT isnull(a, '*') a , isnull(b, '*') b , count(*) c

    FROM dbo.t

    GROUP

    BY a, b

    )

    SELECT t.a, t.b

    FROM dbo.t t

    left join duplicate d

    on isnull(t.a, '*') = d.a and isnull(t.b, '*') = d.b

    where d.c > 1

    Is this what you wanted?

  • WITH CTE AS (

    SELECT a, b, COUNT(*) OVER(PARTITION BY a,b) AS cn

    FROM dbo.t)

    SELECT a,b

    FROM CTE

    WHERE cn>1;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • You guys are all genius!

    Thanks. Your solution both works like a charm.:-D

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply