SQL help to match only 3 values

  • Hello Experts,that
    Needed some help with a query, the requirement is to identify all PID IN TABLE1 that has exactly any 3 unique gcode values (not less not more).
    Any help is greatly appreciated. Below is the test table, data and expected result.

    DECLARE @table1 TABLE (
    pid varchar(5),
    gcode varchar(5)
    )

    DECLARE @table2 TABLE(
    id int,
    gcode varchar(5)
    )

    insert @table1 (pid,gcode)
    values('p1','g1'),
    ('p1','g2'),
    ('p1','g5'),
    ('p2','g3'),
    ('p2','g4'),
    ('p2','g1'),
    ('p3','g1'),
    ('p3','g3'),
    ('p3','g6'),
    ('p3','g1'),
    ('p4','g2'),
    ('p4','g5'),
    ('p5','g3'),
    ('p5','g4'),
    ('p5','g6'),
    ('p6','g2'),
    ('p6','g2')

    insert @table2 (id,gcode)
    values('1','g1'),
    ('2','g2'),
    ('3','g3'),
    ('4','g4'),
    ('5','g5'),
    ('6','g6'),
    ('7','g7')

    Expected result

    pid,gcode

    p2,g3
    p2,g4
    p2,g1
    p5,g3
    p5,g4
    p5,g6

    Thanks

  • Tried HAVING yet?

  • SELECT pid
    FROM @table1
    GROUP BY pid
    HAVING COUNT(DISTINCT gcode) = 3

    I don't understand why you don't have p1 and p3 in your result set.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Thursday, March 7, 2019 2:19 PM

    SELECT pid
    FROM @table1
    GROUP BY pid
    HAVING COUNT(DISTINCT gcode) = 3

    I don't understand why you don't have p1 and p3 in your result set.

    While p1 is a valid point, the case of p3 is because p3 has a fourth entry, and thus not JUST 3 unique values.  Let's try this:

    SELECT pid
    FROM @table1
    GROUP BY pid
    HAVING COUNT(DISTINCT gcode) = 3
        AND COUNT(gcode) = 3

    This will still pull p1, of course, so OP, please explain why p1 is not in your result set...

  • Thanks Scott, that was my bad, yes p1 and p3 should be there.  This works.  In real world I am going against millions of records with lot of columns in each table, hopefully i won't have any performance issues.

    Thanks

  • It looked you also wanted to list the specific values found.

    That's easy for two values, much trickier for 3 (3 numeric values is easy, but not 3 char values). 

    But if the gcode values really are very short, we can probably derive the middle value by using a formula.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • OH yes, the gcode is actually 14 character value (for testing I minimized it).  Is there a way.

    Also, I have a requirement for 2 as well on the same set.

  • For 3 varchar values, it's tricky, and I'd have to do some testing/experimenting first. 

    For 3 numeric values, it's easy to list all 3.

    For only 2 values of either type, it's super easy:

    SELECT pid, MIN(gcode) AS gcode_min, MAX(gcode) AS gcode_max
    FROM @table1
    GROUP BY pid
    HAVING COUNT(DISTINCT gcode) = 2 /*AND COUNT(*) = 2 /*if applicable*/*/

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Ha! Sounds good!

    Thank you very much!

  • This will work with most (all?) data types.  I used DENSE_RANK() to filter out duplicates, but ROW_NUMBER() would work in a similar fashion.

    WITH CTE AS
    (
        SELECT pid, gcode, DENSE_RANK() OVER(PARTITION BY pid ORDER BY gcode) + DENSE_RANK() OVER(PARTITION BY pid ORDER BY gcode DESC) AS cnt
        FROM @table1
    )
    SELECT pid, gcode
    FROM CTE
    WHERE cnt = 4
    ORDER BY pid, gcode

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply