select records where combination of two values are in subquery result

  • Hello all,

    I need some help putting together a query.

    I have some data in the following format;

    MYTABLE

    DOC_NO // REV_NO // FILE_NAME

    ABC123 // A // abc123.pdf

    ABC123 // B // abc123_2.docx

    ABC124 // A // abc124.xlsx

    ABC124 // A // -

    ABC125 // A // abc125.docx

    ABC125 // C // abc125.jpg

    ABC125 // C // abc125.docx

    ABC125 // C // -

    ABC126 // 0 // -

    ABC127 // A1 // abc127.xlsx

    ABC127 // A1 // abc127.pdf

    I'm looking to select all rows where the DOC_NO and REV_NO appear only once.

    (i.e. the combination of the two values together, not any distinct value in a column)

    I have written the sub query to filter the correct results;

    SELECT DOC_NO, REV_NO FROM [MYTABLE]

    GROUP BY DOC_NO, REV_NO

    HAVING COUNT(*) =1

    I now need to strip out the records which have no file (represented as "-" in the FILE_NAME field) and select the other fields (same table - for example, lets just say "ADD1", "ADD2" and "ADD3")

    I was looking to put together a query like;

    SELECT DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3 FROM [MYTABLE]

    WHERE FILE_NAME NOT LIKE '-' AND DOC_NO IN

    (SELECT DOC_NO, REV_NO FROM [MYTABLE]

    GROUP BY DOC_NO, REV_NO

    HAVING COUNT(*) =1)

    But of course, DOC_NO alone being in the subquery select is not sufficient, as (ABC125 /A) is a unique combination, but (ABC125 /C) is not, but these results would be pulled in.

    I also cannot simply add an additional "AND" clause on its own to make sure the REV_NO value appears in the subquery, because it is highly repetitive and would have to specifically match the DOC_NO)

    What is the easiest way of ensuring that I only pull in the records where both the DOC_NO and REV_NO (combination) are unique, or is there a better way of putting this select together altogether?

    Many thanks in advance for any assistance.

  • I have about 4 ideas of what you might want from this. Can you put together some ddl and sample data in a consumable format? Also, if you can demonstrate what the desired output is based on your sample data it would greatly help. Please take a few minutes and read the first link in my signature for best practices when posting questions.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • CREATE TABLE MYTABLE

    (

    DOC_NO NVARCHAR (10),

    REV_NO NVARCHAR (5),

    FILE_NAME NVARCHAR (20),

    ADD1 NVARCHAR (10),

    ADD2 NVARCHAR (10),

    ADD3 INT

    )

    INSERT INTO MYTABLE (DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3)

    SELECT 'ABC123', 'A', 'abc123.pdf', 'CON1', 'LOC1', 1 UNION ALL

    SELECT 'ABC123', 'B', 'abc123_2.docx', 'CON1', 'LOC2', 2 UNION ALL

    SELECT 'ABC124', 'A', 'abc124.xlsx', 'CON2', 'LOC1', 9 UNION ALL

    SELECT 'ABC124', 'A', '-', 'CON2', 'LOC3', 4 UNION ALL

    SELECT 'ABC125', 'A', 'abc125.docx', 'CON1', 'LOC1', 6 UNION ALL

    SELECT 'ABC125', 'C', 'abc125.jpg', 'CON1', 'LOC4', 1 UNION ALL

    SELECT 'ABC125', 'C', 'abc125.docx', 'CON3', 'LOC7', 2 UNION ALL

    SELECT 'ABC125', 'C', '-', 'CON3', 'LOC1', 4 UNION ALL

    SELECT 'ABC126', '0', '-', 'CON7', 'LOC2', 5 UNION ALL

    SELECT 'ABC127', 'A1', 'abc127.xlsx', 'CON1', 'LOC1', 7 UNION ALL

    SELECT 'ABC127', 'A1', 'abc127.pdf', 'CON1', 'LOC1', 3

    And the output:

    DOC_NO REV_NO FILE_NAME ADD1 ADD2 ADD3

    ABC123 A abc123.pdf CON1 LOC1 1 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC123 B abc123_2.docx CON1 LOC2 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC124 A abc124.xlsx CON2 LOC1 9 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC124 A - CON2 LOC3 4 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)

    ABC125 A abc125.docx CON1 LOC1 6 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C abc125.jpg CON1 LOC4 1 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C abc125.docx CON3 LOC7 2 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C - CON3 LOC1 4 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)

    ABC126 0 - CON7 LOC2 5 ---> FILTERED OUT (NO FILE)

    ABC127 A1 abc127.xlsx CON1 LOC1 7 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC127 A1 abc127.pdf CON1 LOC1 3

  • McSQL (12/12/2014)


    CREATE TABLE MYTABLE

    (

    DOC_NO NVARCHAR (10),

    REV_NO NVARCHAR (5),

    FILE_NAME NVARCHAR (20),

    ADD1 NVARCHAR (10),

    ADD2 NVARCHAR (10),

    ADD3 INT

    )

    INSERT INTO MYTABLE (DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3)

    SELECT 'ABC123', 'A', 'abc123.pdf', 'CON1', 'LOC1', 1 UNION ALL

    SELECT 'ABC123', 'B', 'abc123_2.docx', 'CON1', 'LOC2', 2 UNION ALL

    SELECT 'ABC124', 'A', 'abc124.xlsx', 'CON2', 'LOC1', 9 UNION ALL

    SELECT 'ABC124', 'A', '-', 'CON2', 'LOC3', 4 UNION ALL

    SELECT 'ABC125', 'A', 'abc125.docx', 'CON1', 'LOC1', 6 UNION ALL

    SELECT 'ABC125', 'C', 'abc125.jpg', 'CON1', 'LOC4', 1 UNION ALL

    SELECT 'ABC125', 'C', 'abc125.docx', 'CON3', 'LOC7', 2 UNION ALL

    SELECT 'ABC125', 'C', '-', 'CON3', 'LOC1', 4 UNION ALL

    SELECT 'ABC126', '0', '-', 'CON7', 'LOC2', 5 UNION ALL

    SELECT 'ABC127', 'A1', 'abc127.xlsx', 'CON1', 'LOC1', 7 UNION ALL

    SELECT 'ABC127', 'A1', 'abc127.pdf', 'CON1', 'LOC1', 3

    And the output:

    DOC_NO REV_NO FILE_NAME ADD1 ADD2 ADD3

    ABC123 A abc123.pdf CON1 LOC1 1 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC123 B abc123_2.docx CON1 LOC2 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC124 A abc124.xlsx CON2 LOC1 9 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC124 A - CON2 LOC3 4 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)

    ABC125 A abc125.docx CON1 LOC1 6 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C abc125.jpg CON1 LOC4 1 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C abc125.docx CON3 LOC7 2 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO)

    ABC125 C - CON3 LOC1 4 ---> FILTERED OUT (DUPLICATE DOC_NO + REV_NO COMBO // ALSO NO FILE)

    ABC126 0 - CON7 LOC2 5 ---> FILTERED OUT (NO FILE)

    ABC127 A1 abc127.xlsx CON1 LOC1 7 ----> OK (UNIQUE DOC_NO + REV_NO COMBINATION + FILE ENTRY)

    ABC127 A1 abc127.pdf CON1 LOC1 3

    It is not really clear what you want as the output you posted is really not legible.

    Something like this maybe???

    select *

    from MYTABLE t

    join

    (

    select DOC_NO, REV_NO

    from MYTABLE mt

    group by DOC_NO, REV_NO

    having nullif(min(FILE_NAME), '-') > ''

    ) x on x.DOC_NO = t.DOC_NO

    and x.REV_NO = t.REV_NO

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • That looks about right, Sean. I was just about to suggest that the last query in the original post was nearly there - it just needs the IN changed to a JOIN on DOC_NO and REV_NO.

    John

  • Thanks for the query.

    Unfortunately it is filtering out the documents with no file, but isn't filtering out duplicate DOC_NO + REV_NO, so seems to be acting the same as a SELECT * FROM MYTABLE WHERE FILE_NAME NOT LIKE '-'

  • Just to add clarity if there is any confusion as to what i'm struggling to select;

    There are several DOC_NO duplicates

    There are many more REV_NO duplicates

    I want to select all records, where the combination of DOC_NO and REV_NO in a row, only appear once.

    There is a secondary requirement to remove records with no file (represented with a value of '-') but if I can get the first part working, that should be fairly simple to implement.

    Many thanks.

  • McSQL (12/12/2014)


    Just to add clarity if there is any confusion as to what i'm struggling to select;

    There are several DOC_NO duplicates

    There are many more REV_NO duplicates

    I want to select all records, where the combination of DOC_NO and REV_NO in a row, only appear once.

    There is a secondary requirement to remove records with no file (represented with a value of '-') but if I can get the first part working, that should be fairly simple to implement.

    Many thanks.

    Still kind of a guess because the desired output is totally unclear. If you could repost that in a temp table or some format that includes ONLY the rows you want it would be helpful.

    Is this getting you closer?

    select *

    from MYTABLE t

    join

    (

    select DOC_NO, REV_NO

    from MYTABLE mt

    group by DOC_NO, REV_NO

    having nullif(min(FILE_NAME), '-') > ''

    and COUNT(DOC_NO + '_' + REV_NO) = 1

    ) x on x.DOC_NO = t.DOC_NO

    and x.REV_NO = t.REV_NO

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • use COUNT() OVER... to select the ones you want.

    with uniq as

    (

    select DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3, count(*) over(partition by DOC_NO, REV_NO) as cnt

    from MYTABLE

    where FILE_NAME <> '-'

    )

    select DOC_NO, REV_NO, FILE_NAME, ADD1, ADD2, ADD3

    from uniq

    where cnt=1;

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Many thanks, that works perfectly, is exactly what I was looking for.

    Very much appreciated 🙂

  • I tried the second query you added Sean also, and following it through it looks spot on in the logic too.

    It's somehow filtering out 59 extra records ( in a 50,000 record set).

    Not really sure why, as it looks perfect to me.

    I'll take a look at what records make up the difference and will report back if I get to the bottom of it.

    Might just be something else in my data.

  • Viewing 11 posts - 1 through 10 (of 10 total)

    You must be logged in to reply to this topic. Login to reply