How to find a missing sequence in a column?

  • Hi Friends,

    Create Table Sample (ID int not null primary key, RefID int , SeqNo int , Name varchar(10) )

    insert into Sample

    select 1, 1000, 1, 'Mike'

    union

    select 2, 1000, 2, 'Mikey'

    union

    select 3, 1000, 3, 'Michel'

    union

    select 4, 1001, 1, 'Carmel'

    union

    select 5, 1001, 2, 'Camy'

    union

    select 6, 1002, 1, 'Joe'

    union

    select 7, 1002, 3, 'Jeff'

    union

    select 8, 1002, 4, 'Martin'

    select * from Sample

    I have here sample data given.

    What I want to do is, I want to check the RefID which is not having proper order of sequence number. If you see the RefID 1000, 1001 they are having properly sequence order in SeqNo field. But it is not in RefID 1002. RefID 1002 does not have proper order. It is because user has deleted a row which was having seqno 2. So i want to get what are all the RefID's are not having properly sequenced. So that I would be able to know these are all the RefID's are affected by delete statement that was done by user.

    Can any one suggest me the query? I tried with self join but it did not work....

    Thanks,
    Charmer

  • There are loads of ways of doing this. Here are two.

    drop Table #Sample

    Create Table #Sample (ID int not null primary key, RefID int , SeqNo int , Name varchar(10) )

    insert into #Sample

    select 1, 1000, 1, 'Mike' union

    select 2, 1000, 2, 'Mikey' union

    select 3, 1000, 3, 'Michel' union

    select 4, 1001, 1, 'Carmel' union

    select 5, 1001, 2, 'Camy' union

    select 6, 1002, 1, 'Joe' union

    select 7, 1002, 3, 'Jeff' union

    select 8, 1002, 4, 'Martin' union

    select 9, 1002, 5, 'Sean' union

    select 10, 1002, 7, 'Philip'

    CREATE INDEX ix_Stuff2 ON #Sample (RefID, SeqNo)

    ;WITH FullSequence AS (

    SELECT RefID, SeqNo = iTally.n

    FROM (

    SELECT RefID, MAX_SeqNo = CAST(MAX(SeqNo) AS BIGINT)

    FROM #Sample

    GROUP BY RefID

    ) d

    CROSS APPLY (

    SELECT TOP (d.MAX_SeqNo) n

    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) d (n) -- expand to meet your requirements

    ) iTally

    )

    SELECT *

    FROM FullSequence fs

    EXCEPT

    SELECT RefID, SeqNo FROM #Sample

    SELECT d.RefID, SeqNo = d.SeqNo-1

    FROM #Sample d

    CROSS APPLY (

    SELECT TOP 1 si.SeqNo

    FROM #Sample si

    WHERE si.RefID = d.RefID AND si.SeqNo < d.SeqNo

    ORDER BY si.SeqNo DESC

    ) x

    WHERE d.SeqNo > x.SeqNo + 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you Chris. It worked. First query is much better than second query.

    Thanks,
    Charmer

  • You're welcome, thanks for the feedback.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You could also use an offset window function like LAG or LEAD.

    WITH C1 AS (

    select *, LAG(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo, ID) AS prv_SeqNo

    from [Sample]

    )

    SELECT *

    FROM C1

    WHERE (SeqNo - prv_SeqNo) > 1;

    An index like (RefID, SeqNo, ID) and any other column of interest in the INCLUDE option could help with the performance.

  • If all you want is the RefIDs that are missing one or more row numbers, that's really easy:

    WITH SampleData (ID, RefID, SeqNo, Name) AS

    (

    select 1, 1000, 1, 'Mike' union

    select 2, 1000, 2, 'Mikey' union

    select 3, 1000, 3, 'Michel' union

    select 4, 1001, 1, 'Carmel' union

    select 5, 1001, 2, 'Camy' union

    select 6, 1002, 1, 'Joe' union

    select 7, 1002, 3, 'Jeff' union

    select 8, 1002, 4, 'Martin' union

    select 9, 1002, 5, 'Sean' union

    select 10, 1002, 7, 'Philip'

    )

    -- Return only RefIDs missing one or more rows

    SELECT RefID

    FROM

    (

    SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)

    FROM SampleData

    ) a

    WHERE rn <> SeqNo

    GROUP BY RefID;

    If you also want to know the SeqNos that are missing, you can treat it as a Islands to Gaps problem (explained here: The SQL of Gaps and Islands in Sequences [/url]):

    WITH SampleData (ID, RefID, SeqNo, Name) AS

    (

    select 1, 1000, 1, 'Mike' union

    select 2, 1000, 2, 'Mikey' union

    select 3, 1000, 3, 'Michel' union

    select 4, 1001, 1, 'Carmel' union

    select 5, 1001, 2, 'Camy' union

    select 6, 1002, 1, 'Joe' union

    select 7, 1002, 3, 'Jeff' union

    select 8, 1002, 4, 'Martin' union

    select 9, 1002, 5, 'Sean' union

    select 10, 1002, 7, 'Philip'

    )

    -- Using islands and gaps

    SELECT RefID, SeqNo=MIN(SeqNo)

    FROM

    (

    -- Find the missing seqence numbers (the gaps) from the islands

    SELECT RefID, MinSeqNo, MaxSeqNo, SeqNo

    ,rn=ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)/2

    FROM

    (

    -- Create a grouping (rn) used to calculate islands

    SELECT RefID, MinSeqNo=MIN(SeqNo), MaxSeqNo=MAX(SeqNo)

    FROM

    (

    SELECT *, rn=SeqNo-ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)

    FROM SampleData

    ) a

    GROUP BY RefID, rn

    ) a

    CROSS APPLY

    (

    VALUES (MinSeqNo-1),(MaxSeqNo+1)

    ) b (SeqNo)

    ) a

    GROUP BY RefID, rn

    HAVING COUNT(*) = 2;

    I do believe this will be faster than any of the solutions so far and specifically it is faster than using LAG (or LEAD):

    The Performance of the T-SQL Window Functions [/url]

    Edit: Of course, you could always just ignore the problem by fixing it with a MERGE:

    http://qa.sqlservercentral.com/Forums/FindPost1540518.aspx


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain,

    The link to the article is broken. I am interested in reading it to find out why "specifically" it is faster than using LAG / LEAD.

    One thing that is handy from using these functions is that the enumeration doesn't have to start from one. It could be any enumeration.

    SET NOCOUNT ON;

    USE tempdb;

    GO

    Create Table dbo.Sample (

    ID int not null primary key,

    RefID int ,

    SeqNo int ,

    Name varchar(10)

    );

    GO

    insert into dbo.Sample

    select 1, 1000, 101, 'Mike'

    union

    select 2, 1000, 102, 'Mikey'

    union

    select 3, 1000, 103, 'Michel'

    union

    select 4, 1001, 11, 'Carmel'

    union

    select 5, 1001, 12, 'Camy'

    union

    select 6, 1002, 21, 'Joe'

    union

    select 7, 1002, 23, 'Jeff'

    union

    select 8, 1002, 24, 'Martin';

    GO

    CREATE UNIQUE INDEX idx_uq_Sample_POC ON dbo.Sample (RefID, SeqNo);

    GO

    WITH C1 AS (

    select RefID, SeqNo, LEAD(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo) AS nxt_SeqNo

    from [Sample]

    )

    SELECT DISTINCT RefID

    FROM C1

    WHERE (nxt_SeqNo - SeqNo) > 1;

    GO

    DROP TABLE dbo.Sample;

    GO

    As for finding the gaps, here is a simple solution using LEAD.

    WITH C1 AS (

    select RefID, SeqNo, LEAD(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo) AS nxt_SeqNo

    from [Sample]

    )

    SELECT RefID, SeqNo + 1 AS rangestart, nxt_SeqNo - 1 AS rangeend

    FROM C1

    WHERE (nxt_SeqNo - SeqNo) > 1;

    GO

  • I think the link is fixed now. The article doesn't say why it is faster, just provides an example that shows that it does.

    If you've got a gap on the leading or following points, it is simple enough to add placeholders that are less than or greater than whatever your expected end point is.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you Hunch, The query you have given is here is really nice one.

    Thanks,
    Charmer

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply