How to find a missing sequence in a column?

Question

Post reply

How to find a missing sequence in a column?

Charmer

SSChampion

Points: 14887
More actions
March 20, 2014 at 3:12 am

#283646

Hi Friends,

Create Table Sample (ID int not null primary key, RefID int , SeqNo int , Name varchar(10) ) insert into Sample select 1, 1000, 1, 'Mike' union select 2, 1000, 2, 'Mikey' union select 3, 1000, 3, 'Michel' union select 4, 1001, 1, 'Carmel' union select 5, 1001, 2, 'Camy' union select 6, 1002, 1, 'Joe' union select 7, 1002, 3, 'Jeff' union select 8, 1002, 4, 'Martin'
select * from Sample

I have here sample data given.

What I want to do is, I want to check the RefID which is not having proper order of sequence number. If you see the RefID 1000, 1001 they are having properly sequence order in SeqNo field. But it is not in RefID 1002. RefID 1002 does not have proper order. It is because user has deleted a row which was having seqno 2. So i want to get what are all the RefID's are not having properly sequenced. So that I would be able to know these are all the RefID's are affected by delete statement that was done by user.

Can any one suggest me the query? I tried with self join but it did not work....

Thanks,
Charmer

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

There are loads of ways of doing this. Here are two.

drop Table #Sample



Create Table #Sample (ID int not null primary key, RefID int , SeqNo int , Name varchar(10)  )

insert into #Sample

select 1, 1000, 1, 'Mike' union

select 2, 1000, 2, 'Mikey' union

select 3, 1000, 3, 'Michel' union

select 4, 1001, 1, 'Carmel' union

select 5, 1001, 2, 'Camy' union

select 6, 1002, 1, 'Joe' union

select 7, 1002, 3, 'Jeff' union

select 8, 1002, 4, 'Martin' union

select 9, 1002, 5, 'Sean' union

select 10, 1002, 7, 'Philip'

CREATE INDEX ix_Stuff2 ON #Sample (RefID, SeqNo)

;WITH FullSequence AS (



	SELECT RefID, SeqNo = iTally.n

	FROM (

		SELECT RefID, MAX_SeqNo = CAST(MAX(SeqNo) AS BIGINT)

		FROM #Sample

		GROUP BY RefID

	) d

	CROSS APPLY (

		SELECT TOP (d.MAX_SeqNo) n

		FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) d (n) -- expand to meet your requirements

	) iTally

)

SELECT *

FROM FullSequence fs

EXCEPT

SELECT RefID, SeqNo FROM #Sample

SELECT d.RefID, SeqNo = d.SeqNo-1

FROM #Sample d

CROSS APPLY (

	SELECT TOP 1 si.SeqNo

	FROM #Sample si

	WHERE si.RefID = d.RefID AND si.SeqNo < d.SeqNo

	ORDER BY si.SeqNo DESC

) x

WHERE d.SeqNo > x.SeqNo + 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Charmer SSChampion Points: 14887 More actions · Answer 2

Charmer

SSChampion

Points: 14887

March 20, 2014 at 7:02 am

#1698666

Thank you Chris. It worked. First query is much better than second query.

Thanks,
Charmer

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

You're welcome, thanks for the feedback.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Hunchback SSCrazy Points: 2649 More actions · Answer 4

You could also use an offset window function like LAG or LEAD.



WITH C1 AS (

select *, LAG(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo, ID) AS prv_SeqNo

from [Sample]

)

SELECT *

FROM C1

WHERE (SeqNo - prv_SeqNo) > 1;

An index like (RefID, SeqNo, ID) and any other column of interest in the INCLUDE option could help with the performance.

Dwain Camps SSC Guru Points: 86913 More actions · Answer 5

If all you want is the RefIDs that are missing one or more row numbers, that's really easy:



WITH SampleData (ID, RefID, SeqNo, Name) AS

(

    select 1, 1000, 1, 'Mike' union

    select 2, 1000, 2, 'Mikey' union

    select 3, 1000, 3, 'Michel' union

    select 4, 1001, 1, 'Carmel' union

    select 5, 1001, 2, 'Camy' union

    select 6, 1002, 1, 'Joe' union

    select 7, 1002, 3, 'Jeff' union

    select 8, 1002, 4, 'Martin' union

    select 9, 1002, 5, 'Sean' union

    select 10, 1002, 7, 'Philip'

)

-- Return only RefIDs missing one or more rows

SELECT RefID

FROM

(

    SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)

    FROM SampleData

) a

WHERE rn <> SeqNo

GROUP BY RefID;

If you also want to know the SeqNos that are missing, you can treat it as a Islands to Gaps problem (explained here: The SQL of Gaps and Islands in Sequences [/url]):



WITH SampleData (ID, RefID, SeqNo, Name) AS

(

    select 1, 1000, 1, 'Mike' union

    select 2, 1000, 2, 'Mikey' union

    select 3, 1000, 3, 'Michel' union

    select 4, 1001, 1, 'Carmel' union

    select 5, 1001, 2, 'Camy' union

    select 6, 1002, 1, 'Joe' union

    select 7, 1002, 3, 'Jeff' union

    select 8, 1002, 4, 'Martin' union

    select 9, 1002, 5, 'Sean' union

    select 10, 1002, 7, 'Philip'

)

-- Using islands and gaps

SELECT RefID, SeqNo=MIN(SeqNo)

FROM

(

    -- Find the missing seqence numbers (the gaps) from the islands

    SELECT RefID, MinSeqNo, MaxSeqNo, SeqNo

        ,rn=ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)/2

    FROM

    (

        -- Create a grouping (rn) used to calculate islands

        SELECT RefID, MinSeqNo=MIN(SeqNo), MaxSeqNo=MAX(SeqNo)

        FROM

        (

            SELECT *, rn=SeqNo-ROW_NUMBER() OVER (PARTITION BY RefID ORDER BY SeqNo)

            FROM SampleData

        ) a

        GROUP BY RefID, rn

    ) a

    CROSS APPLY

    (

        VALUES (MinSeqNo-1),(MaxSeqNo+1)

    ) b (SeqNo)

) a

GROUP BY RefID, rn

HAVING COUNT(*) = 2;

I do believe this will be faster than any of the solutions so far and specifically it is faster than using LAG (or LEAD):

The Performance of the T-SQL Window Functions [/url]

Edit: Of course, you could always just ignore the problem by fixing it with a MERGE:

http://qa.sqlservercentral.com/Forums/FindPost1540518.aspx

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Hunchback SSCrazy Points: 2649 More actions · Answer 6

Dwain,

The link to the article is broken. I am interested in reading it to find out why "specifically" it is faster than using LAG / LEAD.

One thing that is handy from using these functions is that the enumeration doesn't have to start from one. It could be any enumeration.



SET NOCOUNT ON;

USE tempdb;

GO

Create Table dbo.Sample (

ID int not null primary key,

RefID int ,

SeqNo int ,

Name varchar(10)

);

GO

insert into dbo.Sample

select 1, 1000, 101, 'Mike'

union

select 2, 1000, 102, 'Mikey'

union

select 3, 1000, 103, 'Michel'

union

select 4, 1001, 11, 'Carmel'

union

select 5, 1001, 12, 'Camy'

union

select 6, 1002, 21, 'Joe'

union

select 7, 1002, 23, 'Jeff'

union

select 8, 1002, 24, 'Martin';

GO

CREATE UNIQUE INDEX idx_uq_Sample_POC ON dbo.Sample (RefID, SeqNo);

GO

WITH C1 AS (

select RefID, SeqNo, LEAD(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo) AS nxt_SeqNo

from [Sample]

)

SELECT DISTINCT RefID

FROM C1

WHERE (nxt_SeqNo - SeqNo) > 1;

GO

DROP TABLE dbo.Sample;

GO

As for finding the gaps, here is a simple solution using LEAD.



WITH C1 AS (

select RefID, SeqNo, LEAD(SeqNo, 1, SeqNo) OVER(PARTITION BY RefID ORDER BY SeqNo) AS nxt_SeqNo

from [Sample]

)

SELECT RefID, SeqNo + 1 AS rangestart, nxt_SeqNo - 1 AS rangeend

FROM C1

WHERE (nxt_SeqNo - SeqNo) > 1;

GO

Dwain Camps SSC Guru Points: 86913 More actions · Answer 7

I think the link is fixed now. The article doesn't say why it is faster, just provides an example that shows that it does.

If you've got a gap on the leading or following points, it is simple enough to add placeholders that are less than or greater than whatever your expected end point is.