Post reply

Find missing numbers in sequential column

Cindy Hallum

SSCommitted

Points: 1808
More actions
December 15, 2005 at 2:23 pm

#111032

I'm pulling data from a legacy system to SQL Server. Let's say I have a table with one column named Manifest and it is populated from this data. In our legacy system, this is a sequential number. So I should have sequential numbers in my table.

I'm having a problem with missing data. To track it down, I want to display the Manifest numbers that are missing in the sequence. If the data in my column is, for example:

423704

423703

423702

423701

423699

423698

423697

423696

423695

Then I want the results of the query to be:

423700

Is this doable? TIA......Cindy
john-132685

SSCertifiable

Points: 6084
More actions
December 15, 2005 at 2:36 pm

#609889

from table a left outer join table b on a.key = b.key -1 where b.key is null

gets you last row BEFORE a break

reverse to get last row AFTER a break in sequence and merge these two to get empty ranges
Ray M

SSC-Insane

Points: 21093
More actions
December 15, 2005 at 2:46 pm

#609891

Create a number table with numbers 1 thru whatever

then perform a left join

Select *

From MyTable

right Join numbers on mytable.pk = number

where mytable.pk is null

and number < (select max(mytable.pk) from mytable)

This will create a number table with 1 millon sequential numbers

CREATE TABLE Numbers(Number INT NOT NULL PRIMARY KEY)

GO

INSERT Numbers

SELECT HundredThousand * 100000 + Tenthousand * 10000 + thousand * 1000 + Hundred * 100 + Ten * 10 + Unit + 1

   FROM (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Units (Unit)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Tens(Ten)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Hundreds(Hundred)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) thousands(thousand)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Tenthousands(Tenthousand)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Hundredthousands(Hundredthousand)

GO
Cindy Hallum

SSCommitted

Points: 1808
More actions
December 15, 2005 at 2:53 pm

#609898

Great idea. And thanks for the sql code.
Frank Kalis

SSC Guru

Points: 111183
More actions
December 16, 2005 at 5:44 am

#610020

This will create a number table with 1 millon sequential numbers

CREATE TABLE Numbers(Number INT NOT NULL PRIMARY KEY)

GO

INSERT Numbers

SELECT HundredThousand * 100000 + Tenthousand * 10000 + thousand * 1000 + Hundred * 100 + Ten * 10 + Unit + 1

   FROM (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Units (Unit)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Tens(Ten)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Hundreds(Hundred)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) thousands(thousand)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Tenthousands(Tenthousand)

CROSS JOIN

        (

  SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL

  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL

  SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL

  SELECT 9) Hundredthousands(Hundredthousand)

GO

Why that complicated?

SELECT TOP 1000000 Num = IDENTITY(INT, 1, 1)

INTO #Numbers

FROM Northwind..sysobjects a1

CROSS JOIN Northwind..sysobjects a2

CROSS JOIN Northwind..sysobjects a3

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
jratwork

SSC-Addicted

Points: 439
More actions
December 16, 2005 at 9:02 am

#610090

Just for fun, here's another variation on the first answer:

SELECT

   Key+1

FROM

   MyTable

WHERE

    --The next # is missing

   Key+1 NOT IN (SELECT Key FROM MyTable)

AND

   --We haven't reached the max of values

   Key+1 < (SELECT Max(Key) FROM MyTable)
Jeff Moden

SSC Guru

Points: 1000539
More actions
December 18, 2005 at 11:04 am

#610234
Ok, just for fun, the following code will report ranges of both present and missing ID's. Why do it this way? Because if you have a huge number of missing ID's in a given range (ie. ID's 500,000 through 700,000 are missing), who wants a list of over 200,000 ID's? Another advantage of doing it this way is you don't need to build a huge temporary numbers table if the MaxID in the table being examined is huge (picky ol' DBA's get really twisted when you run TempDB out of space creating a numbers table with a couple of hundred million entries in it ) and it run's almost as fast (nothing runs faster than comparing to a numbers table, though).

To summarize the code below...
1. It creates an experimental table of a million ID's to simulate a table you may have to run the code on (#yourtable) [Takes about 6 seconds to do this on my Desktop PC].
2. It deletes various ranges of records including some individual records.
3. It calculates the "Missing" and "Present" ranges from 1 to the Max ID of the table (1 million, in this case). [Takes about 7 seconds to do this on my Desktop PC]
Lemme know wat'cha think... sorry it'll be so hard to figure out what the code is doing ... my documentation skills aren't what they used to be...

--===== Setup for speed and to prevent blocking

    SET NOCOUNT ON

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

--=============================================================================

--      Create an experimental table to simulate the table being examined

--=============================================================================

--===== If the experimental temp table exists, drop it so we can do repeat runs

     -- if we want to "play".

     IF OBJECT_ID('TempDB..#yourtable') IS NOT NULL

        DROP TABLE #yourtable

--===== Create the experimental temp table and populate with IDs on the fly

     -- This always works because SYSCOLUMNS always has at least 256 entries

     -- even in a new database and 256*256*256 > 1,000,000

SELECT TOP 1000000 ID = IDENTITY(INT, 1, 1)

   INTO #yourtable

   FROM dbo.SYSCOLUMNS sc1,

        dbo.SYSCOLUMNS sc2,

        dbo.SYSCOLUMNS sc3

--===== Like any good table, our experimental table needs a Primary Key

ALTER TABLE #yourtable

        ADD PRIMARY KEY CLUSTERED (ID)

--===== Ok, we have an experimental table with a million IDs from 1 to 1,000,000.

     -- Let's remove some of the rows and see if we can find if they're missing

     -- This deletes a handful of individual rows and 3 very small ranges

DELETE #yourtable

WHERE ID IN (1,2,3,250,251,2000,4000,4002,4004,900001,900002,900003)

     -- This deletes a "monster" range just to see how it's handled.

DELETE #yourtable

WHERE ID BETWEEN 500000 AND 700000

--===== At this point, the experiment is ready to rock

--=============================================================================

--      Create a report of all ranges of ID's, big or small, from 1 to the max

--      ID present in the table. Status of each range will be 'Missing' or

--      'Present' and the MinID and MaxID of each range will also be reported.

--      This is done without loops and without a Tally (numbers) table. The

--      reason for doing it without a numbers table is because if the max ID

--      for the table is very large (hundreds of millions), it's not a good

--      idea to put a table that large in TempDB (although some do) and the

--      DBA's frequently deny table creation privs to make a working table in

--      the DB. The temp tables used below should stay fairly small unless

--      someone did something really stupid like deleting every other row.

--      AND, for this million row example, it only takes about 7 seconds to

--      run on my little ol' desktop PC. It should run MUCH faster on a server

--      quality machine.

--=============================================================================

--===== If the working temp tables exist, drop them

     -- Conditionally drop the table that keeps track of "MinIDs" in a range

     IF OBJECT_ID('TempDB..#MyMins') IS NOT NULL

        DROP TABLE #MyMins

     -- Conditionally drop the table that keeps track of "MaxIDs" in a range

     IF OBJECT_ID('TempDB..#MyMaxes') IS NOT NULL

        DROP TABLE #MyMaxes

     -- Conditionally drop the table that holds the final report on ranges of IDs

     IF OBJECT_ID('TempDB..#Report') IS NOT NULL

        DROP TABLE #Report

--===== Find all of the "MinIDs" for ranges that are present and put them in a table

SELECT IDENTITY(INT,1,1) AS RowNum,

        CAST(ID AS INT) AS ID --Cast is necessary because ID is an IDENTITY column

   INTO #MyMins

   FROM #yourtable

WHERE ID NOT IN (SELECT t1.ID

                     FROM #yourtable t1,

                          #yourtable t2

                    WHERE t1.ID=t2.ID+1)

ORDER BY ID --This is necessary so we can match up with the MaxIDs

--===== Find all of the "MaxIDs" for ranges that are present and put them in a table

SELECT IDENTITY(INT,1,1) AS RowNum,

        CAST(ID AS INT) AS ID --Cast is necessary because ID is an IDENTITY column

   INTO #MyMaxes

   FROM #yourtable

WHERE ID NOT IN (SELECT t1.ID

                     FROM #yourtable t1,

                          #yourtable t2

                    WHERE t1.ID+1=t2.ID)

ORDER BY ID --This is necessary so we can match up with the MinIDs

--===== Insert all of the "Present" ranges into the report table using the Min and Max

     -- tables to derive the ranges by matching on the RowNum of each table. This

     -- step creates the Report table on the fly and the RowNum is very necessary for

     -- deriving the missing ID's in the steps that follow.

SELECT IDENTITY(INT,1,1) AS RowNum,

        m.ID AS MinID,

        x.ID AS MaxID,

        'Present' AS Status

   INTO #Report

   FROM #MyMins m,

        #MyMaxes x

WHERE m.RowNum = x.RowNum

--===== Now, derive and insert all of the missing ranges of ID's save, perhaps, one.

INSERT INTO #Report (MinID,MaxID,Status)

SELECT r1.MaxID+1 AS MinID,

        r2.MinID-1 AS MaxID,

        'Missing' AS Status

   FROM #Report r1,

        #Report r2

WHERE r1.RowNum+1 = r2.RowNum

--===== If ID #1 is missing, we wouldn't have caught that so far. So let's check

     -- now and add it's missing range if it's missing

     IF (SELECT MIN(MinID) FROM #Report)>1

         INSERT INTO #Report (MinID,MaxID,Status)

         SELECT 1 AS MinID,

                MIN(MinID)-1 AS MaxID,

                'Missing' AS Status

           FROM #Report

--===== All set... display the report (could be used as a result set, as well)

SELECT MinID,MaxID,Status

   FROM #REPORT

ORDER BY MinID

Now, WOULD SOMEONE GIVE ME AN EXAMPLE AS TO WHY THIS EVER NEEDS TO BE DONE? I could see doing it maybe as a check for transmitted data but not much more. And the old saw about a legacy system never having a row with a sequential column deleted just doesn't sit quite right with me. I agree it's not a good idea to physically delete records but, it happens and no one should care about reusing the ID when there are so many to choose from. So, why would anyone need to do this (find missing ID's)? Surely you don't intend to reuse them

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions
Jesper-244176

SSCertifiable

Points: 7082
More actions
December 20, 2005 at 2:23 am

#610525

Still just for fun I think the following might be a bit simpler (I use Jeff's code to create #yourtable)...

     IF OBJECT_ID('TempDB..#MyMins') IS NOT NULL

        DROP TABLE #MyMins

     -- Conditionally drop the table that keeps track of "MaxIDs" in a range

     IF OBJECT_ID('TempDB..#MyMaxes') IS NOT NULL

        DROP TABLE #MyMaxes

     create table #MyMins(RowNum int identity(1,1), MinID int)

     create table #MyMaxes(RowNum int identity(1,1), MaxID int)

     insert #MyMins (MinID)

     select t1.ID

     from #yourtable t1 left join #yourtable t2

     on t1.ID = t2.ID + 1

     where t2.ID is null

     order by t1.ID

     insert #MyMaxes (MaxID)

     select t1.ID

     from #yourtable t1 left join #yourtable t2

     on t1.ID = t2.ID - 1

     where t2.ID is null

     order by t1.ID

     select t1.MinID as PresentFrom, t2.MaxID as PresentTo from #MyMins t1 inner join #MyMaxes t2 on

     t1.RowNum = t2.RowNum

     select isnull(t2.MaxID, 0) + 1 as MissingFrom, isnull(t1.MinID, 1000001) - 1 as MissingTo from #MyMins t1 full outer join #MyMaxes t2 on

     t1.RowNum = t2.RowNum + 1

     where t2.MaxID != 1000000
jratwork

SSC-Addicted

Points: 439
More actions
December 20, 2005 at 9:36 am

#610631
This is way too much fun.

I have an unnatural aversion to temp tables and an equally unnatural propensity for clever querying.
```
SELECT
MinRange = (SELECT ISNULL(MAX(subA.TheKey),0)+1 FROM MyValues subA WHERE subA.TheKey < A.TheKey),
MaxRange = TheKey - 1
FROM MyValues A
WHERE
A.TheKey - 1 NOT IN
(SELECT TheKey FROM MyValues)
```
Cindy Hallum

SSCommitted

Points: 1808
More actions
December 20, 2005 at 9:48 am

#610640

You're right...that is very clever. I'm glad you guys are having fun with this. Meanwhile, I used the "create a table of Numbers" approach and am quite pleased with the results.

Thanks for all your help.........cindy
Jeff Moden

SSC Guru

Points: 1000539
More actions
December 20, 2005 at 5:18 pm

#610751

Very cool, jratwork... I have a similar un-natural aversion to correlated subqueries and I had my doubts when I saw the "<" in the correlated subquery... but I just ran this against the million row temp table I made... it ran in less than 3 seconds and correctly found all of the missing ranges. Like I said, Very cool.

There is, however, a fly in the ointment... if ID #1 is present, the first row returned is always

MinRange MaxRange

-------- --------

    1       0

I added "the fix" to the last line of the code...

SELECT MinRange = (SELECT ISNULL(MAX(suba.ID),0)+1

                      FROM #yourtable suba

                     WHERE suba.ID < a.ID),

       MaxRange = ID - 1

FROM #yourtable a

WHERE a.ID - 1 NOT IN (SELECT ID FROM #yourtable)

   AND a.ID - 1 > 0

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions
Jesper-244176

SSCertifiable

Points: 7082
More actions
December 21, 2005 at 6:00 am

#610805

I had exactly the same doubts when I saw the statement, but I haven't found anything faster... I give up, you beat me... this time...
jratwork

SSC-Addicted

Points: 439
More actions
December 21, 2005 at 8:36 am

#610859

I am humbled by your veteran approval.

I should have caught that one off scenario but I had focused on dealing with a missing id of 1. Thank you.

Until next time.

JR

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply