Apparent Error with BETWEEN operator

  • We use the BETWEEN operator to find data in character ranges. We just came across a situation where BETWEEN is returning multiple records and I confess that I'm clueless as to the cause.

    The example:

    CREATE TABLE [dbo].[test] (

    [col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    Data:

    col 1 Col 2

    MSD1558001 MSD1560000

    MSD154001 MSD156000

    MSD14001 MSD16000

    select * from test where 'MSD1558017' between col_1 and col_2

    By inspection, this should return the first data set. In reality (our dev system and production system 🙂 ) it returns all three. I have also tried this using >= and the same thing happens.

    The only partial clue would be something collation related, but I am truely at a loss. Any ideas would be very much appreciated!

    Thx --Jim

  • Are you wanting to compare them as strings or are you trying to find whether the numerical part of MSD1558017 falls between the numerical parts of col_1 and col_2?

    I'm a bit rusty with my string comparisons but I would expect the statement to return all three rows as the first four characters match and the fifth character falls between the fifth characters of each row (if this makes any sense)! 🙂

  • with char datatypes between (and > or < for that matter) do a character by character comparison not looking at the whole field as it would with numbers. If the second character of what you are comparing is between the second characters of the two items you are comparing to and the first character is the same it will return true regardless of the rest of the characters in the string.


  • That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!

  • I'm not a 100% on this, but I believe the BETWEEN function does it's comparisons at the binary level. At least that's my best guess.

    thus......(extra zeros removed)

    MSD154001_________converted to binary would be______0x4D534431353430303100........

    MSD156000_________converted to binary would be______0x4D534431353630303000........

    and

    MSD1558017________converted to binary would be______0x4D534431353538303137........

    which IS between the previous values.

    SELECT

    CAST('MSD154001' AS BINARY) AS 'MSD154001'

    ,CAST('MSD1558017' AS BINARY) AS 'MSD1558017'

    ,CAST('MSD156000' AS BINARY) AS 'MSD156000'

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • This solution assumes a big table lots of new data coming in. It also assumes that all have got the MSD prefix which I could be wrong about. if it is always three Alpha then use STUFF. If the prefix could be anything, then extract the integer by finding the transition between alpha and number with PATINDEX. You might need to play with the indexing strategy but I suspect that a covering compound index including both col1asinteger and col2AsInteger would do the trick.

    [font="Courier New"]CREATE TABLE [dbo].[#test] (

    [col_1] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [col_2] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    Col1AsInteger INT,

    Col2AsInteger INT

    ) ON [PRIMARY]

    GO

    INSERT INTO #test(col_1,col_2) SELECT 'MSD1558001', 'MSD1560000'

    INSERT INTO #test(col_1,col_2) SELECT 'MSD154001', 'MSD156000'

    INSERT INTO #test(col_1,col_2) SELECT 'MSD14001', 'MSD16000'

    --and so on, several thousand of 'em

    --now update all new entries in the table

    UPDATE #test SET Col1AsInteger=REPLACE(col_1,'MSD',''),

           Col2AsInteger=REPLACE(col_2,'MSD','')

    WHERE Col1AsInteger+Col2AsInteger IS NULL--where either are null

    SELECT col_1,col_2 FROM #test

    WHERE REPLACE('MSD1558017','MSD','')

                BETWEEN Col1AsInteger AND Col2AsInteger

    [/font]

    Best wishes,
    Phil Factor

  • Jim Howell (10/18/2007)


    That could be the answer! In the real world, these numbers are barcoded on envelopes and we send and receive a couple hundred thousand a month. In light of the above post, I'm going to test by padding the numeric values left with zero's to maintain strings of equal length. Thanks for the insite!

    Slight touch of normalisation always gives a positive effect on databases.

    If you'd have Code and Number separated and stored in columns with appropriate datatype you'd never have such problem.

    Replacing varchar Code with smallint CodeID would improve performance of you queries, both saving and retrieving data. Not to mention dramatic decrease in total size of the tables.

    _____________
    Code for TallyGenerator

  • Phil--

    Good assumptions! This is a solution that will work for us. We can't always control the formatting of the barcodes because some come from outside vendors. This is a very elegant solution that I will apply to our development server. Much appreciated.

    Thx --Jim

  • [font="Courier New"]-- Create 2 tables to hold normalized data

    IF Object_ID('dbo.PrefixCode', 'U') IS NOT NULL

    DROP TABLE dbo.PrefixCode

    CREATE TABLE dbo.PrefixCode (

    ID smallint IDENTITY(-32767, 1) NOT NULL PRIMARY KEY NONCLUSTERED,

    Code nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

    )

    CREATE UNIQUE CLUSTERED INDEX UX_PrefixCode_Code ON dbo.PrefixCode (Code)

    IF Object_ID('dbo.BarCodeRange', 'U') IS NOT NULL

    DROP TABLE dbo.BarCodeRange

    CREATE TABLE dbo.BarCodeRange (

    PrefixCodeID smallint NOT NULL,

    StartNumber int NOT NULL,

    EndNumber int NOT NULL

    )

    CREATE INDEX UX_BarCodeRange_EndNumber ON dbo.BarCodeRange (EndNumber)

    GO

    -- Create view to represent data in denormalized form

    IF Object_ID('dbo.BarCodeRange_Strings', 'V') IS NOT NULL

    DROP VIEW dbo.BarCodeRange_Strings

    GO

    CREATE VIEW dbo.BarCodeRange_Strings

    AS

    SELECT C.Code, R.StartNumber, R.EndNumber,

    C.Code + convert(nvarchar(9), R.StartNumber) as Col_1,

    C.Code + convert(nvarchar(9), R.EndNumber) as Col_2

    FROM dbo.PrefixCode C

    INNER JOIN dbo.BarCodeRange R ON C.ID = R.PrefixCodeID

    Go

    -- Create a stored procedure to populate tables from denormalized input

    CREATE TRIGGER dbo.BarCodeRange_Strings_INSERT

    ON dbo.BarCodeRange_Strings

    INSTEAD OF INSERT

    AS

    DECLARE @Temp TABLE (

    NewCode nvarchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    NewStartNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    NewEndNumber nvarchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL

    )

    INSERT INTO @Temp (NewCode, NewStartNumber, NewEndNumber)

    SELECT SUBSTRING(Col_1, 1, PATINDEX('%[0-9]%', Col_1) -1) as NewCode,

    Col_1 as NewStartNumber, Col_2 AS NewEndNumber

    FROM inserted

    UNION -- we need distinct values

    SELECT SUBSTRING(Col_2, 1, PATINDEX('%[0-9]%', Col_2) -1) as NewCode, Col_1, Col_2

    FROM inserted

    INSERT INTO dbo.PrefixCode

    (Code)

    SELECT N.NewCode

    FROM @Temp N

    WHERE NOT EXISTS (select 1 from dbo.PrefixCode C where N.NewCode = C.Code)

    GROUP BY N.NewCode

    INSERT INTO dbo.BarCodeRange

    (PrefixCodeID, StartNumber, EndNumber)

    SELECT C.ID, REPLACE(N.NewStartNumber, C.Code, ''), REPLACE(N.NewEndNumber, C.Code, '')

    FROM dbo.PrefixCode C

    INNER JOIN @Temp N ON N.NewCode = C.Code

    WHERE NOT EXISTS (select 1 from dbo.BarCodeRange R

    where R.PrefixCodeID = C.ID AND R.StartNumber = REPLACE(N.NewStartNumber, C.Code, '')

    )

    GO

    -- Insert test data

    INSERT INTO dbo.BarCodeRange_Strings

    (Code, StartNumber, EndNumber, Col_1, Col_2)

    SELECT 0, '', '', 'MSD1558001', 'MSD1560000'

    UNION

    SELECT 0, '', '', 'MSD154001', 'MSD156000'

    UNION

    SELECT 0, '', '', 'MSD14001', 'MSD16000'

    UNION

    SELECT 0, '', '', 'ASD154001', 'ASD156000'

    UNION

    SELECT 0, '', '', 'ASD14001', 'ASD16000'

    -- Check result

    SELECT * FROM dbo.BarCodeRange_Strings

    -- Run simple test

    DECLARE @TestCode nvarchar(50)

    SET @TestCode = 'MSD1558017'

    SELECT * FROM dbo.BarCodeRange_Strings

    WHERE Code = SUBSTRING(@TestCode, 1, PATINDEX('%[0-9]%', @TestCode) -1)

    AND REPLACE(@TestCode, Code, '') >= StartNumber

    AND REPLACE(@TestCode, Code, '') <= EndNumber

    [/font]

    This will work with any given codes, even in foreign languages, no need to hardcode known codes

    _____________
    Code for TallyGenerator

  • Sergiy--

    Thank you for this solution. I will try this in our dev system today. Thanks all for your helpfulness on this issue. This is a very worthwhile forum!

    --Jim

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply