Removing non numeric data from right side of number

  • Does anyone now a quick way to remove any non - numeric data from a number (varchar field). 

    Example: 12345XXX = 12345 , 12## = 12, 1233%%%456 = 1233456

    The code needs to be as efficient as possible as it will be performed on a query possibly hitting a million rows.

    Thank you ...

  • this kind of task needs to be handled by a regular expression, there is no other way in my opinion. In SQL 2005 u can write a CLR function that  would match all non-digits in your string

    1233%%%45##6

    and replace them with nothing, resulting in the cleaned output

    1234456

    regex match pattern for this is very simple:

    \D                                     [meaning non-digit]

    it would work blazingly fast even on huge tables.

    u cannot do it in SQL 2000 obviously b/c regex is not supported there.

  • Create scalar UDF taking varchar value as an  argument and returning integer value.

    Then update your table once using this function.

    Don't be afraid to use loop inside of this UDF. Even with loop it will be fast enough, much faster than UPDATE itself. Of course, if you don't reference any table from this UDF.

    And than you include this UDF into trigger validating inserted values for this table and converting them to proper format. 

    _____________
    Code for TallyGenerator

  • Thank you for your replies.  I need to store the data in the original format in my table, any non-numeric characters are masking characters to hide certain data.  So ideally it would be handled by a UDF.  I think I will have to stick with using the REPLACE function for the most common set of non-numeric characters that generally come up.  (** The data is from outside sources that we have no control over **)

  • Create new table and set up trigger to transfer inserted/data from old table to new one. Do select from new table.

    If you store rubbish, you'll get rubbish back. No other way. Sorry.

    _____________
    Code for TallyGenerator

  • This tends to work fairly well:

     

    set

    @vat='1234_xyx'

    print

    substring(@vat,1,patindex('%[^0-9]%',@vat)-1)

     

    Good hunting!

  • Dan,

    tried your

    declare

    @vat varchar(55)

    set

    @vat='1233%%%456'

    print

    substring(@vat,1,patindex('%[^0-9]%',@vat)-1)

    got in return:

    1233

    instead of needed

    1233456

     

  • If you took the code below (or something like it) and made it into a udf it should be close to what you are after.  However, it might not be "lightning fast"!

    DECLARE @str_val VARCHAR(100)

    DECLARE @ret_val VARCHAR(100)

    DECLARE @pos SMALLINT

    DECLARE @len SMALLINT

    SET @str_val = '1233%%%456'

    IF ISNUMERIC(@str_val) = 1

        SET @ret_val = @str_val

    ELSE

    BEGIN

        SET @ret_val = ''

        SET @pos = 1

        SET @len = LEN(@str_val)

        WHILE (@pos <= @len)

        BEGIN

            IF SUBSTRING(@str_val, @pos, 1) LIKE '%[0-9]%'

                SET @ret_val = @ret_val + SUBSTRING(@str_val, @pos, 1)

            SET @pos = @pos + 1

        END

    END

    PRINT '@ret_val => ' + @ret_val

    --------------------------------

    RETURNS: @ret_val => 1233456

  • i still think the best way for this kind of text processing (cleanup) is creating a Regular Expression CLR function and then invoking it on the whole field:

    C# code for the function:

    public striing CleanString()

      { //declare an input string of text

       string InputString = @"1233%%%456XYZYZY";

       //replace everything other than \d with an empty string string ResultString =  Regex.Replace(ResultString, @"[^\d]", "");

    return ResultString; }

    the code looks much less confusing (no Substring of Pathindex nonsense); works very fast ; was able to process 100k recs in < 1 min

  • sorry for misprints:

    C# code for the function:

    public striing CleanString()

      { //declare an input string of text

       string InputString = @"1233%%%456XYZYZY";

       //replace everything other than \d with an empty string 

    string ResultString =  Regex.Replace(InputString, @"[^\d]", "");

    return ResultString;

    //returns: 1233456

     }

  • Thank you.  Unfortunately we are still on 2000 -  Hopefully by the end of the year I can start migrating our application to 2005.

  • Modifying Dan's example a bit, try this:

    DECLARE @vat varchar(55)

    SET @vat='1233%%%456'

    SET @vat = Replace(@vat, SubString(@vat, PatIndex('%[^0-9]%',@vat), 1), '')

    PRINT @vat

  • Great, thats totally what I was looking for.  Thank you!

  • Here's a UDF that may be more reliable, just in case...

    --================================

    -- Create the function

    --================================

    --DROP FUNCTION dbo.fNumbersOnly

    GO

    CREATE FUNCTION dbo.fNumbersOnly

    (

      @STR varchar(50)

    )

    RETURNS varchar(50)

    AS

    BEGIN

      DECLARE @pos int

      SET @pos = PatIndex('%[^0-9]%', @STR)

      WHILE @pos > 0

      BEGIN

        SET @STR = Stuff(@str, @pos, 1, '')  --delete invalid char

        SET @pos = PatIndex('%[^0-9]%', @STR)

      END

      RETURN @STR

    END

    --================================

    -- Test it

    --================================

    PRINT dbo.fNumbersOnly('1234%%%56')

    PRINT dbo.fNumbersOnly('x1234%%%5y6z')

    PRINT dbo.fNumbersOnly('123456')

    PRINT dbo.fNumbersOnly('x123456x')

    PRINT dbo.fNumbersOnly('1x2y3z4a5b6')

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply