REPLACE Multiple Spaces with One

  • Nice job, Jeff. Learned a few things myself.

  • dhananjay.n.kulkarni (11/16/2009)


    Hi , Article is nice. I have little very quick one liner solution for this :

    declare @str_having_multiplespaces varchar(200)

    set @str_having_multiplespaces = 'hi onespace 2spaces 3spaces 25spaces'

    select replace(@str_having_multiplespaces , space(2) , space(0))

    --- REsult === >>> hi onespace2spaces 3spaces 25spaces

    Try this and let us know if this faster

    that looked like a better solution but when i tested it

    DECLARE @a CHAR(200)

    SET @a = 'TEST1 TEST2 TEST3 TEST4 TEST5'

    SELECT REPLACE(@A,SPACE(2),SPACE(0))

    results

    TEST1 TEST2 TEST3TEST4 TEST5

    even number of spaces craps things up unfortunately

    --
    Thiago Dantas
    @DantHimself

  • jonathan allen-270981 (11/16/2009)


    Guys,

    Rather than get over concerned about collation for choosing the character to replace the 2nd space you can use any string, so you could use

    SELECT LTRIM(RTRIM(

    REPLACE(REPLACE(REPLACE(OriginalString,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')

    ))

    FROM @Demo

    its just a case of picking a delimiter that isnt 'ever' going to be part of the string that you are splitting...

    Jeff: thanks for the post, good to have the example online for reference

    Jonathan

    Jeff stated in the article that he didn't want to increase the size of the data.

    There are some additional requirements. You can't make the data any bigger during the process because it might get too big for VARCHAR(8000) (or whatever size the column is) or you can't use VARCHAR(MAX) because you're using SQL Server 2000.

    Jason

  • Can anyone do some realistic performance compare on this?

    DECLARE @sample VARCHAR(max)

    SET @sample= 'r ggg ggg g gg '

    SELECT LTRIM(RTRIM(

    REPLACE(REPLACE(REPLACE(@sample,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')

    ))

    has this kind of result:-

    Time Statistics

    Client processing time averarge 25.4000

    Total execution time average 26.5000

    Wait time on server replies average 1.1000

    And then this code

    DECLARE @sample VARCHAR(max)

    SET @sample= 'r ggg ggg g gg '

    SELECT dbo.fn_CleanUp(@sample)

    has this kind of result:-

    Time Statistics

    Client processing time 2.8750

    Total execution time 4.7500

    Wait time on server replies 1.8750

  • I think it is better not to depend on a single character. So IMO better use set of junk characters as shown here

    http://sqlblogcasts.com/blogs/madhivanan/archive/2007/12/06/squeeze-function.aspx


    Madhivanan

    Failing to plan is Planning to fail

  • Jason,

    I missed that line totally in Jeff's post. sorry to have broken the rules ... 🙁

    Jonathan

  • To solve this same problem in the past, I've created a CLR function or stored procedure to do a regular expression replace. You could also use the ...OA stored procedures to instantiate the VBScript Regular Expression Object do a regular expression replace.

  • I think the results were being affected by one query showing query plan, the other not: now we have :

    DECLARE @sample VARCHAR(max)

    SET @sample= 'r ggg ggg g gg '

    SELECT LTRIM(RTRIM(

    REPLACE(REPLACE(REPLACE(@sample,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')

    ))

    Time Statistics Client processing time 1.4000

    Total execution time 2.5000

    Wait time on server replies 1.1000

    And then this code

    DECLARE @sample VARCHAR(max)

    SET @sample= 'r ggg ggg g gg '

    SELECT dbo.fn_CleanUp(@sample)

    has this kind of result:-

    Time Statistics

    Client processing time 2.7000

    Total execution time 3.5000

    Wait time on server replies 0.8000

    [/quote]

  • One other point to consider is sentence endings. Normally there are two spaces at the end of sentences (for readability, etc., and yes, not everyone uses them. So a second character is necessary to "keep double spaces following a period".

  • In order to alleviate (but not eliminate) the problem of choosing a delimiter, the final replace could replace Space+delimiter instead of just the delimiter. So as in the examples(using O for space and X for delimiter)

    Replace(Replace(replace(text,'OO','OX'),'XO',''),'OX','O')

    This way you can use as a delimiter any character that will never follow a space in the dataset which could expand your possibilities.

    --

    JimFive

  • Interesting?

    Here is results using C# CLR

    DECLARE @sample VARCHAR(max)

    SET @sample= 'r ggg ggg g gg '

    SELECT dbo.fn_CleanString_CLR(@sample)

    Client processing time 1.9000

    Total execution time 2.4000

    Wait time on server replies 0.5000

    Here is the c#

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions

    {

    [SqlFunction()]

    public static string fn_CleanString_CLR(string CleanMeUpScottie)

    {

    // Put your code here

    while (CleanMeUpScottie.IndexOf(" ", 0) > 0)

    {

    CleanMeUpScottie = CleanMeUpScottie.Replace(" ", " ");

    }

    return CleanMeUpScottie;

    }

    };

  • --PREPARE

    CREATE FUNCTION dbo.fn_CleanUp(@FooString VARCHAR(max))

    RETURNS VARCHAR(max)

    BEGIN

    WHILE CHARINDEX(' ',@FooString) > 0

    SELECT @FooString = REPLACE(@FooString,' ',' ')

    RETURN @FooString

    END

    GO

    CREATE TABLE #TEMP1 (COL1 VARCHAR(900))

    CREATE TABLE #TEMP2 (COL2 VARCHAR(900))

    --INSERT 200k ROWS WITH RANDOM SPACES ON EACH TABLE, SEPARATE TABLES ARE USED TO AVOID CACHING, THIS MAY TAKE QUITE AWHILE

    DECLARE @SPACECOUNT1 INT,@SPACECOUNT2 INT,@SPACECOUNT3 INT,@SPACECOUNT4 INT

    SELECT @SPACECOUNT1 = CAST(CAST(NEWID() AS VARBINARY(1)) AS INT),@SPACECOUNT2 = CAST(CAST(NEWID() AS VARBINARY(1)) AS INT)

    INSERT INTO #TEMP1

    OUTPUT inserted.COL1 INTO #TEMP2

    SELECT 'TEST1'+SPACE(@SPACECOUNT1)+'TEST2'+SPACE(@SPACECOUNT2)+'TEST3'

    GO 200000

    --SELECTS

    SELECT LTRIM(RTRIM(

    REPLACE(REPLACE(REPLACE(COL1,' ',' ||*9*9||'),'||*9*9|| ',''),'||*9*9||','')

    ))

    FROM #TEMP1

    GO

    SELECT dbo.fn_CleanUp(COL2)

    FROM #TEMP2

    GO

    --CLEANUP

    DROP FUNCTION dbo.fn_CleanUp

    DROP TABLE #TEMP1

    DROP TABLE #TEMP2

    first select

    Client processing time: 145453

    Total execution time: 145609

    second select

    Client processing time: 80438

    Total execution time: 80734

    personally i thought the first would be faster

    --
    Thiago Dantas
    @DantHimself

  • That is interesting,

    do you have Visual Studio there ....or I can create T-SQL CREATE ASEMBLY line so u can also test the C# CLR version ... ?

    Cheers

  • Excellent solution. I would have written a loop:

    while (charindex(' ', @myString) > 0) begin

    set @myString = replace( @myString, ' ', ' ');

    end;

    but your solution is nicer in that it can be used directly in a select statement with a result set.

  • Thank you Jeff,

    Your solution you presented in the article makes so much sense and is very easy to follow. It will come in handy as I'm going through a massive data cleansing and move right now. Great article!

Viewing 15 posts - 31 through 45 (of 425 total)

You must be logged in to reply to this topic. Login to reply