REPLACE Multiple Spaces with One

Question

Post reply

REPLACE Multiple Spaces with One

Viewing 15 posts - 406 through 420 (of 425 total)

You must be logged in to reply to this topic. Login to reply

Michael Meierruth SSChampion Points: 10051 More actions · Answer 1

Steven Willis (5/26/2013)

I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.

I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 2

Michael Meierruth (5/28/2013)

Steven Willis (5/26/2013)

I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.

I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

Thanks for taking the time to test this. My initial test using client statistics did show your method to be faster on my 1000 row test table. What testing method did you use? Do you think scalability becomes a factor for any of these methods where one might work better than another for a very large or very wide table? Not disputing the results--just wondering.

Sporidium SSCommitted Points: 1550 More actions · Answer 3

Michael Meierruth (5/28/2013)

Steven Willis (5/26/2013)

I'm late to this party, but here's my entirely TSQL offering to remove duplicate spaces. Most of the code below has been not so shamelessly borrowed from Jeff Moden's work, especially the famous DelimitedSplit8K function which I have found to be one of the most useful tools in the toolbox.

I have no idea how this compares to the other solutions but I thought I'd just throw it into the mix. It's based mostly off of some other variations I've developed from the DelimitedSplit8K function and I think it takes a different approach to the problem. Basically, it splits the string and then puts it back together again. Whether that's better or not I'll let the scorekeeper determine.

OK. Yours is 6344ms. Mine is 2594ms.

DelimitedSplit8K has many and varied uses, though I'm afraid this isn't one that flatters it.

Michael's nested REPLACEs might look cumbersome, but they can't be beat for speed, especially inline.

I tried out this more concise UDF variant of his algorithm using CTEs, but it doesn't quite fly fast enough.



CREATE FUNCTION dbo.fnCoalesceSpaces(@Str varchar(max))

RETURNS varchar(max) AS

BEGIN

    ;WITH A AS (SELECT n FROM (VALUES (32),(16),(8),(4),(2),(1)) A(n))

    SELECT @STR = REPLACE(@Str COLLATE Latin1_General_BIN2, SPACE(n + 1), SPACE(1))

      FROM A

  ORDER BY n DESC

    RETURN @STR

END

Gary

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

I believe that the reasons for that are many...

1. It's a scalar UDF.

2. It does calculations instead of using constants.

3. It doesn't take advantage of nested replaces.

4. It has a relatively very expensive ORDER BY.

The nested replaces could actually be made into an Inline Table Valued Function as a kind of "iSF" or Inline Scalar Valued Function. Please see the following for more information on that subject.

http://qa.sqlservercentral.com/articles/T-SQL/91724/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

TomThomson SSC Guru Points: 104773 More actions · Answer 5

Jeff Moden (3/26/2015)

I believe that the reasons for that are many...

1. It's a scalar UDF.

2. It does calculations instead of using constants.

3. It doesn't take advantage of nested replaces.

4. It has a relatively very expensive ORDER BY.

The nested replaces could actually be made into an Inline Table Valued Function as a kind of "iSF" or Inline Scalar Valued Function. Please see the following for more information on that subject.

http://qa.sqlservercentral.com/articles/T-SQL/91724/

I'm not 100% certain, but I suspect that the ORDER BY is technically meaningless, it can only order the final output rows generated by a query although it may sometimes, if one is lucky, order the internal workings of the query too. So having this UDF produce the right result requires luck - not something I would therefor use in a production environment (because I don't want upgrades - including bugfixes - to the server to risk breaking the code even if I know it currently works - unless I could devise a good reliable method of detecting when things were happening in the wrong order so that I could use a dfferent method in that case, and I can see any efficient method of doing that.

Writing a user defined table valued function with a single select statement nesting the 6 (or however many you choose) replace calls so that the thing runs inline and using explicit string constants instead of calls on SPACE is going to improve performance rather more than somewhat as well as providing formally valid SQL that will work for ever unless someone changes the standard.

Tom

Alan Burstein SSC Guru Points: 61136 More actions · Answer 6

I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

I was using your technique to create a function to do this today; out of habit I created an iTVF version (AKA inline scalar version). Out of curiosity I created a scalar version and tested it. As expected the iTVF version was faster (by about 50%). I thought the results were worth sharing.



CREATE FUNCTION NormalizeWhitespace(@string varchar(max))

RETURNS TABLE WITH SCHEMABINDING AS RETURN

SELECT cleanstring =

  CASE

    WHEN charindex('  ',@string) > 0 THEN

    REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@string)),'  ',' '+CHAR(127)),CHAR(127)+' ',''),CHAR(127),'')

    ELSE @string

  END;

GO

CREATE FUNCTION dbo.NormalizeWhitespace2 (@string varchar(max))

RETURNS varchar(max) WITH SCHEMABINDING AS

BEGIN

  RETURN

  CASE

    WHEN charindex('  ',@string) > 0 THEN

    REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@string)),'  ',' '+CHAR(127)),CHAR(127)+' ',''),CHAR(127),'')

    ELSE @string

  END;

END

I used varchar(max) because that's what my app needs (keeping it "real world"). Here's the test harness:



SET NOCOUNT ON

IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);

INSERT #vals (val)

SELECT TOP 500000 REPLACE(REPLACE(newid(),'-','  '),'A','   ')

FROM sys.all_columns a, sys.all_columns b;

dbcc freeproccache with no_infomsgs

dbcc dropcleanbuffers with no_infomsgs

PRINT '========== itvf version ==========';

DECLARE

  @result varchar(8000),

  @StartTime DATETIME = GETDATE();

  SELECT @result = cleanstring

  FROM #vals

  CROSS APPLY dbo.NormalizeWhitespace(val);

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

PRINT '========== scalar version ==========';

DECLARE

  @result varchar(8000),

  @StartTime DATETIME = GETDATE();

  SELECT @result = dbo.NormalizeWhitespace2(val)

  FROM #vals;

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

And the results:



Beginning execution loop

========== itvf version ==========

500000 rows.

3496

========== itvf version ==========

500000 rows.

3210

========== itvf version ==========

500000 rows.

3256

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

500000 rows.

4813

========== scalar version ==========

500000 rows.

4780

========== scalar version ==========

500000 rows.

4810

Batch execution completed 3 times.

Note: COLLATE Latin1_General_BIN made no difference.

Please forgive me if someone has already done this. Cheers.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

Alan.B (5/26/2015)

I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Alan Burstein SSC Guru Points: 61136 More actions · Answer 8

Jeff Moden (5/26/2015)

Alan.B (5/26/2015)

I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.

I should start reading prologues I guess 😉

Nonetheless, I tried the same exercise with Michael's function and the iTVF was about 25% faster still. Long live the inline scalar iTVF!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Alan.B (5/26/2015)

Jeff Moden (5/26/2015)

Alan.B (5/26/2015)

I love this article Jeff and have been using your technique since I discovered this article a little more than a year ago.

Thanks, Alan, but you need to revisit the article. Read the Prologue and see the first link there, which leads to Michael's method of using nested replaces just for the spaces. It blows the door off my method and virtually all comers. That's what I like about this community... lot's of people joined in the discussion and the result was a truly nasty fast method that passed my on the highway so fast that it looked like my method was standing still.

I should start reading prologues I guess 😉

Nonetheless, I tried the same exercise with Michael's function and the iTVF was about 25% faster still. Long live the inline scalar iTVF!

Can't go wrong there. Guess I'm going to have to revisit this article. Heh... maybe add a pre-prologue. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Michael Meierruth SSChampion Points: 10051 More actions · Answer 10

Michael Meierruth

SSChampion

Points: 10051

May 27, 2015 at 3:27 am

#1800792

Alan B.

How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Alan Burstein SSC Guru Points: 61136 More actions · Answer 11

Michael Meierruth (5/27/2015)

Alan B.

How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:

-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)



USE tempdb

GO

-- CREATE THE FUNCTIONS

-----------------------------------------------------------------

IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL

DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV

GO

CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))

RETURNS VARCHAR(max) WITH SCHEMABINDING AS

BEGIN

RETURN   replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),

  '                                 ',' '),

  '                 ',' '),

  '         ',' '),

  '     ',' '),

  '   ',' '),

  '  ',' '),

  '  ',' ')

END;

GO

IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL

DROP FUNCTION dbo.NormalizeWhiteSpace8K

GO

CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

SELECT

NewString =

replace(replace(replace(replace(replace(replace(replace(

ltrim(rtrim(@String)),

  '                                 ',' '),

  '                 ',' '),

  '         ',' '),

  '     ',' '),

  '   ',' '),

  '  ',' '),

  '  ',' ');

GO

-- MAKE SURE BOTH STILL WORK (yep)

-----------------------------------------------------------------

DECLARE @string varchar(8000);

SET @string = '*'+REPLICATE(' ',7998)+'*';

SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);

SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);

--THE TESTS

-----------------------------------------------------------------

SET NOCOUNT ON

IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);

INSERT #vals (val)

SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'

FROM sys.all_columns a, sys.all_columns b;

dbcc freeproccache with no_infomsgs

dbcc dropcleanbuffers with no_infomsgs

PRINT '========== itvf version ==========';

DECLARE

  @result varchar(8000),

  @StartTime DATETIME = GETDATE();

  SELECT @result = NewString

  FROM #vals

  CROSS APPLY dbo.NormalizeWhiteSpace8K(val);

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

PRINT '========== scalar version ==========';

DECLARE

  @result varchar(8000),

  @StartTime DATETIME = GETDATE();

  SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)

  FROM #vals;

PRINT CAST(@@rowcount AS varchar(10))+' rows.';

PRINT DATEDIFF(ms,@StartTime,GETDATE());

GO 3

/*

--10,000 Records

Beginning execution loop

========== itvf version ==========

10000 rows.

1450

========== itvf version ==========

10000 rows.

1380

========== itvf version ==========

10000 rows.

1413

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

10000 rows.

2300

========== scalar version ==========

10000 rows.

2290

========== scalar version ==========

10000 rows.

2276

Batch execution completed 3 times.

*/

/*

-- 50,000 rows

Beginning execution loop

========== itvf version ==========

50000 rows.

7166

========== itvf version ==========

50000 rows.

7116

========== itvf version ==========

50000 rows.

7070

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

50000 rows.

11503

========== scalar version ==========

50000 rows.

11530

========== scalar version ==========

50000 rows.

11493

Batch execution completed 3 times.

*/

/*

-- 100,000 rows

Beginning execution loop

========== itvf version ==========

100000 rows.

14426

========== itvf version ==========

100000 rows.

14226

========== itvf version ==========

100000 rows.

14226

Batch execution completed 3 times.

Beginning execution loop

========== scalar version ==========

100000 rows.

22900

========== scalar version ==========

100000 rows.

23076

========== scalar version ==========

100000 rows.

22943

Batch execution completed 3 times.

*/

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Sporidium SSCommitted Points: 1550 More actions · Answer 12

Alan.B (5/27/2015)

Michael Meierruth (5/27/2015)

Alan B.

How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:

-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)
USE tempdb GO -- CREATE THE FUNCTIONS ----------------------------------------------------------------- IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV GO CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max)) RETURNS VARCHAR(max) WITH SCHEMABINDING AS BEGIN RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' ') END; GO IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL DROP FUNCTION dbo.NormalizeWhiteSpace8K GO CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max)) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT NewString = replace(replace(replace(replace(replace(replace(replace( ltrim(rtrim(@String)), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '); GO -- MAKE SURE BOTH STILL WORK (yep) ----------------------------------------------------------------- DECLARE @string varchar(8000); SET @string = '*'+REPLICATE(' ',7998)+'*'; SELECT * FROM dbo.NormalizeWhiteSpace8K(@string); SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string); --THE TESTS ----------------------------------------------------------------- SET NOCOUNT ON IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals; CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null); INSERT #vals (val) SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*' FROM sys.all_columns a, sys.all_columns b; dbcc freeproccache with no_infomsgs dbcc dropcleanbuffers with no_infomsgs PRINT '========== itvf version =========='; DECLARE @result varchar(8000), @StartTime DATETIME = GETDATE(); SELECT @result = NewString FROM #vals CROSS APPLY dbo.NormalizeWhiteSpace8K(val); PRINT CAST(@@rowcount AS varchar(10))+' rows.'; PRINT DATEDIFF(ms,@StartTime,GETDATE()); GO 3 PRINT '========== scalar version =========='; DECLARE @result varchar(8000), @StartTime DATETIME = GETDATE(); SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val) FROM #vals; PRINT CAST(@@rowcount AS varchar(10))+' rows.'; PRINT DATEDIFF(ms,@StartTime,GETDATE()); GO 3 /* --10,000 Records Beginning execution loop ========== itvf version ========== 10000 rows. 1450 ========== itvf version ========== 10000 rows. 1380 ========== itvf version ========== 10000 rows. 1413 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 10000 rows. 2300 ========== scalar version ========== 10000 rows. 2290 ========== scalar version ========== 10000 rows. 2276 Batch execution completed 3 times. */ /* -- 50,000 rows Beginning execution loop ========== itvf version ========== 50000 rows. 7166 ========== itvf version ========== 50000 rows. 7116 ========== itvf version ========== 50000 rows. 7070 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 50000 rows. 11503 ========== scalar version ========== 50000 rows. 11530 ========== scalar version ========== 50000 rows. 11493 Batch execution completed 3 times. */ /* -- 100,000 rows Beginning execution loop ========== itvf version ========== 100000 rows. 14426 ========== itvf version ========== 100000 rows. 14226 ========== itvf version ========== 100000 rows. 14226 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 100000 rows. 22900 ========== scalar version ========== 100000 rows. 23076 ========== scalar version ========== 100000 rows. 22943 Batch execution completed 3 times.
*/

Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?

For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.

Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

Alan Burstein SSC Guru Points: 61136 More actions · Answer 13

Gary Harding (5/28/2015)

Alan.B (5/27/2015)

Michael Meierruth (5/27/2015)

Alan B.

How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:

-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)
USE tempdb GO -- CREATE THE FUNCTIONS ----------------------------------------------------------------- IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV GO CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max)) RETURNS VARCHAR(max) WITH SCHEMABINDING AS BEGIN RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' ') END; GO IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL DROP FUNCTION dbo.NormalizeWhiteSpace8K GO CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max)) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT NewString = replace(replace(replace(replace(replace(replace(replace( ltrim(rtrim(@String)), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '); GO -- MAKE SURE BOTH STILL WORK (yep) ----------------------------------------------------------------- DECLARE @string varchar(8000); SET @string = '*'+REPLICATE(' ',7998)+'*'; SELECT * FROM dbo.NormalizeWhiteSpace8K(@string); SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string); --THE TESTS ----------------------------------------------------------------- SET NOCOUNT ON IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals; CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null); INSERT #vals (val) SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*' FROM sys.all_columns a, sys.all_columns b; dbcc freeproccache with no_infomsgs dbcc dropcleanbuffers with no_infomsgs PRINT '========== itvf version =========='; DECLARE @result varchar(8000), @StartTime DATETIME = GETDATE(); SELECT @result = NewString FROM #vals CROSS APPLY dbo.NormalizeWhiteSpace8K(val); PRINT CAST(@@rowcount AS varchar(10))+' rows.'; PRINT DATEDIFF(ms,@StartTime,GETDATE()); GO 3 PRINT '========== scalar version =========='; DECLARE @result varchar(8000), @StartTime DATETIME = GETDATE(); SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val) FROM #vals; PRINT CAST(@@rowcount AS varchar(10))+' rows.'; PRINT DATEDIFF(ms,@StartTime,GETDATE()); GO 3 /* --10,000 Records Beginning execution loop ========== itvf version ========== 10000 rows. 1450 ========== itvf version ========== 10000 rows. 1380 ========== itvf version ========== 10000 rows. 1413 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 10000 rows. 2300 ========== scalar version ========== 10000 rows. 2290 ========== scalar version ========== 10000 rows. 2276 Batch execution completed 3 times. */ /* -- 50,000 rows Beginning execution loop ========== itvf version ========== 50000 rows. 7166 ========== itvf version ========== 50000 rows. 7116 ========== itvf version ========== 50000 rows. 7070 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 50000 rows. 11503 ========== scalar version ========== 50000 rows. 11530 ========== scalar version ========== 50000 rows. 11493 Batch execution completed 3 times. */ /* -- 100,000 rows Beginning execution loop ========== itvf version ========== 100000 rows. 14426 ========== itvf version ========== 100000 rows. 14226 ========== itvf version ========== 100000 rows. 14226 Batch execution completed 3 times. Beginning execution loop ========== scalar version ========== 100000 rows. 22900 ========== scalar version ========== 100000 rows. 23076 ========== scalar version ========== 100000 rows. 22943 Batch execution completed 3 times.
*/

Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?

For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.

Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

I have to try Latin1_General_BIN2. I tested with Latin1_General_BIN and saw no improvement.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

jessinthomas012 SSC Rookie Points: 46 More actions · Answer 14

jessinthomas012

SSC Rookie

Points: 46

June 15, 2018 at 12:38 am

#1994081

Hi Jeff,

Well explained.. Thanks for sharing this :)..

Thanks,
Jessin

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 15

Thanks for the feedback, Jessin.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions