Reaping the benefits of the Window functions in T-SQL

Viewing 15 posts - 16 through 30 (of 57 total)

Jason-368451 SSC Enthusiast Points: 127 · Answer 1

Thanks for the great info.

I was previously using a modified version of Jeff Moden's Tally OH! An Improved SQL 8K “CSV Splitter” Function to parse text files that were pipe delimited.

I really needed a way to parse proper CSV files that are comma delimited with quote text qualifiers, as the data may have embedded commas.

e.g. "900 N. May ST., #5"

Using the previous splitter, it would get split into "900 N. May ST." and "#5".

After searching the forums I found this, and it works perfectly, but...

The production server for this project is 2008 R2.

Is there any way to replicate the Lag() and Lead() functions with 2008 equivalents?

Any help would be greatly appreciated.
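
For readers in the same position: one common workaround on SQL Server 2008 R2, which has no LAG() or LEAD(), is to number the rows with ROW_NUMBER() and self-join on the neighbouring row number. The following is only a minimal sketch under that assumption, not code from the article; the table variable @t, the sample values and the column aliases are made up for illustration.

```sql
-- Hypothetical sample data: start positions such as a splitter might produce.
DECLARE @t TABLE (N1 INT NOT NULL PRIMARY KEY);
INSERT INTO @t (N1) VALUES (1), (3), (6);

WITH numbered AS
(
    -- Assign a gap-free sequence so neighbours differ by exactly 1.
    SELECT N1, rn = ROW_NUMBER() OVER (ORDER BY N1)
    FROM @t
)
SELECT cur.N1,
       PrevN1 = prv.N1,   -- plays the role of LAG(N1)  OVER (ORDER BY N1)
       NextN1 = nxt.N1    -- plays the role of LEAD(N1) OVER (ORDER BY N1)
FROM numbered AS cur
LEFT JOIN numbered AS prv ON prv.rn = cur.rn - 1
LEFT JOIN numbered AS nxt ON nxt.rn = cur.rn + 1
ORDER BY cur.N1;
```

The outer joins return NULL where there is no previous or next row, so ISNULL() can supply whatever default the LAG/LEAD call would have used. Whether this performs well enough in a splitter would need testing.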

m.t.cleary SSC Veteran Points: 200 · Answer 2

October 7, 2014 at 10:22 pm

#1750820

Even li'l ol' MySQL has GROUP_CONCAT().

You mean only MySQL has GROUP_CONCAT(). BOL has a SQLCLR version at http://msdn.microsoft.com/en-us/library/ms131056(v=sql.105).aspx and there is an improved version on CodePlex at http://groupconcat.codeplex.com/.

Alan Burstein SSC Guru Points: 61136 · Answer 3

Hope it's not too late to say, "Great article Eirikur!". I just finished reading this for a second time (there is a problem that I was struggling with that your article helped me solve). I have referred many people to this article as a good example of "How to reap the benefits of Window functions".

I had one small question... I noticed you used the Latin1_General_BIN collation trick (WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter) in your CSV example but did not use it in dbo_DelimitedSplit8K_LEAD. Is there a specific reason that you did not use it there?

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Eirikur Eiriksson SSC Guru Points: 182631 · Answer 4

Alan.B (9/1/2015)

Hope it's not too late to say, "Great article Eirikur!". I just finished reading this for a second time (there is a problem that I was struggling with that your article helped me solve). I have referred many people to this article as a good example of "How to reap the benefits of Window functions".

I had one small question... I noticed you used the Latin1_General_BIN collation trick (WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter) in your CSV example but did not use it in dbo_DelimitedSplit8K_LEAD. Is there a specific reason that you did not use it there?

Not as late as my answer, Alan :-D

I did not use the binary collation in the first part because I wanted to leave the original DelimitedSplit8K code unchanged apart from introducing the LEAD function, so that the comparison reflects only the change from CHARINDEX to LEAD.

😎

lucyliu0301 Grasshopper Points: 19 · Answer 5

April 22, 2016 at 7:51 am

#1873908

Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

Parsing strings is much easier with the XML functions in SQL Server. :-)

Jeff Moden SSC Guru Points: 1000539 · Answer 6

lucyliu0301 (4/22/2016)

Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

Parsing strings is much easier with the XML functions in SQL Server. :-)

Using XML functions to split CSV is also much slower.
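
For context, the XML approach under discussion usually looks something like the sketch below. This is an assumption about what was meant, since the thread does not show the code. It wraps each element in tags with REPLACE and then shreds the result with nodes(). The variable names and the <i> tag are illustrative only.

```sql
DECLARE @pString    VARCHAR(8000) = 'a,bb,ccc';
DECLARE @pDelimiter CHAR(1)       = ',';

-- Turn 'a,bb,ccc' into '<i>a</i><i>bb</i><i>ccc</i>' and cast it to XML.
DECLARE @x XML = CAST('<i>' + REPLACE(@pString, @pDelimiter, '</i><i>') + '</i>' AS XML);

-- Shred one row per <i> node and read its text content.
SELECT Item = n.i.value('(./text())[1]', 'VARCHAR(8000)')
FROM @x.nodes('/i') AS n(i);
```

The sketch does not escape characters such as & or <, so input containing them would fail at the CAST unless they are handled first. It is short to write, which is the "easier" part; the speed comparison is a separate matter and needs measuring.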

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Alan Burstein SSC Guru Points: 61136 · Answer 7

April 22, 2016 at 11:31 am

#1873975

Nice to see this article again. Still awesome!:-P

Eirikur Eiriksson SSC Guru Points: 182631 · Answer 8

Jeff Moden (4/22/2016)

lucyliu0301 (4/22/2016)

Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

Parsing strings is much easier with the XML functions in SQL Server. :-)

Using XML functions to split CSV is also much slower.

I second Jeff's input here: for strings shorter than 8000/4000 characters, XML is much slower. I normally see this assumption where no proper testing has been done. Few functions have gone through testing as rigorous as the DelimitedSplit8K/4K functions, thanks to Jeff (kudos to Jeff and SSC), which means there is hardly a better challenge for demonstrating the benefits of the window functions.

😎

Further, if one needs more elements than fit in an 8000/4000-character string, then, as Jeff recently posted, "you are doing something wrong" ;-)

Eirikur Eiriksson SSC Guru Points: 182631 · Answer 9

April 22, 2016 at 12:17 pm

#1873989

Alan.B (4/22/2016)

Nice to see this article again. Still awesome!:-P

Thanks Alan! When are you going to do a piece about your interesting work?

😎

Alan Burstein SSC Guru Points: 61136 · Answer 10

Eirikur Eiriksson (4/22/2016)

Alan.B (4/22/2016)

Nice to see this article again. Still awesome!:-P

Thanks Alan! When are you going to do a piece about your interesting work?

😎

Very soon sir, I have a few things I just need to clean up a little before submitting.:-D

akljfhnlaflkj SSC Guru Points: 76202 · Answer 11

April 25, 2016 at 6:43 am

#1874200

Thanks for the article.

akljfhnlaflkj SSC Guru Points: 76202 · Answer 12

April 25, 2016 at 6:45 am

#1874201

robert_verell (3/24/2014)

I like this article if anything for the West Point, MS reference.

For a moment I thought of the Academy.

alastair.beveridge SSCrazy Points: 2481 · Answer 13

Having just moved from SQL 2008, I'm finally able to use this improved version of the splitter function.

I have just one comment: ISNULL(NULLIF(LEAD(s.N1,1,1) OVER (ORDER BY s.N1)-1,0)-s.N1,8000) seems to be doing a little more work than it needs to. The default value for the LEAD function could be changed to 8000, so you don't need the ISNULL or NULLIF. It must make a few microseconds of difference in performance:
LEAD(s.N1,1,8000) OVER (ORDER BY s.N1)-1-s.N1

From the little testing I've done, this does seem to work.
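
To make the comparison concrete, here is a minimal sketch that computes both length expressions side by side for a short sample string. It assumes the same general shape as the splitter (a tally of positions and a list of element start positions N1); the sample string, the use of sys.all_columns as a row source and the column aliases are my own stand-ins, not the article's code.

```sql
DECLARE @pString    VARCHAR(8000) = 'a,bb,ccc';
DECLARE @pDelimiter CHAR(1)       = ',';

WITH cteTally(N) AS
(
    -- Stand-in tally: one row per character position in the string.
    SELECT TOP (ISNULL(DATALENGTH(@pString), 0))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns
),
cteStart(N1) AS
(
    -- Each element starts at position 1 or right after a delimiter.
    SELECT 1
    UNION ALL
    SELECT t.N + 1
    FROM cteTally t
    WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
)
SELECT s.N1,
       L_original = ISNULL(NULLIF(LEAD(s.N1, 1, 1) OVER (ORDER BY s.N1) - 1, 0) - s.N1, 8000),
       L_proposed = LEAD(s.N1, 1, 8000) OVER (ORDER BY s.N1) - 1 - s.N1
FROM cteStart s
ORDER BY s.N1;
```

For this input the start positions are 1, 3 and 6. Both expressions give lengths 1 and 2 for the first two elements; for the last element the original gives 8000 and the proposed form gives 7993 (8000 - 1 - 6). On short strings SUBSTRING simply stops at the end of the string, so the items come out the same. As far as I can tell, though, input within a character or two of the 8000 limit could see its last element cut short with the proposed form, so it may be worth testing at that boundary.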

Lynn Pettis SSC Guru Points: 442447 · Answer 14

November 2, 2018 at 7:52 am

#2012187

Maybe I missed it in the article (I have read through it twice), but why are you using the Latin1_General_BIN collation in the CSV code?

Eirikur Eiriksson SSC Guru Points: 182631 · Answer 15

November 2, 2018 at 8:08 am

#2012190

Lynn Pettis - Friday, November 2, 2018 7:52 AM

Maybe I missed it in the article (I have read through it twice), but why are you using the Latin1_General_BIN collation in the CSV code?

The reason is that using a binary collation is more efficient than using a language-specific collation.
😎
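
To show where the collation goes, here is a minimal sketch of the delimiter test with the binary collation applied to the single-character comparison. The inline VALUES list stands in for the tally and, like the sample string, is an assumption for the example rather than the article's code.

```sql
DECLARE @pString    VARCHAR(8000) = 'a,b,c';
DECLARE @pDelimiter CHAR(1)       = ',';

-- Return the positions at which the delimiter occurs.
-- COLLATE forces a byte-wise comparison instead of the
-- database's default, language-aware comparison rules.
SELECT t.N
FROM (VALUES (1), (2), (3), (4), (5)) AS t(N)
WHERE SUBSTRING(@pString, t.N, 1) COLLATE Latin1_General_BIN = @pDelimiter
ORDER BY t.N;
```

For 'a,b,c' this returns positions 2 and 4. A binary comparison is also case and accent sensitive, which is normally what is wanted when matching a delimiter character; how much time it saves depends on the default collation and should be measured on your own data.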