Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • Christopher Ford (2/18/2008)


    For totals that involve groups, I wonder if you can split the work in half.

    A multi-thread CLR?

    Where the first query get the first half while the second query gets the rest.

    Then join the threads.

    Obviously, for solutions requiring a running total from start to finish this wouldn't work. But if should work stellar for grouped running totals.

    Could also do the same with T-SQL... imagine 6 or 8 jobs all kicking off at the same time? Heck... if you have the TempDB for it, that would wail... grouped running totals on 6-8 million rows in 6-10 seconds. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Christopher Ford (2/18/2008)


    BTW, wouldn't the RowID go increasing by date in this test case?

    98% of the time, you wouldn't run into a situation where RowID 9876 is an earlier date than RowID 5000. Right?

    Not necessarily... The source of information could come from, say, many different feeds from many different bank branches. For sure, they won't be as randomized as they are in the original JBMTest table, but it's just as likely that RowID will identify the correct chronilogical order either.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have fooled around with some UDF replacements such as calculating

    date ranges, .... and I've been amazed how fast this works on CLR.

    I'd be real interested in knowing what kind of date ranges you're calculating... in fact, just for fun, I wouldn't mind trying to beat the CLR's you've made (someone else would have to run the tests, though). Not for bragging rights, either. Up to now, folks like Matt, myself, and others have been able to beat all but RegEx replacement hands down. And, considering the apparent problems folks have had with CLR's and 64 bit installations, it would be nice to offer high speed alternatives to such problems as calculated date ranges using only T-SQL.

    Lordy, we should write a book... corroboration like this is damned fun. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (2/19/2008)


    I have fooled around with some UDF replacements such as calculating

    date ranges, .... and I've been amazed how fast this works on CLR.

    I'd be real interested in knowing what kind of date ranges you're calculating... in fact, just for fun, I wouldn't mind trying to beat the CLR's you've made (someone else would have to run the tests, though). Not for bragging rights, either. Up to now, folks like Matt, myself, and others have been able to beat all but RegEx replacement hands down. And, considering the apparent problems folks have had with CLR's and 64 bit installations, it would be nice to offer high speed alternatives to such problems as calculated date ranges using only T-SQL.

    Lordy, we should write a book... corroboration like this is damned fun. 🙂

    I already have a title for it.

    "Stupidity got us into this mess...Now What?"

    Christopher Ford

  • Heh... LMAO! Now that's funny, Chris! Ya know... I bet it would sell like hotcakes with a title like that!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (2/19/2008)


    Heh... LMAO! Now that's funny, Chris! Ya know... I bet it would sell like hotcakes with a title like that!

    I bet it would. =)

    Christopher Ford

  • ALZDBA (2/19/2008)


    Thanks 🙂

    OT bis

    Is there an "common CLR pittfals" link somewhere ?

    I haven't found a good one yet.

    The short list I have is this:

    - SQL CLR is NOT the same as "regular" CLR. You'll find that (by default) there are lots of things you might have had access to using the full framework outside of this area that are "off-limits". You CAN hack them in/back-door them in, but that then means you're back to building (by hand) custom assemblies that then need to manually be associated to SQL Server. Starts to sounds an awful lot like all of those nightmares we had with custom XP's....

    - MEMORY. One place that it does seem to really disappoint me is in memory management. I have yet to find a "smart" way to get these features to use RAM for as much as it can handle, but then page the stuff out if need be. Once the memory limit gets hit - you get those lovely messages previously mentioned. You COULD code in manual "page to disk" routines, but - didn't we want to get away from that stuff based on all the screw-ups with that in the 90's?

    - From what I can tell - in both the functions and the SP definitions provided - the only way to access data and return it seems to be to materialize the ENTIRE dataset before you can start returning anything. That means (when you combine this with item # 2), that your data set has to be able to fit into memory. The lack of an ability (at least that I've found so far) to stream both input and output directly from and to SQL server is a HUGE flaw.

    - Garbage collection. do it manually if you use something a LOT. The default behavior WILL clean up after you - just not fast enough.

    - don't bother to run anything CLR on anything less than SP2. It's not just a memory problem - that's called a memory LEAK.

    - CROSS APPLY "sucks" performance out, so anything that would normally be used in that circumstance would need to be 20% more performant than a "straight" T-SQL solution, just to compensate for the inefficiency of a CROSS APPLY.

    That being said - it DOES have access to a whole raft of functions T-SQL doesn't, so it does open doors on all sorts of fun things: string manipulation, interaction with the OS, fancy formatting routines, advanced encryption processes. Of course - you're then heading into "should SQL Server even be touching that stuff" land, so tread carefully.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (2/19/2008)


    Heh... LMAO! Now that's funny, Chris! Ya know... I bet it would sell like hotcakes with a title like that!

    Slightly OT rant:

    Seriously though, you cannot find a book that teaches or explains some of the most common tasks that have to be undertaken by DBA's and Database Developers.

    - Calculating last tuesday and next Tuesday exclusive of this Tuesday

    - Running Totals by groups

    - Running Totals by Date

    - Running Totals by Year & also even a sliding Year or timeline

    - Writing code that is concurrent & safe

    - Top 10 things you must do when looking for problems in your SQL Server and how to do it.

    - Multi-server query processing

    - Matching current and preceeding rows

    - Regular Expressions and when to use them and why they're useful

    - SQL CLR Do's and Dont's

    - Are triggers really my friend?

    - Top 10 checklist in creating any index for any query.

    - Efficiently aging past due balances

    Above all, even though it's tradition, I hate that every example I find is "hello world". I've done said Hello to the world many times...but since when does saying "Hello World" efficiently calculate the aging of all your customers balances?

    To some degree all of those things could be found on the internet, but the enormous effort it takes to find a lot of it to get enough information to understand it, takes months.

    --Rant Over--

    There' should be an IFCode for [Rant] :w00t:

    Christopher Ford

  • Can't say I agree with that title. Just did a first visit at a new client last week for a performance review. They had a relatively new product released recently that was starting to cause significant performance issues on the servers it was deployed to. Turns out that about 80% of the data access was via cursors, including things such as cursors-inside-functions-used-in-where-clauses. But things like this aren't due to stupidity. They are due to lack of knowledge/training.

    I bet most of you don't know the neutron lifecycle (throwback from my Nuclear Navy days 🙂 ) but it isn't because you aren't smart enough to know it. If you throw object-oriented-centric developers (pretty much everyone coming out of college these days) at a database they have no framework to go on other than object-oriented. Their brains have absolutely no capacity to think in set-based mode. Until they are trained nothing good can come from having them develop database applications. And even training isn't really sufficient. It takes experience and time (as in time to evaluate the performance effects of different data access queries) to build sufficient capabilities to write good database code. Most shops have far too aggressive development cycles to allow for that.

    Which I must say is fortunate for me because I wouldn't have a job otherwise! 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Matt Miller (2/19/2008)


    ALZDBA (2/19/2008)


    Thanks 🙂

    OT bis

    Is there an "common CLR pittfals" link somewhere ?

    I haven't found a good one yet.

    The short list I have is this:

    - SQL CLR is NOT the same as "regular" CLR. You'll find that (by default) there are lots of things you might have had access to using the full framework outside of this area that are "off-limits". You CAN hack them in/back-door them in, but that then means you're back to building (by hand) custom assemblies that then need to manually be associated to SQL Server. Starts to sounds an awful lot like all of those nightmares we had with custom XP's....

    - MEMORY. One place that it does seem to really disappoint me is in memory management. I have yet to find a "smart" way to get these features to use RAM for as much as it can handle, but then page the stuff out if need be. Once the memory limit gets hit - you get those lovely messages previously mentioned. You COULD code in manual "page to disk" routines, but - didn't we want to get away from that stuff based on all the screw-ups with that in the 90's?

    - From what I can tell - in both the functions and the SP definitions provided - the only way to access data and return it seems to be to materialize the ENTIRE dataset before you can start returning anything. That means (when you combine this with item # 2), that your data set has to be able to fit into memory. The lack of an ability (at least that I've found so far) to stream both input and output directly from and to SQL server is a HUGE flaw.

    - Garbage collection. do it manually if you use something a LOT. The default behavior WILL clean up after you - just not fast enough.

    - don't bother to run anything CLR on anything less than SP2. It's not just a memory problem - that's called a memory LEAK.

    - CROSS APPLY "sucks" performance out, so anything that would normally be used in that circumstance would need to be 20% more performant than a "straight" T-SQL solution, just to compensate for the inefficiency of a CROSS APPLY.

    That being said - it DOES have access to a whole raft of functions T-SQL doesn't, so it does open doors on all sorts of fun things: string manipulation, interaction with the OS, fancy formatting routines, advanced encryption processes. Of course - you're then heading into "should SQL Server even be touching that stuff" land, so tread carefully.

    See my C# example... It materlizes and returns results before the entire dataset has been loaded in the CLR.

    SQL CLR Stored Procedures can do this. UDF's and UDA's cannot. They must materilize the entire result set in memory before returning it.

    You can also use the "SqlPipe.ExecuteAndSend( )" to send the code to SQL Server and just execute and return the results if you're not modifying it at all...

    Useful for creating dynamic queries from SQL CLR and using the .Net framework to create them.

    SQL CLR is also extremely useful for doing things outside of SQL, like importing or creating XML documents when the job is not big enough for Integration Services. Also, matching user names or pulling user information from Active Directory using a CLR function is neat too...

    I would like to give the SQL CLR the benefit of the doubt sometimes and place blame on my lack of experience in developing Stored Procedures or Functions in the SQL CLR, like maybe I didn't compile my code in 64 bit before deploying it, maybe I left the DEBUG tag on and that's why performance stinks, maybe I put something somewhere that's just hogging all my resources and I don't realize it.

    It's a new way of designing and a new way of debugging, so figuring out why streaming data is so slow when it's supposed to be fast is tough to debug.

    So, currently I am giving the CLR the benefit of the doubt and placing blame on myself for not knowing how to efficiently code or properly code in the CLR due to lack of documentation and also for what seems to be a lack of support from Microsoft in helping DBA's and Developers understand the in's and out's of this new way of doing things.

    That's my feelings on it.

    Technically, I should be able to use a CLR solution to run faster than an Update solution... .NET should be able to, by using it's wonderous abilities, chew this type of problem up and spit it back out to the database in the right order faster than SQL can natively.

    This running totals problem/solution has really opened my eyes to many things that I "presumed" to be happening behind the scenes.

    Overall, this is probably the most beneficial exercise I've ever done on SQL.

    Christopher Ford

  • TheSQLGuru (2/19/2008)


    Can't say I agree with that title. Just did a first visit at a new client last week for a performance review. They had a relatively new product released recently that was starting to cause significant performance issues on the servers it was deployed to. Turns out that about 80% of the data access was via cursors, including things such as cursors-inside-functions-used-in-where-clauses. But things like this aren't due to stupidity. They are due to lack of knowledge/training.

    I bet most of you don't know the neutron lifecycle (throwback from my Nuclear Navy days 🙂 ) but it isn't because you aren't smart enough to know it. If you throw object-oriented-centric developers (pretty much everyone coming out of college these days) at a database they have no framework to go on other than object-oriented. Their brains have absolutely no capacity to think in set-based mode. Until they are trained nothing good can come from having them develop database applications. And even training isn't really sufficient. It takes experience and time (as in time to evaluate the performance effects of different data access queries) to build sufficient capabilities to write good database code. Most shops have far too aggressive development cycles to allow for that.

    Which I must say is fortunate for me because I wouldn't have a job otherwise! 😀

    I agree it's a "strong" title...but it's not really about stupidity. =)

    and why should anyone be afraid of a strong title or actually being honest about what's happening without sugar coating it.

    Ever see Kitchen Nightmares?

    Tough love, work with them in the trenches to show you care, reap the rewards of their happiness when they realize there's a better way.

    When I do training, I've found that most people do not remember a thing when you cater to being politically correct or explaining things.

    They do remember bad examples and the challenge of overcoming them. And they also remember that they don't want to be caught in the same mess when they go live with their code.

    Christopher Ford

  • I bet most of you don't know the neutron lifecycle (throwback from my Nuclear Navy days 🙂 ) but it isn't because you aren't smart enough to know it. If you throw object-oriented-centric developers (pretty much everyone coming out of college these days) at a database they have no framework to go on other than object-oriented. Their brains have absolutely no capacity to think in set-based mode. Until they are trained nothing good can come from having them develop database applications. And even training isn't really sufficient. It takes experience and time (as in time to evaluate the performance effects of different data access queries) to build sufficient capabilities to write good database code. Most shops have far too aggressive development cycles to allow for that.

    This is, unfortunately, the story of the modern software world, and it is quite the tragedy. I happen to think it essential that people who do programming -- including writing queries in SQL -- really need and benefit from a solid basic CompSci background. Yet today, even top colleges are giving up on teaching CompSci, and instead training C++ or C# recipes. This is a huge loss!

    It's simply not enough to know the recipies, to know some basics about object oriented procedural coding; to be successful and efficient, you really need to understand the foundations of programming, both declarative and procedural, functional and predicate-based. Without this knowledge, programmers are just left to adapt one set of recipies to a completely different environment, with the expected outcome....

    -frank


    The End.

  • TheSQLGuru (2/19/2008)


    Can't say I agree with that title. Just did a first visit at a new client last week for a performance review. They had a relatively new product released recently that was starting to cause significant performance issues on the servers it was deployed to. Turns out that about 80% of the data access was via cursors, including things such as cursors-inside-functions-used-in-where-clauses. But things like this aren't due to stupidity. They are due to lack of knowledge/training.

    I bet most of you don't know the neutron lifecycle (throwback from my Nuclear Navy days 🙂 ) but it isn't because you aren't smart enough to know it. If you throw object-oriented-centric developers (pretty much everyone coming out of college these days) at a database they have no framework to go on other than object-oriented. Their brains have absolutely no capacity to think in set-based mode. Until they are trained nothing good can come from having them develop database applications. And even training isn't really sufficient. It takes experience and time (as in time to evaluate the performance effects of different data access queries) to build sufficient capabilities to write good database code. Most shops have far too aggressive development cycles to allow for that.

    Which I must say is fortunate for me because I wouldn't have a job otherwise! 😀

    Heh... I actually do know about that particular life cycle... same reason, too.

    Have you heard about the new element they discovered? It's called "Administratium"... doesn't matter how it's applied, it always slows everything it gets near down. It has no Electrons, Protons, or Neutrons... it's made up entirely of "Morons". 😛

    Just a bit of fun there...

    No, I agree with you, Kevin... folks just don't know and that doesn't make them stupid... just ignorant of a particular area. Ignorance can certainly be forgiven. But, I still like the title that Chris came up with... it's not a real reflection on people. When I get into a new area and something goes wrong, I'll frequently say, "Ok, I've done something stupid... I better find it in the book."

    My problem, lately, is that people know so little about SQL Server in the Detroit area, they think any ol' GUI programmer or network admin can maintain and program SQL Server... they don't really have a clue about performance code... doesn't make for too many good jobs for folks like me. Kinda like finding chicken's teeth and my chicken just flew away to a different state.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Kinda like finding chicken's teeth and my chicken just flew away to a different state.

    Eh.... What? :blink:

    ---

    Frank & Kevin both bring up some very valid points though with the state of the industry.

    People just don't know what to do anymore when it comes to databases or developing applications.

    It's become a world of wizards and GUI's. It's getting very tough to find people who can apply a basic index and understand what happened.

    Christopher Ford

  • TheSQLGuru (2/19/2008)


    Can't say I agree with that title. [...] But things like this aren't due to stupidity. They are due to lack of knowledge/training.

    I bet most of you don't know the neutron lifecycle [...]

    However, most of us also wouldn't take on a project that required knowing the neutron lifecycle without actually learning it.

    --

    JimFive

Viewing 15 posts - 166 through 180 (of 250 total)

You must be logged in to reply to this topic. Login to reply