A few reflections on RBAR by a weary application developer

  • patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    edit: fixed quote tag

    You're question is confused because it is not more drastic in TSQL.

    It is actually more drastic in code that executes in memory than it is in SQL code that executes against RDBMS.

    Try to update a 2GB dataset one row at a time using Java or .NET. Make sure this code exams each row, tests each row, and execute code in memory against each row in your dataset.

    Do this in any language or program and you will reduce the optimal execution rate exponentially according to the size of the dataset being inspected.

    Now update that program to use SET math to create a sub set and then update the sub set.

    See how fast your update can run against that.

    It will always be faster to update the Fied in every row of an application dataset where that field is null instead of testing each row for NULL and then executing an update on the field if the value is NULL.

    It is not SQL specific, it just happens more in SQL because non SET based functions stick out like red on a blue background in SQL code. Also in SQL code we usually work with datasets that are greater than one thousand data pages or (1000*8 bytes) and not less than that.

    RBAR is a concept that helps you write better code in any language. Sets are cool. Venn diagrams are awesome. paging through things one row at a time is cliche' and dull.

    Understand?

  • MarbryHardin (12/9/2013)


    One of the somewhat insidious problems with undetected or uncontrolled use of RBAR is that it may often work fine. At first.

    As both the load and number of records silently increase the technique can cause previously unseen errors and bring the server to its knees. And of course that can potentially cause issues with unrelated applications that happen to be on the same instance.

    It's the opposite of "scaling", whatever that might be called. If this isn't detected and corrected before it constitutes a large portion of the code base you're then looking at a significant effort to correct it.

    I've fought the fight to wean people off cursors, it can be a hard battle. But when you have someone that writes a cursor to do a simple insert of a single literal value they need to admit they have problem.

    Another painful problem with rbar is that if you do come across an error you can end up with a partially executed command. And now you have to not only find the error but figure out what may or may not need to be rolled back or changed to finish the job.

  • rstone (12/9/2013)


    I think RBAR is an option to avoid locking too many resources at once to avoid a costly re-design. (A business decision to not re-design can be valid.)

    On the other hand, we have some crazy RBAR abuse - nested procedures with nested cursors. So perhaps a "No RBAR!" mallet is needed in some cases. However, I don't think it's fair to say RBAR is never to be used. Like de-normalization, it's not the place to start.

    I'm sure we all (Jeff Moden included!) can think of scenarios where being stuck with a poorly designed and implemented system makes RBAR a logical choice to accomplish some task. That's not exactly an endorsment of RBAR, though. It's more of a condemnation of a system where RBAR wins out over re-design as the lesser of two evils.

  • Andy DBA (12/9/2013)


    rstone (12/9/2013)


    I think RBAR is an option to avoid locking too many resources at once to avoid a costly re-design. (A business decision to not re-design can be valid.)

    On the other hand, we have some crazy RBAR abuse - nested procedures with nested cursors. So perhaps a "No RBAR!" mallet is needed in some cases. However, I don't think it's fair to say RBAR is never to be used. Like de-normalization, it's not the place to start.

    I'm sure we all (Jeff Moden included!) can think of scenarios where being stuck with a poorly designed and implemented system makes RBAR a logical choice to accomplish some task. That's not exactly an endorsment of RBAR, though. It's more of a condemnation of a system where RBAR wins out over re-design as the lesser of two evils.

    That is the only time I have seen cursors as a truly viable option. It's amazing how much terrible code needs to be written to deal with a poorly structured system. Cursors do have their uses but they definitely shouldn't be the first choice.

  • PHYData DBA (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    edit: fixed quote tag

    You're question is confused because it is not more drastic in TSQL.

    We'll probably have to agree to disagree on this one! All the folks who's opinion I would respect say that SQL Server is designed to be more efficient with set based operations, and this would allow that T-SQL is then expected to be slower at RBAR type operations.

  • patrickmcginnis59 10839 (12/9/2013)


    PHYData DBA (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    edit: fixed quote tag

    You're question is confused because it is not more drastic in TSQL.

    We'll probably have to agree to disagree on this one! All the folks who's opinion I would respect say that SQL Server is designed to be more efficient with set based operations, and this would allow that T-SQL is then expected to be slower at RBAR type operations.

    You still are reading one thing and not agreeing with another. A common mistake when you do not understand RBAR.

    You are right, SQL and RDBMS will optimize a SET based function to run faster.

    That has nothing to do with RBAR being slower than SET based functions in SQL than other application languages.

    TSQL (or even PSQL) will update with RBAR a 12GB Dataset faster than any other programing language that uses dataset logic.

    The idea is that any DataSet (notice the word SET) update in any language can be optimized with Set based code and slowed down exponentially with RBAR code.

  • patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

    So should I expect a T-SQL loop with NO data access then to be really really fast?

    "Fast" is a relative term so what are you comparing it to?

    Well when I did comparisons, it was to pretty much any of the languages I had access to, and T-SQL was dog slow even with no data access. I mean orders of magnitude slow.

    Dog slow at what, exactly? You first said "a T-SQL loop with NO data access". Again, I will say that databases are tuned for queries and a loop is not a query.

    Regardless, a loop to do what? Nothing? Maybe just set a variable to a value? I created a CMD script to run in a DOS window as follows:

    @ECHO OFF

    ECHO %TIME%

    FOR /L %%f IN (1,1,1000000) DO SET X=5

    ECHO %TIME%

    It finished in approximately 30 seconds. I then did the following in T-SQL:

    DECLARE @X INT, @INDEX INT = 0

    SELECT GETDATE()

    WHILE (@INDEX < 1000000)

    BEGIN

    SET @X=5

    SET @INDEX+=1

    END

    SELECT GETDATE()

    The same 1 million iterations finished in 0.7 seconds. Increasing the loop to 10 MM rows ran in the expected 7 seconds.

    But even if the loop is doing calculations or string manipulation, those things are not queries.

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • PHYData DBA (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    PHYData DBA (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    edit: fixed quote tag

    You're question is confused because it is not more drastic in TSQL.

    We'll probably have to agree to disagree on this one! All the folks who's opinion I would respect say that SQL Server is designed to be more efficient with set based operations, and this would allow that T-SQL is then expected to be slower at RBAR type operations.

    You still are reading one thing and not agreeing with another. A common mistake when you do not understand RBAR.

    I think we're in agreement here, we both believe the other doesn't know what they're talking about!

  • Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

    So should I expect a T-SQL loop with NO data access then to be really really fast?

    "Fast" is a relative term so what are you comparing it to?

    Well when I did comparisons, it was to pretty much any of the languages I had access to, and T-SQL was dog slow even with no data access. I mean orders of magnitude slow.

    Dog slow at what, exactly? You first said "a T-SQL loop with NO data access". Again, I will say that databases are tuned for queries and a loop is not a query.

    Regardless, a loop to do what? Nothing? Maybe just set a variable to a value? I created a CMD script to run in a DOS window as follows:

    @ECHO OFF

    ECHO %TIME%

    FOR /L %%f IN (1,1,1000000) DO SET X=5

    ECHO %TIME%

    It finished in approximately 30 seconds. I then did the following in T-SQL:

    DECLARE @X INT, @INDEX INT = 0

    SELECT GETDATE()

    WHILE (@INDEX < 1000000)

    BEGIN

    SET @X=5

    SET @INDEX+=1

    END

    SELECT GETDATE()

    The same 1 million iterations finished in 0.7 seconds. Increasing the loop to 10 MM rows ran in the expected 7 seconds.

    But even if the loop is doing calculations or string manipulation, those things are not queries.

    I bow before you. I am afraid you are showing Perl to Perk....

    ok so it not so funny but neither has been this thread... :w00t:

  • Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

    So should I expect a T-SQL loop with NO data access then to be really really fast?

    "Fast" is a relative term so what are you comparing it to?

    Well when I did comparisons, it was to pretty much any of the languages I had access to, and T-SQL was dog slow even with no data access. I mean orders of magnitude slow.

    Dog slow at what, exactly? You first said "a T-SQL loop with NO data access". Again, I will say that databases are tuned for queries and a loop is not a query.

    Regardless, a loop to do what? Nothing? Maybe just set a variable to a value? I created a CMD script to run in a DOS window as follows:

    @ECHO OFF

    ECHO %TIME%

    FOR /L %%f IN (1,1,1000000) DO SET X=5

    ECHO %TIME%

    It finished in approximately 30 seconds. I then did the following in T-SQL:

    DECLARE @X INT, @INDEX INT = 0

    SELECT GETDATE()

    WHILE (@INDEX < 1000000)

    BEGIN

    SET @X=5

    SET @INDEX+=1

    END

    SELECT GETDATE()

    The same 1 million iterations finished in 0.7 seconds. Increasing the loop to 10 MM rows ran in the expected 7 seconds.

    But even if the loop is doing calculations or string manipulation, those things are not queries.

    Well to be honest, a command shell script wouldn't have been my first choice, so kudos on that one!

  • :hehe:

  • patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Solomon Rutzky (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

    I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

    So should I expect a T-SQL loop with NO data access then to be really really fast?

    "Fast" is a relative term so what are you comparing it to?

    Well when I did comparisons, it was to pretty much any of the languages I had access to, and T-SQL was dog slow even with no data access. I mean orders of magnitude slow.

    Dog slow at what, exactly? You first said "a T-SQL loop with NO data access". Again, I will say that databases are tuned for queries and a loop is not a query.

    Regardless, a loop to do what? Nothing? Maybe just set a variable to a value? I created a CMD script to run in a DOS window as follows:

    @ECHO OFF

    ECHO %TIME%

    FOR /L %%f IN (1,1,1000000) DO SET X=5

    ECHO %TIME%

    It finished in approximately 30 seconds. I then did the following in T-SQL:

    DECLARE @X INT, @INDEX INT = 0

    SELECT GETDATE()

    WHILE (@INDEX < 1000000)

    BEGIN

    SET @X=5

    SET @INDEX+=1

    END

    SELECT GETDATE()

    The same 1 million iterations finished in 0.7 seconds. Increasing the loop to 10 MM rows ran in the expected 7 seconds.

    But even if the loop is doing calculations or string manipulation, those things are not queries.

    Well to be honest, a command shell script wouldn't have been my first choice, so kudos on that one!

    It doesn't really matter how fast t-sql performs procedural processes compared to other languages for set operations set based commands will always be faster than rbar solutions.

  • Sets are nothing magical, not on the math side and not on processing side, thus let this not turn into a religious "set vs no_set" war.

    The main point to keep in mind is that the server generally can do very good job and is more adaptive then a programmer can be. Not having to hand-code and optimize every procedure and being adaptive to run-time conditions is a good goal all on its own! Now if you look at SQL Server execution plans, then you see all kind of operators that is iterated trough. And at the most detailed level, this sometimes happens one record at a time, or in small batches of records.

    But please at this point realize at least two things:

    1. The primary stage of optimization has already happened to produce this plan (it is generally good regardless query complexity, thanks to set based math, the model and accurate statistics);

    2. The data "streams" trough all these chosen iterators, taking as little resources as possible to get the job done and not starve or lock out other simultaneous requests for data.

    In languages like say C#, you also can work with similar iterators to save on resources and complexity for problems involving large amounts of data, thereby improve concurrency in heavily threaded environments. Language constructs such as LINQ go further and try to emulate the way current relational databases take on problems. But with a very limited model (the datatypes) and no statistics to work with, it is generally very sub-optimal and sometimes not even working at all.

    The whole deal with all this is to simplify resource management and improve concurrency to make things scale up nicely. To get those results it helps to think of and solve problems using sets as your mental model, regardless of how it in detail is executed. Because at the set level, there are more optimization possibilities to take advantage of!

    Is there overhead for times when you are deliberately not using sets larger then one record in a system optimized for large sets...you bet ya...and in any language other then SQL too, it is a matter of degrees!

  • patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • peter-757102 (12/9/2013)


    The whole deal with all this is to simplify resource management and improve concurrency to make things scale up nicely. To get those results it helps to think of and solve problems using sets as your mental model, regardless of how it in detail is executed. Because at the set level, there are more optimization possibilities to take advantage of!

    Is there overhead for times when you are deliberately not using sets larger then one record in a system optimized for large sets...you bet ya...and in any language other then SQL too, it is a matter of degrees!

    Well said Peter - Find a solution that executes on a set of objects instead of testing and executing on each object individually.

Viewing 15 posts - 46 through 60 (of 116 total)

You must be logged in to reply to this topic. Login to reply