A few reflections on RBAR by a weary application developer

  • As in all things, adhere to the KISS principal. Use the appropriate tool, and have more than one tool in your toolbox.

  • patrickmcginnis59 10839 (12/9/2013)


    Well that's what I was wondering, then and now. What makes RBAR so slow if it's not the language? If it's not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performance-wise.

    When you say "cursor", do you mean an explicit T-SQL CURSOR, as in "DECLARE CURSOR / OPEN / FETCH", or any loop through a result set, as in storing results in a temp table and using a WHILE loop with a SELECT to get a row from the temp table each time? I ask because it is not always so simple as building a result set to fetch from. When using T-SQL cursors, there are two main types: Sensitive (i.e. DYNAMIC) and Insensitive (i.e. STATIC).

    Sensitive / DYNAMIC means that the cursor is aware of changes to any tables used in the query and as rows are fetched, the query is reevaluated to make sure the most up-to-date state of the rows and fields in those rows is accounted for. This means that rows could be added as well as removed; the @@FETCH_STATUS variable will return a value of -2 to denote a row that was there originally but is not there at the time of the FETCH.

    Insensitive / STATIC means that the result set at the time of the OPEN call is copied to an internally created temp table. Fetching rows does not reevaluate the query to look for changes.

    SQL Server has another, non-standard, type called KEYSET which is partially Sensitive and partially Static. New rows cannot be added and the order of them cannot change, but rows can be removed (again reported by @@FETCH_STATUS as -2) and field values can change.

    More info found here: http://technet.microsoft.com/en-us/library/ms180169.aspx

    By default, declared cursors are Sensitive / DYNAMIC which means that as each FETCH is called, the query is reevaluated, which gives more accurate information but is obviously not super great for performance. Using the STATIC keyword when declaring a cursor will take a little extra time to create and populate the temp table, but will be faster for each FETCH, even if possibly less accurate if changes are being made to the tables.

    I would expect that fetching rows from an application, such as using SqlDataReader.Read() in C#, would be insensitive, though I am not sure where exactly the result set is cached. Maybe someone else can answer that.

    And just to be clear, the above is not intended to address the RBAR question, but just to clarify the definition of "cursor".

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    "Seriously yo, we just want to know, are you looking for knowledge or are you just another Troll?"

    - Annonymous - 😉

  • PHYData DBA (12/9/2013)


    Jeff Moden (12/9/2013)


    Consider a daily data file import, 60k rows. The data is loaded, of course, row by agonizing row, into a temp table. The mighty DBA, with his scimitar, strikes a mighty blow and updates the destination tables in one enormous, amazing join. The update runs in production in under three minutes.

    Yay! Everyone cries! We weep with the endless possibilities of mighty set based queries!

    ....

    The SQL God continues to read through tears of laughter

    Welcome to the battle known as daily, my young friend.

    😎 Jeff Moden The SQL Gods have always smiled upon your humor. Laughing with others and at your self is the greatest wisdom of all. 😎

    I suppose the funniest thing about Jeff's response was that there are quite a number of developers who feel like going to the database to get their data is like some kind of magic quest in a strange fantasy world. I've always tried to educate people on the benefits of set based approaches everywhere I've worked, long before I ever heard the term RBAR.

  • Chris Harshman (12/10/2013)


    PHYData DBA (12/9/2013)


    Jeff Moden (12/9/2013)


    Consider a daily data file import, 60k rows. The data is loaded, of course, row by agonizing row, into a temp table. The mighty DBA, with his scimitar, strikes a mighty blow and updates the destination tables in one enormous, amazing join. The update runs in production in under three minutes.

    Yay! Everyone cries! We weep with the endless possibilities of mighty set based queries!

    ....

    The SQL God continues to read through tears of laughter

    Welcome to the battle known as daily, my young friend.

    😎 Jeff Moden The SQL Gods have always smiled upon your humor. Laughing with others and at your self is the greatest wisdom of all. 😎

    I suppose the funniest thing about Jeff's response was that there are quite a number of developers who feel like going to the database to get their data is like some kind of magic quest in a strange fantasy world. I've always tried to educate people on the benefits of set based approaches everywhere I've worked, long before I ever heard the term RBAR.

    Amen to that Brother! Keep Preaching the good Word!

  • Steph Locke (12/10/2013)


    Here's my non-technical view of RBAR vs Set

    Each process/request requires sending a request to the software/hardware/person which then has to understand the request, convert it into their understanding, plan it, and then go about executing it.

    The time savings of batch-based actions come in when you can do more tasks but keep the request send phase at the same amount or increasing it by only a small amount. So if a request takes 5 seconds to do per item and 1 second to understand beforehand then doing 100 items in a per item basis will take 100*(5 +1 ) = 600 seconds, whereas sending the request in the batch of 100 might up the understanding time to say 20 seconds but this would still be 100*(5 ) + 20 = 520 seconds.

    This is not a SQL only concept and works for all actions including manual ones. You leverage the difference between two changing series values - records (n) and preparation (kn) - such that that if k<1 then you will always benefit from a batch process over an iterative process.

    Edit: For SQL - typically there will also be a constant on records (cn) that means the processing time per record also decreases since it can leverage set-based prgramming actions. i.e. t=cn +kn where k<c<1 is how we get a scalable query and the smaller k and c are the better and/or having an absolute preparation value e.g. from a saved execution plan.

    Steph - If there are SQL Gods, you are their Goddess Queen! Thanks for the math.

  • PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

  • patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    Ummm - That is your specific example for your entire thread on RBAR being a T-SQL thing? That you read an article sometime, someplace that says SSMS is slow because it uses RBAR? That you like your Row By Row loops in C?

    You are just a troll... so sad. Have fun with that. :sick:

    EDIT: I have no counterpoint to RBAR everywhere Patrick. I was referring to the other posted responses to you here that mention how they had found RBAR in many other languages.

  • patrickmcginnis59 10839 (12/9/2013)


    peter-757102 (12/9/2013)


    Charles Kincaid (12/9/2013)


    So let's take an example of a SELECT that does not make any use of indexes.

    RBAR fetches more rows from the table. False. It has to get the data from somewhere.

    Just iterating over records in a set of records is not RBAR, even with in set based solutions this happens internally.

    Now this is interesting, because doing the running totals thing would be awesome in a cursor! Would a cursor based solution beat the triangular join method? Would a cursor based solution beat a quirky update? Which of the three methods are set oriented versus RBAR?

    I have tested this before and on my system found that a cursor (using Fast_Forward) will go through 1,000,000 rows in about 2 minutes (3 minutes without FAST_FORWARD) whereas the Quirky Update takes about 35 seconds. Other methods such as the triangular join and other self-join methods are filled with Hidden RBAR[/url] and take several minutes for 100,000 rows. Doing it with a loop is even slower.

    I have heard the Quirky Update described as a set-based solution; I honestly don't know enough about it to make a strong argument that it is.

    Ben Gan has a section in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions, where he tests the different methods to create a running total. He includes a set-based option specific to 2012 that includes Framing; the "Quirky Update" is the fastest. I heard someone else say on SSC - "It looks like that horse still has 6 legs" 😛

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    Ummm - That is your specific example for your entire thread on RBAR being a T-SQL thing? That you read an article sometime, someplace that says SSMS is slow because it uses RBAR? That you like your Row By Row loops in C?

    You are just a troll... so sad. Have fun with that. :sick:

    I've run into this sort of thing before, discussions of technical issues that for whatever reason take a personal turn. I'd like to advise you not to do this because more often than not the name caller is the one who leaves a bigger impression than the one being called names. But honestly, thats your call not mine.

    I welcome any return to the subject discussion at hand!

  • patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    This is a discussion about rbar vs set operations, how compiled c performs in a blank loop isn't relevant to the topic.

    If you feel c is faster than t-sql despite the bottle neck being the sql engine either way more power to you. That doesn't change that a set based solution will be faster than an rbar solution whether you use a c wrapper or do everything in t-sql.

  • ZZartin (12/10/2013)


    patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    This is a discussion about rbar vs set operations, how compiled c performs in a blank loop isn't relevant to the topic.

    If you feel c is faster than t-sql despite the bottle neck being the sql engine either way more power to you. That doesn't change that a set based solution will be faster than an rbar solution whether you use a c wrapper or do everything in t-sql.

    Just to recap, the poster I was replying to included text that implied RBAR penalties was not limitted to T-SQL. Happy to hear some clarification on that!

  • patrickmcginnis59 10839 (12/10/2013)


    ZZartin (12/10/2013)


    patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    This is a discussion about rbar vs set operations, how compiled c performs in a blank loop isn't relevant to the topic.

    If you feel c is faster than t-sql despite the bottle neck being the sql engine either way more power to you. That doesn't change that a set based solution will be faster than an rbar solution whether you use a c wrapper or do everything in t-sql.

    Just to recap, the poster I was replying to included text that implied RBAR penalties was not limitted to T-SQL. Happy to hear some clarification on that!

    That is the clarification. Rbar penalties have no special penalties using only tsql.

  • So here's my example of a poorly architected RBAR process. It's an import of external data to a OLTP system. It loads an average of 3K records at night and takes over 3 - 4 hours. Yes, you read that correctly. Here's what it does:

    For each record in the file, call a webservice to retrieve the most recent contact info from the corporate master record

    Look up the record from the file in the database table

    If it doesn't already exist in the system add it to the tables

    If the record is already in the system:

    Check to see if it's inactive, if so reactivate it

    Add the new balance information for all existing accounts

    Add any new contact information for all existing accounts

    Update existing contact information for all existing accounts

    It does the process just as I explained above. For each and every row, one by one. I have never seen anything so painful in my life; it is database abuse. The adds/changes are stored procs that accept as an input variable an account number and the new/changed field values from the file and that row and only that row is updated for each call. I could bulk insert, and MERGE this data in seconds. The webservice call stymies me a little bit, but I know I could run those through SSIS and shred back the XML to a staging table and then SET based handle the data adds and changes - again in SECONDS. And my company paid someone money for this program!

    MWise

  • ZZartin (12/10/2013)


    patrickmcginnis59 10839 (12/10/2013)


    ZZartin (12/10/2013)


    patrickmcginnis59 10839 (12/10/2013)


    PHYData DBA (12/10/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Patrick,

    could you please give an example of what you are talking about?

    We keep trying to explain that RBAR is a concept of processing data from any source Row by Agonizing Row.

    You keep telling us that this is only a T-SQL thing, we keep posting back that we don't agree since we see RBAR code everywhere.

    What exactly is your example or personal situation that makes you so certain this is a T-SQL issue with MS SQL servers?

    Everytime I've done a loop in c for instance, I've been pretty satisfied with how it turned out. Whats your counterpoint with RBAR everywhere?

    edit: I'm genuinely interested here, I'm not the inventor of the RBAR thing, and I'm certainly not going to dictate its usage, but at the same time, if some fellow posts that SSMS is slow because its using RBAR (like I saw recently), I'm not going to necessarily feel like I've learned anything from reading that post. We name things and concepts so that we can share our recognition of their occurrences after all, right?

    This is a discussion about rbar vs set operations, how compiled c performs in a blank loop isn't relevant to the topic.

    If you feel c is faster than t-sql despite the bottle neck being the sql engine either way more power to you. That doesn't change that a set based solution will be faster than an rbar solution whether you use a c wrapper or do everything in t-sql.

    Just to recap, the poster I was replying to included text that implied RBAR penalties was not limitted to T-SQL. Happy to hear some clarification on that!

    That is the clarification. Rbar penalties have no special penalties using only tsql.

    Ok so I went and looked for some RBAR stories out there, and came across this one from simple talk:

    https://www.simple-talk.com/sql/t-sql-programming/rbar--row-by-agonizing-row/

    I have to say, I have some beefs, heres one snippet:

    "RBAR is a consequence of coding in a strictly procedural way, rather than in a set-based way. It is different from poor coding; it is the result of adopting a mindset that one always has to tell the computer, step by step, how to do something rather than, in a set-based approach, merely specifying the result one is aiming for. With relational databases, one tells the database what one wants, not how to do it, row by agonising row."

    Ok this is fair enough in my mind, I've experienced the downside of "procedural programming" in T-SQL, and have hit performance penalties, and I get that you want to do your work in sets with SQL server.

    But then he describes as an example of "RBAR processing on the front-end" a process with an access front end that downloads entire tables, processes them in the access front end, then uploads them back.

    So I'm thinking, is this really RBAR, or is this just really wasteful programming period? And if its not RBAR, how many instances of simple inefficient code are we just calling RBAR? Again this reminds me of the fellow calling SSMS an example of RBAR.

    Anyways, I just thought I'd toss that out there 🙂

Viewing 15 posts - 76 through 90 (of 116 total)

You must be logged in to reply to this topic. Login to reply