Database and SetBased Thearoy

  • Hi Experts,

     

    I read that databases are designed based on Set Based thearoy.

    But really i dont know what is mean by Set Based Thearoy ? What is the advantage over it ? Where exactly it is sittting in database design ? Why databases are not designed on row by row processing method ?

    I need some good real time example for using Set Based Thearoy in Databases and Row by Row processing methods and its advantages,disadvantages.

     

    Say For example , i am execution 'select * from emp' in a database.

    If i execute this by setbased thearoy ,then ,what is the internal processing steps and methods ? what is the advantage ? what is the disadvantage ?

    If i execute this by row by row thearoy ,then ,what is the internal processing steps and methods ?what is the advantage ? what is the disadvantage ?

     

    Why can't we design our db on row by row procession method ?

     

    Can anybody explain me with example ?

     

    Regards

    Karthik

     

     

     

     

    karthik

  • I recommend you start by reading this book:

    http://www.amazon.com/Introduction-Database-Systems-Eighth/dp/0321197844

    The straightforward answer to your questions is that SQL Server is designed and optimized to work well with set-based SQL logic (more accurately, SQL uses a "multi-set" or "bag" model rather than a set-based model).

    You'll find that SQL as a language is poorly equipped to handle iterative-style row-by-row processing. Most of the time you'll also find that SQL Server can execute set-based query logic much faster than doing the same thing on a row-by-row basis.

     

  • The "key" to good, high performance "Set Based" code, is to forget everything you know about programming for row solutions.  Instead, think about what you want to do to an entire column in a "set" of data.  Once you've made that key paradigm shift, the rest will come easy and continue to get easier.  I'll say it again, stop thinking about "rows"... think about "columns in a set of rows" instead.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

     

    Can you tell me why database is designed based on 'Set Based' instead of 'Row by Row'. What is the advantage over it ? While designing server why cant we apply row by row processing method ? 

    Regards

    Karthik

    karthik

  • Hi Karthik,

    By row-by-row I assume you mean some kind of predefined ordering, like what sequential files have. But many times data does not require such an ordering at all. Do you need to know the order of the US state names for example? Also, how do you order names? I mean in a language independent way. The query results would also not always need ordering. The ordering is concerning the physical layout of the data, which could be implementation specific, and is most of the time not part of the data itself.

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • The "set based" model really means the Relational Model, on which SQL is based to a certain extent. It separates the data model from its implementation. Independence from ordering and other navigational structures, better support for inferential queries, data integrity and non-redundancy are some of the benefits of this approach.

    SQL doesn't realise these benefits fully because it only has certain features of the relational model while others are left out or incomplete. However, the underlying ideas are still very relevant and you should certainly take the time to study and understand them properly. I suggest you study a book or take a course to help you understand the basics before you do any important database work.

  • Karthik,

    Yes... as David suggested, RDBMS's are designed for "set based" use.  It's the nature of the beast.  By definition, each column should only contain one type of attribute (LastName, for example).  And, all queries are inherently based on columns of information.  For example, the following makes no reference to anything resembling a Row Number...

    SELECT FirstName, LastName, PayRate

    FROM sometable

    WHERE PayRate >= 10

    Notice... it selects columns of info... it filters by saying the value in some column must meet some criteria... and it returns a "set" of rows.  'Tis the nature of SQL.  Heck, SQL Server didn't even have CURSORs until Rev 6.5 (if memory serves correctly) because it was never meant for data to be processed one row at a time (row-by-row).

    What is the advantage of "Set Based" over RBAR (pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row")?  Sheer, blinding speed.  What good is data if you can't get to what you want quickly?  The optimizer is setup for setbased processing just to enable this type of speed.  When someone tries to work using RBAR, they override the nature of the optimizer and it just slows things down.  If you look at a cursor or While Loop using the Estimated or Actual Execution Plans, you'll find very small lines connected everything.  That means that only one line is considered at a time instead of a plan for the "big picture".

    Also, just because a query DOESN'T have a Cursor or While loop, doesn't necessarily mean that it's "set based".  There's a "gray" area in the form of "Correlated Sub-Queries" (a sub-query that refers to rows outside itself)... basically, they do a separate SELECT for every row in some other table.  Depending on what you want to do, they can be thousands of times worse for performance than a cursor or a While Loop.  They're not all bad, though... it depends on what you're trying to do and how many rows you are processing (notice I didn't say "how many rows are in the table").  Either way, it's still RBAR.

    The bottom line is that SQL was build on the premises that David suggested... relationships between data.  In order to do that, it had to be done by column.  Writing queries that process by row violates the very fabric of that principle and slows things down.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply