Columns Better Than Rows For Data Warehouses

  • I just read an InformationWeek article, "Columns Better Than Rows For Data Warehouses, Says Vertica Systems,"

    http://www.informationweek.com/news/showArticle.jhtml?articleID=206800662

    about a better and faster way of storing data for data warehouses.

    As a DBA and developer, I noticed this right away. I found the article interesting but haven't really delved into Vertica Systems or exactly how this "Columns" system is accomplished. According to the article, the Vertica system is a lot faster than any row-based DB system.

    Just figured I'd put it out there for all of you to read (if you haven't already), and of course I welcome any comments and discussion.

    What does everyone think?

  • I've read about it and it just makes sense to me. The whole idea of set-based processing is to make the paradigm shift from thinking about what to do to a row to what to do to a column. I'd love to get my hands on it to test it with a couple of my more notorious "million-row" tests. Forget about warehousing... if it can pass my tests, it'll handle just about anything.

    It's funny, though... I hope Vertica makes it and doesn't go the way of the Beta VCR. Part of the reason Oracle is so popular is that it readily supports row-based thinking without much of a performance penalty (although set-based code still runs faster, even on Oracle). I wonder if people can actually make the paradigm shift toward thinking in columns instead of rows. Considering how many folks think writing a cursor is OK whenever they can't come up with a set-based solution, instead of going the extra mile to find one, and given that set-based solutions are based on columnar processing, I'm thinking Vertica may have to offer a lot of free training to get its product into widespread use.

    And the real fact of the matter is that GUIs are row-based and always will be. The glitter of the GUI will always be key. We batch-processing fools always end up taking a back seat to the GUI... well... until there's a performance problem 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have also seen some blog postings from Michael Stonebraker, and while it does seem to make sense, the people posting have a product to sell, and that has my cynical side up.

    So many people (Microsoft included) hype a technology to sell something, regardless of whether it's a better solution for your application. That doesn't mean they are wrong, but you have to wonder.

    My view is that a database like this has its place in certain situations, but as far as displacing relational databases for this type of work goes, I still think the relational world will dominate, if for no other reason than that a vast number of people have relational skills and replacing them is not easy.

    A few references:

    http://www.databasecolumn.com/2007/09/stonebraker-comment-response.html

    http://en.wikipedia.org/wiki/Column-oriented_DBMS

    http://www.databasecolumn.com/2007/10/cpu-trends-like-disk-trends.html

    http://www.databasecolumn.com/2008/02/responding-to-monash-2.html

    http://www.databasecolumn.com/2008/02/insert-performance.html

  • I'm cynical as well... but the very thought of a database engine that truly responds to the columnar nature of an RDBMS is thought-provoking and intriguing, at the very least.

    --Jeff Moden

  • Vertica is the commercialized evolution of the C-Store project (http://db.csail.mit.edu/projects/cstore/), so if you're interested in playing around with it, set up a Linux VM, get a freely available compiler (I forget which one, but it's a few versions back), and away you go.

    MonetDB is another free column store you can play with.

    ParAccel is another commercial offering, headed up in part by one of the early Oracle guys.

    As far as this being something other than relational, I think that's just the "inner marketer" at Vertica speaking. As near as I can tell, these DBMSs are as faithful an implementation of relational ideals as any other SQL DBMS, which is to say close enough that people use the word. They're really little more than a different physical manifestation of the model, tuned for a particular type of use, much as StreamBase is.

    It's definitely interesting, though, because it shows that with effort, we can gain a good separation of the logical model from its physical implementation. I hope all of the companies in the space do well.

    TroyK

  • Column storage has nothing to do with "displacing" the relational model. A column store is a model for *storage* within the DBMS; the tabular view of data is still what end users see. The column-store paradigm has been around for years and is already used by several SQL DBMSs (Sybase IQ, SAND).

    According to the vendor, Vertica uses SQL just like its rival products from Oracle and Microsoft.
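    To illustrate the point that a column store changes only the physical layout and not the logical table, here is a toy sketch in Python (the thread itself contains no code). The table, its columns, and its data are hypothetical, and counting "values touched" is only a rough stand-in for I/O. Real column stores such as Vertica or C-Store add compression, sorted projections, and disk-level layout that this sketch does not attempt to model.

```python
# Toy model of row-oriented vs column-oriented storage of the SAME logical table.
# Hypothetical data; "values touched" is a crude proxy for I/O, not a real cost model.

COLUMNS = ("region", "product", "amount")
ROWS = [
    ("East", "Widget", 100),
    ("West", "Gadget", 250),
    ("East", "Gadget", 75),
    ("North", "Widget", 300),
]


def to_row_store(rows):
    """Row-oriented layout: each record's values are stored together."""
    return [tuple(r) for r in rows]


def to_column_store(columns, rows):
    """Column-oriented layout: each column's values are stored together."""
    return {name: [r[i] for r in rows] for i, name in enumerate(columns)}


def sum_row_store(store, columns, col):
    """Sum one column from a row store; every whole record gets visited."""
    idx = columns.index(col)
    values_touched = 0
    total = 0
    for record in store:
        values_touched += len(record)  # the whole row comes along for the ride
        total += record[idx]
    return total, values_touched


def sum_column_store(store, col):
    """Sum one column from a column store; only that column's values are read."""
    values = store[col]
    return sum(values), len(values)


def rows_from_column_store(columns, store):
    """Reassemble the tabular (row) view that users and SQL still see."""
    return list(zip(*(store[name] for name in columns)))


if __name__ == "__main__":
    row_store = to_row_store(ROWS)
    col_store = to_column_store(COLUMNS, ROWS)
    print(sum_row_store(row_store, COLUMNS, "amount"))          # (725, 12)
    print(sum_column_store(col_store, "amount"))                # (725, 4)
    print(rows_from_column_store(COLUMNS, col_store) == ROWS)   # True
```

    Both layouts return the same answer and the same rows; only the amount of data read to aggregate a single column differs, which is the kind of query a data warehouse runs most.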
