Wikipedia comment about denormalization

  • Hi,

    I came across this in Wikipedia, http://en.wikipedia.org/wiki/Database_normalization#Denormalization

    It has never been proven that this denormalization itself provides any increase in performance, or if the concurrent removal of data constraints is what increases the performance. The need for denormalization has waned as computers and RDBMS software have become more powerful.

    What do you make of this? I am not experienced when it comes to designing OLAP databases, but I am wondering: if it has never been proven, why were all the OLAP databases I have seen denormalized, and why do people mostly preach the 'star schema' for DW design?

    Thoughts?

  • I got some comments on this on dbforums:

    http://www.dbforums.com/showthread.php?p=6333362#post6333362

  • It's Wikipedia. People can post their opinions as facts any time they want to.

    It is, however, an absolute fact that denormalization can, in some cases, dramatically improve performance, and that has nothing to do with removing constraints.

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • Denormalizing the database will improve query performance simply because it requires fewer joins. The more joins a query has, the worse its performance. That's why most DW designs prefer denormalization.
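
    For example, a minimal sketch with hypothetical star-schema tables (FactSales, DimDate, DimProduct, and a pre-flattened FactSalesWide are made-up names, not any real schema), asking the same question both ways:

        -- Normalized: two joins to resolve the dimension attributes.
        SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
        FROM FactSales f
        JOIN DimDate d ON d.DateKey = f.DateKey
        JOIN DimProduct p ON p.ProductKey = f.ProductKey
        GROUP BY d.CalendarYear, p.Category;

        -- Denormalized: the dimension attributes are stored on the fact
        -- row itself, so the query touches one table and needs no joins.
        SELECT CalendarYear, Category, SUM(SalesAmount) AS TotalSales
        FROM FactSalesWide
        GROUP BY CalendarYear, Category;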

    Warm Regards,
    Neel aka Vijay.

  • I've found that there's a "tipping" point between the performance gains realized by denormalizing information and by joining for the same information. Once the tables run to millions of rows, the joins (provided they're properly written and indexed) will frequently outpace denormalization for performance.
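
    As a rough sketch (the table and index names are hypothetical, not from any particular schema), this is the kind of covering index that keeps the joined form competitive at scale:

        -- A covering index on the fact table's join column lets the
        -- join be resolved from the index alone, without touching
        -- the base table.
        CREATE NONCLUSTERED INDEX IX_FactSales_ProductKey
            ON dbo.FactSales (ProductKey)
            INCLUDE (DateKey, SalesAmount);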

    The real answer is (like anything else), "It depends".

    And, like Barry suggested, Wikipedia, although very good, is NOT the ultimate authority... many people have expressed opinions on Wikipedia without the necessary testing to support their claims and, as a result, are sometimes incorrect.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • If you read the details of the big database performance tests (can't remember the name), they almost always use hundreds of disks. If you can get the money, you can do the same: divide up all your tables and indexes onto their own spindles and get good performance.
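
    As a hypothetical sketch (the database name, file paths, and table are examples only), separating a table and its indexes onto different spindles looks something like this in SQL Server:

        -- Two filegroups, each backed by a file on its own physical drive.
        ALTER DATABASE Sales ADD FILEGROUP FG_Data;
        ALTER DATABASE Sales ADD FILEGROUP FG_Indexes;
        ALTER DATABASE Sales ADD FILE
            (NAME = SalesData, FILENAME = 'E:\mssql\SalesData.ndf')
            TO FILEGROUP FG_Data;
        ALTER DATABASE Sales ADD FILE
            (NAME = SalesIdx, FILENAME = 'F:\mssql\SalesIdx.ndf')
            TO FILEGROUP FG_Indexes;

        -- The table lives on one drive, its nonclustered index on the other.
        CREATE TABLE dbo.Orders (
            OrderID INT NOT NULL PRIMARY KEY,
            CustomerID INT NOT NULL,
            OrderDate DATETIME NOT NULL
        ) ON FG_Data;

        CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
            ON dbo.Orders (CustomerID) ON FG_Indexes;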

    In most cases where tables and indexes have to share the same disk, you have to denormalize, depending on your environment, to get performance.

    And where I work we do have a table with 80 columns where we sometimes get replication problems. Replication slows down dramatically just for that one table, which has the most columns of any table we replicate. The solution is to snapshot it and reinitialize it; that usually works, though sometimes we have to do it a few times.

  • SQL Noob (8/15/2008)


    in most cases where tables and indexes have to share the same disk you have to denormalize depending on your environment to get performance.

    I'm going to disagree with you there. In some cases denormalisation is required for performance. Some, not most. In most cases good indexing and well-written queries are more than adequate for good performance.

    That said, it depends on your environment. In a data-warehouse, aggregated reporting environment, sure, denormalisation, especially pre-aggregation, makes sense. That's what a cube is. In an OLTP environment, with small queries touching small numbers of rows, it shouldn't be necessary at all.
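
    A minimal sketch of what pre-aggregation means in practice (table and column names are hypothetical):

        -- Build a summary table once, off-hours; reports then read the
        -- small aggregate instead of re-scanning millions of detail rows.
        SELECT ProductID,
               YEAR(OrderDate) AS OrderYear,
               SUM(Quantity) AS TotalQty,
               SUM(LineTotal) AS TotalSales
        INTO dbo.SalesSummary
        FROM dbo.OrderDetail
        GROUP BY ProductID, YEAR(OrderDate);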

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • There are a small number of expert database designers who are capable of understanding the significant disadvantages and trade-offs of a denormalized design and using it in selected cases where it may benefit the application.

    However, the main use of denormalization is to provide an excuse for incompetent database designers who do not understand normalization. As in saying, when asked why their design is not normalized, “I denormalized the tables for performance.”

  • Likewise, there is an even smaller number of experts willing to acknowledge the advantages of denormalization simply because it's easier (or can be), and can therefore sometimes be the correct business strategy regardless of its conspicuous lack of purity in the eyes of design zealots.
