De-normalize or index?

  • I have a set of forms populated by queries whose WHERE clauses look like this:

    WHERE (dbo.TableOfSystems.SystemAutoID IS NOT NULL) OR

    (dbo.TableOfSeriess.SeriesAutoID IS NOT NULL) OR

    (dbo.TableOfStages.StageAutoID IS NOT NULL) OR

    (dbo.TableOfSubStages.SubStageAutoID IS NOT NULL) OR

    (dbo.TableOfLithographicUnits.LithographicUnitAutoID IS NOT NULL) OR

    (dbo.TableOfLithographicSubUnits.LithographicSubUnitAutoID IS NOT NULL) OR

    (dbo.TableOfZones.ZoneAutoID IS NOT NULL)

    There are several such queries and selection criteria are subsequently applied to the result set by the user, but I wonder, might it be worthwhile for the initial load to create separate Boolean fields whose values corresponds to the results of such queries? That is, one Boolean field set to true or false (probably by triggers, for maximum reliability) that would be true if any one of these fields had a value and false if they were all null. It seems to me that it might be faster than scanning such a set of fields, even if they are properly indexed, since I don't care what's in them at this point, just whether or not they are null.

  • On safer side i would say create some relevant indexes otherwise you can De-normalize some tables and then put some indexes.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Bhuvnesh (12/14/2010)


    On safer side i would say create some relevant indexes otherwise you can De-normalize some tables and then put some indexes.

    I have such indexes now. My question was whether this approach might be improved upon by creating an extra True/False field that would replace them. I guess it comes down to whether a once-through scan of the entire table on a single Boolean value would be a better performer than multiple index seeks, albeit with fewer records, since those without a value in the key fields would presumably be excluded from the index. And I'm not even sure if SQL Server will use indexes, or if it might just scan the index fields directly.

    I suppose that the thing to do is try both ways and look at the query plans to be sure. I was hoping that someone here might have some experience with such an approach and would be able to tell me if it's a good idea at all, before I go to the work of creating all this, maybe only to discover that it's pointless.

  • I wonder, might it be worthwhile for the initial load to create separate Boolean fields whose values corresponds to the results of such queries? That is, one Boolean field set to true or false (probably by triggers, for maximum reliability) that would be true if any one of these fields had a value and false if they were all null. It seems to me that it might be faster than scanning such a set of fields, even if they are properly indexed, since I don't care what's in them at this point, just whether or not they are null.

    First, NULL values in columns are *already* represented by an internal bitmap.

    (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/06/29/650349.aspx)

    I understand you are thinking about creating separate columns (not fields when talking about a relational database). You could indeed create such columns and build an index over them, but it seems like a lot of work for limited usefulness. In fact, if you are pulling any data from a significant number of rows from those tables in your WHERE clause, your index probably won't get used anyway. Lookups from an index to a base table can be expensive. To avoid lookups, your best bets are proper covering indexes and indexed views.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • pdanes2 (12/9/2010)


    De-normalize or index?

    Question could be translated like "be healthy or take a fast speed train?" meaning, those options should never be part of the same question.

    Normalization/De-Normalization is related to the underlying data model then physically implemented.

    Indexes are related to the reporting strategy.

    Bottom line is, you must have a properly designed database in the sense of the level of normalization needed for the specific application and you must have a properly designed indexing strategy.

    Hope this helps.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • create proper indexes on the columns which are included in where condition. Also place proper joins by normalizing tables

  • harendra.singh (12/18/2010)


    create proper indexes on the columns which are included in where condition. Also place proper joins by normalizing tables

    well... this one can be translated as: "do it right, dont' to it wrong" 😀

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Dixie, Paul, thanks for the tips, at my current skill level with SQL Server, anything is welcome. It sounds as though an indexed view might be the best bet here. I was trying to speed up a fixed set of queries that read like (air code):

    SELECT BaseTable.Col1, BaseTable.Col2, BaseTable.Col3, AuxTableA.ColA, AuxTableB.ColB, AuxTableC.ColC

    FROM BaseTable

    INNER JOIN AuxTableA ON BaseTable.ColAIDX = AuxTableA.ColAIDX

    INNER JOIN AuxTableB ON BaseTable.ColBIDX = AuxTableB.ColBIDX

    INNER JOIN AuxTableC ON BaseTable.ColCIDX = AuxTableC.ColCIDX

    WHERE BaseTable.ColAIDX Is Not Null

    OR BaseTable.ColBIDX Is Not Null

    OR BaseTable.ColCIDX Is Not Null

    My thought was that the single Boolean would be faster, at the expense of an un-normal additional column. It seemed to me that since I didn't care what was in any of these columns, only that at least one of them was not null, the initial selection of which rows to pull from the base table might be improved. But an indexed view already has this work done ahead of time, instead of doing all the compares on the individual columns at run time, is that correct?

  • pdanes2 (12/18/2010)


    Dixie, Paul, thanks for the tips, at my current skill level with SQL Server, anything is welcome. It sounds as though an indexed view might be the best bet here. I was trying to speed up a fixed set of queries that read like (air code):

    SELECT BaseTable.Col1, BaseTable.Col2, BaseTable.Col3, AuxTableA.ColA, AuxTableB.ColB, AuxTableC.ColC

    FROM BaseTable

    INNER JOIN AuxTableA ON BaseTable.ColAIDX = AuxTableA.ColAIDX

    INNER JOIN AuxTableB ON BaseTable.ColBIDX = AuxTableB.ColBIDX

    INNER JOIN AuxTableC ON BaseTable.ColCIDX = AuxTableC.ColCIDX

    WHERE BaseTable.ColAIDX Is Not Null

    OR BaseTable.ColBIDX Is Not Null

    OR BaseTable.ColCIDX Is Not Null

    My thought was that the single Boolean would be faster, at the expense of an un-normal additional column. It seemed to me that since I didn't care what was in any of these columns, only that at least one of them was not null, the initial selection of which rows to pull from the base table might be improved. But an indexed view already has this work done ahead of time, instead of doing all the compares on the individual columns at run time, is that correct?

    I only looked at this briefly but I'd say the WHERE clause is totally unnecessary. Unless someone messed with the server-wide settings, NULLs won't join anyway.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I only looked at this briefly but I'd say the WHERE clause is totally unnecessary. Unless someone messed with the server-wide settings, NULLs won't join anyway.

    Hmm, yes, I should have realized that. But it still comes down to whether it would be faster for the SQL engine to look at one Boolean value instead of all the individual index columns for the initial selection from the base table, and there doesn't seem to be any certainty on that. Or if anyone does know for sure, they haven't said so

  • By default, put indexes on your FK columns (so they match the PK they refer to) and have the FK declared with check.

    Grant published a little but great article on that :http://scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/

    Only remove them, if you can prove they hurt your performance.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • pdanes2 (12/19/2010)


    I only looked at this briefly but I'd say the WHERE clause is totally unnecessary. Unless someone messed with the server-wide settings, NULLs won't join anyway.

    Hmm, yes, I should have realized that. But it still comes down to whether it would be faster for the SQL engine to look at one Boolean value instead of all the individual index columns for the initial selection from the base table, and there doesn't seem to be any certainty on that. Or if anyone does know for sure, they haven't said so

    I guess I don't understand why that's still a concern here considering that the test-for-null criteria just isn't needed here.

    So far as which would be faster... a single Boolean column or a 3 column criteria goes, you're no newbie. Do what I would do... set up a repeatable test and report back here. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • pdanes2 (12/19/2010)


    I only looked at this briefly but I'd say the WHERE clause is totally unnecessary. Unless someone messed with the server-wide settings, NULLs won't join anyway.

    Hmm, yes, I should have realized that. But it still comes down to whether it would be faster for the SQL engine to look at one Boolean value instead of all the individual index columns for the initial selection from the base table, and there doesn't seem to be any certainty on that. Or if anyone does know for sure, they haven't said so

    The best answer you are likely to get is "It depends." 😉

    I still think an indexed view containing all the columns needed to satisfy your SELECT query is going to be optimum in terms of performance, but you will pay a price in maintenance. However, I'm putting that opinion out there without any data or testing, so it's just speculation.

    I have never heard of anyone using the particular technique you are proposing. However, it seems in principle to be a common strategy of doing work up front to save time on the SELECT query later. In this respect it seems no different than building summary queries, filtered indexes, indexed views, etc. The ultimate answer is to try it and test against your other options. If you do, please share your results with us. Best of luck

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (12/19/2010)


    The ultimate answer is to try it and test against your other options. If you do, please share your results with us. Best of luck

    Heh... you and I posted at the same time without knowledge of each other's intent and came up with the same answer. Ya just gotta know that there's something right about the answer. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Okay, ganging up up me... I'll try both methods and post the plans and results back here. It may be a couple days, though, I've got a monumental headache going in another thread, that's got me on the verge of committing serious mayhem in random directions. Once I get that sorted out, I'll get back to this.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply