"Best" method for storing status values

  • I'm looking for the "best" (here I mean the most efficient) method for storing status values in big tables.

    The problem is that I have some big tables (a few millions of records) and I have to store 14 different status values. Each value could have up to 12 distinct values.

    A tipical query would look like

    select count(*) from MyTable where Status9 = 4

    or

    update MyTable set Status11 = 9 where Status8 = 0 and Status11 = 3

    The only problem is that the optimiser wouldn't use any index on those status fields because the selectivity is very poor (very few distinct values compaired to the total number of rows)

    Do you have any advice?



    Bye
    Gabor

  • Have you tried using the "Index Tuning Wizard" in Query Ananlyzer?  It does do some remarkable things if you let it.  And you would be surprised how much a single or two column index works even when the "selectivity is very poor" when one of thekeys is the primary key.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • If there are certain status columns that are used in all the queries, try to make a compound index on all the status columns, placing the frequently used columns first. If there are some status columns that are frequently used together, try to make another compound index on these columns. Try to find the combinations of columns that have a better selectivity (I hope that you aren't updating/selecting hundreds of thousands rows, because in this case, an index cannot help you). Of course, don't make too many indexes, because keeping them up to date may slow down the updates too much.

    If the Query Optimizer still doesn't use these indexes, then it seems that in your case indexes are not helpful (don't use index hints, unless you are absolutely sure that the optimizer is wrong).

    There may be some other way to combine the status values in a single computed column (numeric or character) and make an index on this column. Then use some method to filter the rows using this column before applying the real conditions. But it all depends on how this status columns are used in the queries; I cannot give a more pertinent advice without knowing the semnification of the status columns and their exact usage.

    Razvan

  • Thanks guys,

    Basically what I would need a help for is a kind of design hint.

    I'm just thinking about making separate table just for thoses statuses which would act as a kind of "index table".

    I've of cours tried out a lot of index combination but the performance is still poor because the most of the query is selecting based on one single status value like:

    select * from MyTable where Status9 = 4

    And here there is no way to use any index. It will be a table scan every time



    Bye
    Gabor

  • I'm not going to ask the purpose of all those statuses, but I wonder if not the real problem lies in the design of the different statuses themselves.. (I'm trying to get a grip of what they could possibly mean)

    In any case, if my math isn't up the walls (which it may very well be ), you say that you have 14 different statuses that each can take 12 different values (or "states")

    This implies that each row could have any "full status (= all together) of 14 to the 12th power..?

    or.. 56 693 912 375 296 different variations.

    Does anything need that many levels of status is a question that comes to mind... 

    The scope seems overwhelming...

     

    /Kenneth

  • Kenneth,

    You could consider those statuses as a kind of workflow. I will give you some examples:

    St_megr_1 could have 3 statuses: 0=Initial state, 1=Order received, 2 Order accepted

    St_megr_2 could have 3 statuses: 0=Initial state, 1=Old customer with data change request, 2=New customer with data chenge request

    St_megr_5 could have 4 statuses: 0=Initial state, 1=(If St_megr_1=2), 2=(If St_megr_1=2 AND the manager has agreed), 3=(If St_megr_5=2 AND pro-forma invoice sent)

    St_megr_6 could have 2 statuses: 0=Initial state, 1=(If St_megr_2=1 OR St_megr_2=2)

    St_megr_9 could have 2 statuses: 0=Initial state, 1=(If St_megr_6=1 AND the change request has been accepted by the manager)

    and so on.

    As you can see not all thoses "statuses" have many distinct values and not all the statuses are in relation with each other.

    I could replace some of thoses statuses by a datetime field like for example for St_megr_1 but this is not suitable for St_megr_2.

    What I have to tell is that this is an actual business logic of my customer with a quite old database design. I have to redesign it with performance in view



    Bye
    Gabor

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply