Bit Flag or seperate columns

  • Hello

    I have a question regarding the use of bit flags. I have a dataset that looks like a bit flag no brainer, but when I went to our DBA to implement the schema, there was strong resistance to a single column with a bit flag in favor of multiple columns (one for each bit). What would you suggest given the information below?

    The dataset needs to answer 8 Y/N questions. My solution was something like.

    Col 1, Col n…, BitFlag

    The BitFlag column would be an int in the form of 00000000 with each bit representing an answer to each question. So for example if I wanted to get all values from the table where question 2 and 4 were yes (00001010) I would query it as follows.

    Select * from <SomeTable>

    Where (BitFlag & 10) <> 0

    The opposing suggestion was that there be 8 y/n columns as follows:

    Col 1, Col n…, Q1, … , Q8

    Where Q1-Q8 would only ever be 1 or 0 and would be indexed in some fashion.

    This seems to be cumbersome if I ever need to add Q9, Q10, …Qn. I was not given a satisfactory answer as to why the single column schema was a bad idea. Could someone shed some light on this situation?

    Thanks,

    Steve

    Steve

  • This is a bit of a discussion that is driven by preferences of the DBA (or the developper).

    Some reasons not to use bitmasks :

    1. An integer column can only hold that many values (uuuhm, 32 bits, I guess). You could switch to bigint (64 bits).

    2. You can only store Yes/No type of information. No possibility to store other 'values'.

    3. The database is not really readable. It is easier to see in your query 'WHERE Answer2 = true and Answer4 = true' instead of 'WHERE (Answers & 10) <> 0

    Cons 1 and 2 can be taken care of by using a char column. You can store up to approx. 8000 answers and up to 256 possible answers.

  • I do realize that there is an "upper bound" on the number of questions that I can handle with an int, but that can be easily addressed, as you suggested, by changing the data type of the column. But I disagree that Y/N data is the only answers that you could extrapolate from a bit flagged field. For example if I wanted to say that Q1 was answered by bits 1 and 2 I could have 4 possible answers to Q1.

    00 = NO

    01 = YES

    10 = Maybe

    11 = Not at all.

    I do agree that the database does become unreadable, but that is going to be the case with bit math for a good majority of the population. It seems to me that as far as speed, query simplicity and scalability(adding more questions) that a single bit flagged field with the appropriate data type is the best solution. I would love to be proven wrong, but I just need a good answer to bring me into the light.

    Thanks

    Steve

  • I was not saying that using a bitmasked field is bad...

    I do believe it has some uses, on the other hand, there are some drawbacks. Just tried to list some.

    I have some others

    1. As for performance, I'm not sure that a bitmasked field is always faster.

    It is impossible to have an index on one question (but that won't give you an edge if you only have Y/N type questions).

    2. You always have to fetch the complete field, even if you only need a single answer. (Negligable with small question sets.)

    3. Grouping and ordering can become cumbersome.

    4. You have 'hardcoded' the logic inside your table.

  • IMHO, best practice if you want an extensible system, is to have one base table, say with all the persons filling in the questionnaire; one table with all the questions in the questionnaire and a third table linking both tables indicating the answers. This is by all means the most flexible and extensible design. Performance however...

    And now, you'll certainly will start flaming me...

    Like I said before, as with a lot of these 'strategy' discussions, there is no right or wrong. It's all a matter of opinion, the problem you are trying to solve, ...(and maybe a pain-in-the-ass DBA that has his own ideas ...)

  • I suggest using neither bitmap nor separate columns. You admit that this may need to be extended...

    http://qa.sqlservercentral.com/forum/link.asp?TOPIC_ID=16423

    --Jonathan

    Edited by - Jonathan on 10/27/2003 09:14:04 AM



    --Jonathan

  • So what I am understanding, at least so far as you have experienced, is that this is a stylistic matter that would probably need to go the rounds in a series in test cases to see which would yield the best performance. I do so appreciate the frankness of your last response. I thought I would bust a gut laughing.

    Thanks

    Steve

  • I do like the alternate suggestion NPeeters made and I will look into a schema of this type.

    Jonathan, could you repost that link. It does not seem to work.

    Steve

  • Ok, but what if later you find you need more than 8 questions?

    Why not make it relational. Have a table called "Questions" and a table called "Answers". Maybe link a Users table to the Answers so you can keep track of what answers each user has provided. This way, your app is always flexible and can easily adopt to change.

    Schema:

    Questions

    QuestionID QuestionType (bit, text, etc) Question

    Answers

    UserID QuestionID Answer(varchar(50)? to handle bit or any datatype)

  • Thanks,

    I believe that Jonathan and NPeeters suggested that very same thing.

    Edited by - kepr00 on 10/28/2003 07:20:49 AM

    Steve

  • Heh, sorry, must have missed it...still waking up here...

  • How about both? Use a bit mask and add a computed column for the answer to each question.

    Randy
    Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • The relational tables is definately a better solution. I had to build a quiz app and keep track of who answered what (the quiz was related to a contest the company was holding for customers)

    What I had was a users table, a questions table, an answers table, a usersAnswers table, and a Quiz table. The idea was that there would be reuse of the database and quiz app for future quizzes.

    So, the Quiz table was the master table.

    Questions had a many to one relationship with the Quiz table.

    Answers had a many to many relationship to the Questions table.

    The users table was a "control" table, for lack of a better term that just listed who had taken the test and assigned them an id.

    The userAnswers table was a one to many relationship between users and Answers.

  • I have used bitmaps in a few cases, but like most of the comments here, the relational approach is the way to go. If I *know* that the number of bits won't change, I'll tend to use bitmaps. For instance:

    A.B.C.D (an IP address)

    But that's well-defined (at least until IPv6 becomes more popular). If I don't have such a well-defined set, then I won't use bitmaps. Having to revisit the issue because I limited my design and we now need to expand is never fun.

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/

    K. Brian Kelley
    @kbriankelley

  • quote:


    I have used bitmaps in a few cases, but like most of the comments here, the relational approach is the way to go. If I *know* that the number of bits won't change, I'll tend to use bitmaps. For instance:

    A.B.C.D (an IP address)

    But that's well-defined (at least until IPv6 becomes more popular). If I don't have such a well-defined set, then I won't use bitmaps. Having to revisit the issue because I limited my design and we now need to expand is never fun.

    K. Brian Kelley

    http://www.truthsolutions.com/


    Just use four tinyint columns for IP addresses.

    --Jonathan



    --Jonathan

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply