Table - More Rows or More Columns?

  • Hello all,

    I ran into a little debate with myself as to what would be better in terms of performance and design ethics(?).

    I am working on an application that will allow creation of different reports. Each report has a predefined amount of fields and field names, however, only some of the fields will repeat from report to report.

    Initially, I had the application working this way:

    Table design-

    [RptNo | Field | Value]

    Hence, each field goes to it's own row.

    The other design thought up works this way:

    Table design-

    [RptNo | Rpt1Field1 | Rpt1Field2 | Rpt1Field3 ... Rpt1FieldN | Rpt2Field1 | Rpt2Field2 | Rpt2Field3 ... RptNFieldN]

    On this design, only the non-repeating fields from each report would be created as columns.

    As you can see, one contains more rows, and the other contains more columns. What are the recommendations?

    PS. The report fields are 95% numbers and the other 5% are text fields.

    Thanks in advanced for reading 😉

  • I would definitely go with your first option. Even if the second option has faster performance, I just see a maintenance nightmare as reports are modified or new reports are added. With the first option, all you need to do is add some records. Otherwise you have to modify the table design every time. You will then need to change views, stored procedures, etc., to account for the new fields which is unnecessary under the first option.

    Jason

  • "Whenever I see a table with repeating column names appended with numbers, I cringe in horror."

    http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/



    Clear Sky SQL
    My Blog[/url]

  • 😀 Those repeating column names were simply for demonstration purposes. The actual column names do not contain the same names at all.

    Thanks for the great article though, will be helpful for future table designs 🙂

  • This is a difficult one and, to be honest, I'm not overly fond of either.

    The first, often called Entity-Attribute-Value is not a particularly good design most of the time. It's hard to enforce any form of integrity when all sorts of data is in one column.

    The second will be a maintenance nightmare and is going to result in huge rows and inefficient queries.

    Maybe you could describe what you're doing a bit more. There might be another way.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi GilaMonster, the post before your edit regarding First Normal Form was interesting. I read about it a bit. Anywho, what I am doing is, imo, almost clearly explained in the first post.

    I have an application that allows users to input data for various reports - inspection reports, if you must. Now, among the different reports there will be many fields that repeat. For example, the Lot Number, certain Dates fields, Warehouse field, etc.

    Besides those few "header" type fields that repeat, there will be some in the report detail itself, for example a box number. The reason for creating one giant table with all report data is so that the fields can be reused from report to report.

    GilaMonster (12/18/2009)


    ...The first, often called Entity-Attribute-Value is not a particularly good design most of the time. It's hard to enforce any form of integrity when all sorts of data is in one column...

    Because of the way that the application has to "temporarily" store data before being sent to SQL, the data integrity has to be enforced from the application anyway. - I am assuming "enforce any form of integrity" means forcing the user to use the correct type of data (eg, numbers for numbers).

    Any recommendations are very welcome.

  • jnaranjo86 (12/18/2009)


    Anywho, what I am doing is, imo, almost clearly explained in the first post.

    If that's clearly explained, I don't want to see what you call a 'brief summary' ;-). I'm trying to see if there's a better design, that's why I'm asking for more info.

    I have an application that allows users to input data for various reports - inspection reports, if you must. Now, among the different reports there will be many fields that repeat. For example, the Lot Number, certain Dates fields, Warehouse field, etc.

    Besides those few "header" type fields that repeat, there will be some in the report detail itself, for example a box number. The reason for creating one giant table with all report data is so that the fields can be reused from report to report.

    So where is the data for the reports originally coming from? Why do you need separate tables to store the report data?

    Because of the way that the application has to "temporarily" store data before being sent to SQL, the data integrity has to be enforced from the application anyway. - I am assuming "enforce any form of integrity" means forcing the user to use the correct type of data (eg, numbers for numbers).

    Yup, that's what I meant, data types plus domain constraints (dates cannot be before X, amounts must be >0, etc)

    No offence intended, but almost every time I've heard the phrase "data integrity is enforced from the application" it's usually followed (days or weeks later) by "Why is there a string in a column that should be numeric?" or similar.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/18/2009)


    jnaranjo86 (12/18/2009)


    Anywho, what I am doing is, imo, almost clearly explained in the first post.

    If that's clearly explained, I don't want to see what you call a 'brief summary' ;-). I'm trying to see if there's a better design, that's why I'm asking for more info.

    A brief summary would be like this: "I've got an application that uses a SQL table" - lol, no, I'm just kidding. I guess I am not seeing exactly what you are trying to figure out 🙁 - Please explain a bit more?

    As far as more details for the reports, here you are:

  • Data such as Label information, Supplier ID, Arrival Site, Date, Importer will be in every report and will be filled in by the application
  • The user will have to fill in text, numbers and select from a multiple choice list
  • This data will be shown on a webpage in the exact format it is input from the application
  • I think that's about it, any questions, please ask.

    GilaMonster (12/18/2009)


    I have an application that allows users to input data for various reports - inspection reports, if you must. Now, among the different reports there will be many fields that repeat. For example, the Lot Number, certain Dates fields, Warehouse field, etc.

    Besides those few "header" type fields that repeat, there will be some in the report detail itself, for example a box number. The reason for creating one giant table with all report data is so that the fields can be reused from report to report.

    So where is the data for the reports originally coming from? Why do you need separate tables to store the report data?

    The data comes from the users. The user is a QA inspector that will jot down information on a paper and then input those into the application. The reason for the paper is because they have to climb on pallets, open product boxes, etc. - in other words, be very mobile.

    As far as separate tables, I haven't seen the need for more than one.

    GilaMonster (12/18/2009)


    Because of the way that the application has to "temporarily" store data before being sent to SQL, the data integrity has to be enforced from the application anyway. - I am assuming "enforce any form of integrity" means forcing the user to use the correct type of data (eg, numbers for numbers).

    Yup, that's what I meant, data types plus domain constraints (dates cannot be before X, amounts must be >0, etc)

    No offence intended, but almost every time I've heard the phrase "data integrity is enforced from the application" it's usually followed (days or weeks later) by "Why is there a string in a column that should be numeric?" or similar.

    No offense taken, it's constructive criticism. Though I know what you mean, the data is simply for reporting purposes and the application itself is very data-type-proof. Say, if the user chooses to write text in a number column, nothing will happen and that's his/her choice.

    Mind you, it's an in-house application - I work for the company that will be using the software - this allows me to be a little bit more flexible with these things (imo). If I were selling this to a client then that would be another thing.

  • I'm trying to figure out if there's a better database design here, something other than EAV or 3000 columns in a table. On order to do that, I need to have at least an idea where the data comes from, what this actually is and how it'll be used.

    Will the data be stored for long periods after it's entered or is it enter, print report, delete?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Why are you so intent on storing all the data for all the reports in a single table ?

    Why not a table per report ?



    Clear Sky SQL
    My Blog[/url]

  • GilaMonster (12/19/2009)


    I'm trying to figure out if there's a better database design here, something other than EAV or 3000 columns in a table. On order to do that, I need to have at least an idea where the data comes from, what this actually is and how it'll be used.

    Ok, here's the full scoop then.

    ProductA comes in containers.

    Each container has 20 pallets.

    Each pallet has 75 boxes.

    Each box (depending on the fruit size) has 4 to 12 fruits.

    All boxes can be the exact same fruit size, or it can have multiple sizes, but each box always has the same fruit size.

    Out of those (20 pallets * 75 boxes ea.) = 1500 boxes, 10 boxes are randomly picked out for quality inspection.

    Different inspection parameters and data is taken from each from in those 10 boxes. These parameters and data is what makes up the "Fields" for the report.

    All the data for each box will be stored ONE record with the 3000 fields design, or 100 records with the EAV design.

    The data for each box consists of the total number of fruit that matched the evaluated parameter. For example, one of the parameters is mold. Out of a box of 5 fruits, 1 had mold, so "1" is typed in for that field.

    Once the inspection report is filled out with however many boxes were inspected (the total number of boxes varies per product), it is either saved locally and then at a later time transferred into the SQL table.

    Once the data is in SQL, a web page will pull that data and show it as a "report." And this is where the data ends. Absolutely nothing else is done with the data.

    GilaMonster (12/19/2009)


    Will the data be stored for long periods after it's entered or is it enter, print report, delete?

    Looking at it from the data perspective,

    1. It gets saved into a temporary storage.

    2. The data is transferred into a SQL table.

    3. A report of the data can be viewed/printed

    4. Never, ever deleted. Stored in this table regularly for about a month after it enters SQL, however, in some cases (claims, returns, complains, etc.) the data will remain in SQL for many months, even years. Once this regular period passes, the data will be moved to an "archive" table, which will be basically storage but not frequently accessed.

    Dave Ballantyne (12/19/2009)


    Why are you so intent on storing all the data for all the reports in a single table ?

    Why not a table per report ?

    I didn't think of a separate table per report because I wanted to do the EAV table design initially, until I was "suggested" the 3000 fields design. This "suggestion" was done before I realized that with only two reports I am up to approximately 60 fields.

  • First of all, if the table will really have 3,000 columns, then the choice of having such a wide table simply vanishes because SQL Server can only have something like 1024 columns per table and something like only 600 columns (IIRC and they have improved on that in recent editions) can be replicated it replication is ever needed.

    The question to ask is, if you could make a 3000 column table, how many of them would be null for any given fruit.

    It's not going to be easy but if you wanted proper normalized data (which is the way to go for everything except for a final reporting table), then you'd need to classify the 3000 columns and group related columns (usually when 1 is filled out, others are also filled out) into well formed tables along with the necessary FK's. Doing such a thing certainly isn't easy upfront and that's why a lot of people make the mistake of not doing it up front and settling on an EAV or a very wide table.

    Still, there are many things in life where it appears that single rows in a single table are the way to go... call detail records are one of those things, for example. But those types of things always have data for all columns... there are no nulls in any of the rows.

    I guess that's what will ultimately make the decision for you... for any given item, how many columns will contain nulls and what are the pattern's of the nulls? Obviously, that's not a question I need an answer to... it's a question to guide you in your decision making process.

    As a side bar, you can overcome the number of columns per table limit by having parallel sister tables with common ID's. Obviously, one of the tables would be the "master" that generates the shared row ID's. Of course, EAV's are "easy" and through the use of cross-tabs or pivots, can generate many different reports quite easily but it's normally at a pretty big price when it comes to performance. That's why people don't make databases by putting everything all into one huge EAV.

    The bottom line is "It depends"... my inclination would be to do it right and find someway to normalize the data... but if the data will always have non-null entries for all 3000 columns, a set of sister tables may be the way to go.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Since the data will be stored for posterity, I'm going to suggest a full, normalised database design, tables that store information about single 'things', foreign key relationships, the works. Yes, it's more work up front, but it'll save time when reports get changed, created, not needed any longer, etc.

    Don't think about the reports when designing the tables, think about the data that you're storing. Data can be manipulated for the reports by stored proc or view. That's the easy part. Designing the tables is the hard part.

    If I was doing a design for your system, I'd start (but by no means guarantee to finish with) the following 'entities'

    Container (maybe)

    Pallet (maybe)

    Box

    Sample

    InspectionTypes

    InspectionParameters

    InspectionResults (possibly more than one table)

    Plus any other entities that you need to store data on. Then go through the levels of normalisation, up to Boyce-Codd and split tables are necessary. Yes, it's going to be a lot more tables than you had, but it's going to allow all sorts of integrity checks that you don't have now. (what would a value for 'mold' of 11am really mean?) and it'll make changes to reports far easier than with either the all-in-one-row or all-in-one=column' designs.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Viewing 13 posts - 1 through 12 (of 12 total)

    You must be logged in to reply to this topic. Login to reply