DB Design Question

  • I'm pretty new to DB design but am trying to write a project to help me learn.  I have some general DB design questions and I'm looking for someone w/ some experience in these types of matters to help me out if possible.

    My database is pretty well normalized (or at least I'm pretty sure it is).  I've been writing some queries to do the functionality I want and they join a decent number of tables (8-10).  I'm concerned that eventually performance will drag.  So, some general questions are...

    1) Is 3rd normal form a good goal to shoot for when designing a DB?  Or, is that probably "too normalized" in practice because it's probably "asking for trouble" later on when trying to optimize?

    2) Is there a rule of thumb about having "too many tables in a query?"  Does it indicate a problem if a query that is supposed to do some basic functionality of an application joins like 8-10 tables?  Or is this normal?

    3) When should the decision be made to de-normalize the DB?  Should it be before you write the DB or, in small to medium projects, is it okay (and probably expected!) to do a re-design mid-way?  Obviously the earlier you can get the design right the better but I'm asking more about people's experience in practice.

    4) What are people's thoughts about triggers/constraints for a DB?  Many developers (and that's my background) seem to want to do all the constraining in their application.  Personally, I don't like that approach.  Well, to clarify, I don't think it's enough.  I think you should put in application constraints, but the best place to put them is the DB.  I put in some constraints on my DB, but it's hard (and complicated) to put in (as well as think of) constraints for every possible situation.  I guess some constraints are better than none though, right?

    5) When in the development lifecycle should constraints be put into place?  In my project, I spec'd out my application first.  Then I did my DB design.  Then I wrote my constraints.  Then I wrote my code.  Does that sound like the correct approach, or is it "inadvisable" to write the constraints/triggers too early because a project is likely to change a good amount as it is developed?

    I realize these are general questions but it would be helpful to me to hear what experienced people have to say on these issues.  Thanks in advance for your comments.

  • I will have a go at some of the questions...

    a) You should always go for a 3NF design at the logical design stage.  Until you have a 3NF design, you do not really know if you understand all the data, or even if you have all the necessary data.  If you do not get to a 3NF design at the logical stage, you are very likely to have a very poor application at the physical stage.
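
    To make that concrete, here is a minimal sketch of the kind of transitive dependency 3NF removes (table and column names invented for illustration):

        -- Before (not 3NF): Orders(OrderID, CustomerID, CustomerName).
        -- CustomerName depends on CustomerID rather than on the key OrderID,
        -- so it repeats on every order and can drift out of sync.

        -- After (3NF): the customer attribute moves to its own table.
        CREATE TABLE Customers (
            CustomerID   int          PRIMARY KEY,
            CustomerName varchar(100) NOT NULL
        );
        CREATE TABLE Orders (
            OrderID    int PRIMARY KEY,
            CustomerID int NOT NULL REFERENCES Customers (CustomerID)
        );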

    b) You should de-normalise only when necessary.  Never de-normalise because some people think it is cool.  Only de-normalise when you start to put the physical model together.  Do not even think about de-normalising at the logical stage.

    c) Common reasons to de-normalise are

    i) Large number of tables in frequently-used queries.  Your main queries should join very few tables (2 - 4 maximum).  Your once-a-month queries can join everything in the database if that is needed to get the answer.

    ii) Very wide rows with infrequently used columns.  It can be beneficial to split a wide row into two tables, with frequently used columns in one table and the rest in another (see the sketch after this list).

    iii) DBMS restrictions.  If you have a number of LOBs in a row, it can be best/necessary to place the LOBs into separate tables.

    iv) Reduce the number of dimension tables in a DW.  Sometimes merging multiple dimensions into a single super-dimension can help performance.
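
    For ii) and iii), a rough sketch of the split (names invented): the frequently used columns stay in the main table, and the wide or LOB columns move to a second table sharing the same key, joined 1:1 only when actually needed.

        CREATE TABLE Product (
            ProductID int          PRIMARY KEY,
            Name      varchar(100) NOT NULL,
            Price     money        NOT NULL
        );

        -- Infrequently used wide columns and LOBs split off into their own table
        CREATE TABLE ProductDetail (
            ProductID   int PRIMARY KEY REFERENCES Product (ProductID),
            LongSpec    varchar(max),
            BrochurePdf varbinary(max)
        );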

    d) Common reasons NOT to de-normalise

    i) No performance gain will be achieved.  Unless performance is improved, there is no point in de-normalising.

    ii) Database size will remain small (under 10 GB).  In a small database, it is unlikely that de-normalising a 3NF design will give any noticeable performance gain.  (See i!)

    e) Constraints.  Your approach to constraints seems quite sensible.


  • NORMALISATION

    Normalise when it is practical to do so.  If you have a sales by day of week arrangement then there would be little point in normalising it down to a DAY/SALE table.

    I tend to denormalise when implementing a data warehousing solution, particularly if a normalised query would require a large number of tables.

    NUMBER OF TABLES

    I try to minimize the number of tables/views in a single query.

    The query optimiser has to calculate the best way of joining the tables, and the number of calculations goes up geometrically as the number of tables increases.
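
    To put rough numbers on that: considering left-deep join orders alone there are n! candidate orderings, so 4 tables give 24 possibilities while 10 tables give 3,628,800.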

    CONSTRAINTS IN DATABASE

    Let the database look after the data.  If you put a unique constraint on a table then a programming error in the front end app cannot add duplicate values into your database.
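
    For example (hypothetical table and column):

        -- Enforced by the database itself, no matter which application inserts the row
        ALTER TABLE Customers
            ADD CONSTRAINT UQ_Customers_Email UNIQUE (Email);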

    If the database is a heavy-use database then I limit the constraints to the bare minimum, because there is a processing overhead to enforce them.  For light-use applications I might beef up the constraints in the database to catch bugs that fall through the front end app.

    If multiple applications use the same database then again, I would put more constraints in the database because a bug in one application could corrupt the data for another.

    WHEN TO DESIGN CONSTRAINTS

    Day one.

    In the apps that I develop the data is the foundation and cornerstone of the application.  The storage and protection of the data is fundamental to the application.  I am of the opinion that planning and design should take the lion's share of the project.  The more projects I become involved with (20 years' experience), the more convinced I am that shortcuts taken in planning and design will return to bite you in the bum... HARD.

    Get the foundations right and it makes the rest of the job easier.

  • I echo all the comments already made about normalising etc.

    One thing I would say is that selecting from single tables one at a time can often be far faster than joining the tables, because of the way that the optimizer works.  I learned this lesson when working as an Applications Programmer a few years ago.  I select rows from tables in a 'most effective' fashion - so that the table that returned the fewest rows was hit first, and this then passed the restriction on to the next most restrictive select, and so on.  Because the selects were so restrictive, the performance was excellent.  If you must use joins on tables, make sure that your WHERE clauses are carried out in most restrictive to least restrictive order.  I can explain more about this if anyone wants me to.

    Another trick I learned recently was that often people will do joins and have multi-level subqueries on tables when a UNION will work better.  One of the programmers where I work now gave me a piece of badly performing SQL that joined about 6 tables, each with two levels of subqueries.  It took up to 40 minutes to run.  When I rewrote the query using 'UNION ALL' it took a consistent 3 seconds, even across a Europe-wide network.
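
    A minimal sketch of the shape of that rewrite, with invented names; each restrictive branch becomes its own simple select instead of one query trying to cover every case at once:

        -- Instead of one join/subquery pile-up covering both cases...
        SELECT OrderID, Status FROM Orders WHERE Status = 'OPEN'
        UNION ALL
        SELECT OrderID, Status FROM Orders WHERE Status = 'HELD';
        -- ...each branch is simple enough for the optimizer to drive off an index.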

    The last thing to say about design is that you need to fully understand the system model and the business rules to stand a chance: the system model will determine what data you need to store, and the business rules will determine the constraints under which the model will operate.

  • 1) Is 3rd normal form a good goal to shoot for when designing a DB?  Or, is that probably "too normalized" in practice because it's probably "asking for trouble" later on when trying to optimize?

    I would personally say the goal is all the normal forms that can be reached. I have studied and studied these to make sure I fully understand them (even though they are a bit heady in the way they are written), but the normal forms are about addressing flaws rather than compactness of data (they just tend to lead to that). The key questions are: can you insert the data without some other condition being required, can you delete a row without removing a key piece of info that would then be lost, and can you update the record without a syncing issue? Normalize as fully as possible, then denormalize as needed for performance issues, but look for alternatives first if at all possible.

    2) Is there a rule of thumb about having "too many tables in a query?"  Does it indicate a problem if a query that is supposed to do some basic functionality of an application joins like 8-10 tables?  Or is this normal?

    No rule other than performance concerns. I have joined as many as 17 tables and there was no noticeable delay in the query. Sometimes, though, join slow-downs can be addressed with temporary tables that do some of the gathering, instead of denormalizing.
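
    A sketch of that temporary-table approach (invented names): gather the most restrictive set first, then join the remaining tables to the much smaller intermediate result.

        -- Stage the small, highly filtered set first
        SELECT OrderID, CustomerID
        INTO #RecentOrders
        FROM Orders
        WHERE OrderDate >= '20240101';

        -- Then join the rest of the tables to the temp table
        SELECT r.OrderID, c.CustomerName, l.ProductID
        FROM #RecentOrders r
        JOIN Customers  c ON c.CustomerID = r.CustomerID
        JOIN OrderLines l ON l.OrderID    = r.OrderID;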

    3) When should the decision be made to de-normalize the DB?  Should it be before you write the DB or, in small to medium projects, is it okay (and probably expected!) to do a re-design mid-way?  Obviously the earlier you can get the design right the better but I'm asking more about people's experience in practice.

    See the answer to 1 for part of this. As for redesign, it is very common to make changes throughout the design to improve it. Just make sure you set a list of standards and keep good documentation. You might also consider looking at some of the books on Refactoring and Extreme Programming. Also look here:

    http://www.extremeprogramming.org/

    http://www.martinfowler.com/

    http://www.refactoring.com/

    http://www.instantiations.com/jfactor/files/refback.pdf

    But keep in mind that, although helpful, I am not too sure the extreme points reached are totally needed, such as refactoring an existing stable application for the sake of code reuse. My opinion is that if it isn't broken, don't touch it unless you really have nothing better to do.

    4) What are people's thoughts about triggers/constraints for a DB?  Many developers (and that's my background) seem to want to do all the constraining in their application.  Personally, I don't like that approach.  Well, to clarify, I don't think it's enough.  I think you should put in application constraints, but the best place to put them is the DB.  I put in some constraints on my DB, but it's hard (and complicated) to put in (as well as think of) constraints for every possible situation.  I guess some constraints are better than none though, right?

    Everything you do with regard to the database should be toward the goal of complete data integrity. Constraints, triggers, and business logic in stored procedures are tools you can use to reach that goal as a data safety net. However, putting too much logic in the DB can also be a bigger performance issue than a gain. Weigh your pros and cons, and test when unsure. Generally, though, good constraints and triggers are going to save you lots of potential headaches long term.
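
    For instance, a simple declarative check (hypothetical table) is a cheap piece of that safety net compared with a trigger doing the same job:

        ALTER TABLE Commission
            ADD CONSTRAINT CK_Commission_Rate CHECK (Rate BETWEEN 0.0 AND 1.0);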

    5) When in the development lifecycle should constraints be put into place?  In my project, I spec'd out my application first.  Then I did my DB design.  Then I wrote my constraints.  Then I wrote my code.  Does that sound like the correct approach, or is it "inadvisable" to write the constraints/triggers too early because a project is likely to change a good amount as it is developed?

    As soon as you recognize that a particular object should be finitely controlled, you should have a defined constraint for it. You can do this immediately and always adjust as needed. Odds are you will actually get a better implementation by following a circular design approach. This, however, can be a personal-preference situation, and you may also have to consider the overall complexity of the final product: the more complex it is, the more you may want to build both the front and back ends at the same time, with slow, methodical testing to ensure the flow and the design work hand in hand. Or you may decide the back end is fairly straightforward and build it first. It is similar to the question of where to start eating a sandwich made with loaf bread: some prefer a specific corner, some will start in the middle of a particular side, and some will cut it in half or a wedge and start in the heart of the sandwich. So in essence there is no right or wrong answer.

  • Amen to what has been said; I second everything. I have done a lot of maintenance programming on other people's work, and very little maintenance on databases with good design, even when the client code was bad... The design does not have to be perfect, but if you follow David's advice, there will be less headache in the long run. Remember, database design really is an art and only comes with some experience, because you are not only dealing with the SQL plumbing but with a business model as well, and you need to translate that into your design. A bad business model = a bad database design.

    Get as many of the requirements as possible up front and spend extra time on the database design. Look at all the end reports that will be required and work backwards; look at everything: forms, transactions, business purpose. Do your own analysis by reviewing directly with the end users, and keep going back to your design.

    At first, think normalization; keeping tables normalized to at least third normal form is fine.

    Then, do not be afraid to denormalize, either for performance or for transaction journals.  When you do denormalize, be sure to have a good reason, document it well, and add functions to keep the integrity of the data.

    Constraints, constraints, constraints: it can't be said enough, and don't let your client programmers quietly break them!  Please use foreign key constraints too, NOW!  Use triggers only when constraints will not work. Again, it takes some time invested to know how much and when.
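
    For example (invented names), one statement and the database guarantees every order points at a real customer:

        ALTER TABLE Orders
            ADD CONSTRAINT FK_Orders_Customers
            FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID);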

    7 years ago I built a system from the ground up that was completely integrated with accounting and most aspects of the company. I spent the extra time up front on a solid database design and used just the suggestions here; looking back, there were 3 years of very little maintenance, maybe twice a year. Then the business changed and I went back to make modifications and add features.  I saw how badly I had written the client code and was surprised how well it had held up for so many years.  I attribute that to the foundation of a good database design!

  • This is a question about constraints in keeping with the topic...

    I am also in the same situation. I am designing a commission system where we import two to three hundred thousand records each month, and they can't be archived (business rules). Each set of imports has different fields, but there is a core set that is common: AgentID, TranDate, etc.

    I have made a "Common" table to hold the common fields, and other specific tables for the specific imports, so I kind of split the imports up: half in Common and half in TransactionA, B, C, etc., based on the import.

    I have constraints on the tables so the same data can't be imported twice and no dupes get in. But the issue I have is that if there is any duplicate value in an import, then none of the data gets imported. How do I get around that? There must be something I am missing...

    Thanks for your thoughts and comments.

  • There is an IGNORE_DUP_KEY option when creating an index, which has to be a unique index.

    If you run an insert statement that adds rows to a table and some of those rows contain duplicates, then SQL Server will give a warning and insert only the non-duplicate rows.

    This then leaves you the problem of dealing with the duplicates.
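
    A sketch of both approaches, using the Common table from the question (the staging table name is invented). IGNORE_DUP_KEY makes the unique index skip duplicate keys with a warning instead of failing the whole statement; the NOT EXISTS filter does the de-duplication during the insert itself and lets you inspect or log what was rejected:

        -- Option 1: the unique index silently discards rows with duplicate keys
        CREATE UNIQUE INDEX UX_Common_AgentTran
            ON Common (AgentID, TranDate)
            WITH (IGNORE_DUP_KEY = ON);

        -- Option 2: filter duplicates out during the import
        INSERT INTO Common (AgentID, TranDate)
        SELECT s.AgentID, s.TranDate
        FROM StagingImport s
        WHERE NOT EXISTS (SELECT 1
                          FROM Common c
                          WHERE c.AgentID  = s.AgentID
                            AND c.TranDate = s.TranDate);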
