Is a 30 GB DB large enough to warrant the use of a Data Warehouse to address performance?

  • First, let me apologize if I have posted this to the wrong forum. I was unable to decide which of the various forums this topic belongs in, so I decided to go with General.

    My company produces an accounting program that uses SQL Server as the RDBMS. I am concerned about the route management has decided to take regarding reporting performance. We are dealing more and more with large-scale clients, users with DBs 4 or 5 times larger than what we considered a large database just a few years back. The route we are pursuing to improve performance is the use of Data Marts.

    What concerns me is that I don't believe we are yet dealing with DBs of a size that should require Data Warehousing to achieve acceptable performance. The clients with the worst performance problems have DBs ranging from 15 to 30 gigabytes, and the largest DB I have heard of from any client is about 30 GB, give or take. That just doesn't sound to me like a database large enough that we should need a Data Warehouse to solve reporting performance issues. I do not pretend to be a Data Warehouse expert, but I am fairly certain the Data Warehouse route is intended for DBs measured in hundreds of gigabytes, not something that tops out at 30 GB. The majority of our users have DBs anywhere from 2 to 10 GB.

    I am concerned that we are pursuing a route we should not need just yet. The DB is large not because of just a few tables: it typically contains a couple hundred tables, with the largest table, the one that stores transactional info, holding around 5 million records in the largest of the databases.

    Am I just completely off track here? Does anyone else have experience in this area, and can anyone provide any kind of guidelines as to how large a DB should be before looking at the use of a Data Warehouse?

    Thanks

    Ed

    Kindest Regards,

    Just say No to Facebook!
  • It's all subjective and has *nothing* to do with the size of the database.  A data warehouse is appropriate even for a 1 MB DB if the data changes over time, the data modifications are not recorded by the primary application, and (another part of) the business wants to see those changes.

    The decision is based primarily on performance and on the ability to see the data in the manner one desires.  (The latter comes back to performance, since one can write a stored proc to transform data into pretty much whatever form is desired.)
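
    For example (a minimal sketch only; the table, column, and proc names here are invented for illustration):

        -- Hypothetical proc that flattens transactional rows into a
        -- month-by-month summary suitable for reporting.
        CREATE PROCEDURE dbo.usp_MonthlyTransactionSummary
            @Year int
        AS
        BEGIN
            SET NOCOUNT ON;
            SELECT
                MONTH(t.TranDate) AS TranMonth,
                SUM(t.Amount)     AS TotalAmount,
                COUNT(*)          AS TranCount
            FROM dbo.Transactions AS t
            WHERE YEAR(t.TranDate) = @Year
            GROUP BY MONTH(t.TranDate)
            ORDER BY TranMonth;
        END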

    Primary performance issues include basic query performance and contention with the primary business applications (many reads may interfere with needed writes, or vice versa).
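
    One common way to ease that reader/writer contention, assuming SQL Server 2005 or later (the database name below is a placeholder), is row versioning:

        -- Report reads see a consistent snapshot instead of blocking,
        -- or being blocked by, the accounting application's writes.
        -- Note: switching this on needs the DB free of other connections.
        ALTER DATABASE YourAccountingDb SET READ_COMMITTED_SNAPSHOT ON;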

    For executives who want quick little reports showing week- or month-at-a-time snapshots with gee-whiz graphs... data marts are the way to go to satisfy their need to spend money on toys...

  • Journeyman,


    Kindest Regards,

    Just say No to Facebook!
  • Think of where you might be a few years from now. While your largest client now has a database of 30GB, is it hard to imagine a customer coming along with a 300GB database?

    If you have designed your database with the expectation that it will have to hold 300GB, then you may be able to support such a customer. But if you are always focused on "now", you will be playing catch-up when a big customer arrives.

    If speed of reporting is an issue, start looking into Analysis Services as well as Reporting Services. Analysis Services placed over a well-structured database can satisfy most reporting queries in a fraction of the time an RDBMS query would take. Analysis Services is designed to produce reports on aggregate information, so it pre-builds the answers to queries. Throw Reporting Services over the top of Analysis Services and you gain caching and multiple report outputs (like Excel / PDF), as well as the ability to schedule and email reports to people. Really useful stuff.
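
    The same pre-built-aggregates idea can be sketched in plain T-SQL with an indexed view (all names invented; assumes the Amount column is NOT NULL, which indexed views require for SUM):

        -- SQL Server materializes and maintains this aggregate automatically.
        CREATE VIEW dbo.vTransactionsByMonth
        WITH SCHEMABINDING
        AS
        SELECT
            YEAR(TranDate)  AS TranYear,
            MONTH(TranDate) AS TranMonth,
            SUM(Amount)     AS TotalAmount,
            COUNT_BIG(*)    AS TranCount        -- required in indexed views
        FROM dbo.Transactions
        GROUP BY YEAR(TranDate), MONTH(TranDate);
        GO
        -- The unique clustered index is what actually materializes the view.
        CREATE UNIQUE CLUSTERED INDEX IX_vTransactionsByMonth
            ON dbo.vTransactionsByMonth (TranYear, TranMonth);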



    Julian Kuiters
    juliankuiters.id.au

  • Julian,

       Curiosity question. 

       It is my opinion, based on reading from numerous sources, that it is very important for the database to be properly structured for optimal performance, and for queries to be designed with the same thought in mind.  For example, a commonly referenced performance technique is to strive for covered queries, i.e. queries that can be answered entirely from a covering index.  Why not always just use Analysis Services with MDX queries, instead of standard relational T-SQL queries, for reporting purposes?
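
      A covering index, for reference (a hedged sketch, assuming SQL Server 2005 or later; the table and column names are invented):

        -- The INCLUDE columns ride along in the index leaf level, so the
        -- query below is answered entirely from the index, with no lookups
        -- back into the base table.
        CREATE NONCLUSTERED INDEX IX_Transactions_TranDate_Covering
            ON dbo.Transactions (TranDate)
            INCLUDE (AccountID, Amount);

        -- A "covered" query: every column it touches lives in the index.
        SELECT AccountID, Amount
        FROM dbo.Transactions
        WHERE TranDate >= '20050101' AND TranDate < '20050201';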

      Unfortunately, Reporting Services is not an option no matter how much I want it to be.  I have been pushing for our development group to look into it for over a year now, but I believe it has fallen on deaf ears.

      As for the design of the DB, the original design was done 10 years ago with a DOS-based program as the front end.  This eventually evolved into an enterprise-level product using SQL Server as the back end.  The general design has not changed, so we are facing design issues that are affecting performance.  For example, the DB has too many tables that store too many items that should be in separate tables if the design were kept in third normal form.

      In all honesty, I don't know what we would do if a client grew a DB to 100GB.  If you had asked me about this just a few months back, I would have said we would probably look at using Data Warehousing to address the reporting performance side.  However, if we are looking at using that for a 30 GB DB, where will we be when the DBs grow to hundreds of GBs? That will eventually happen, and it is basically why I posted this question.  I work in the department that assists clients with custom reporting, and I have no control at all over the development side.  I do have a lot of experience with SQL, especially on SQL Server.  I am trying to find info on Data Warehousing, performance, and how all that ties in with DB size, if it even does, so as to have solid info to take to management in hopes of implementing a few changes.

      Currently I have a report I built that uses a standard T-SQL query to return some financial information.  The report put out by our development group that returns the same type of data takes, on average, twice as long as mine to run.  What makes the difference more significant is that the development team's report uses a stored procedure and mine does not.  My thought is that if my report, without a stored procedure, executes in half the time of the one that uses a stored procedure, then the stored procedure version is most likely not written as efficiently as it could be.
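
      One way to make that comparison concrete is SQL Server's built-in timing and I/O output (the proc name below is a placeholder for the development team's report):

        SET STATISTICS TIME ON;
        SET STATISTICS IO ON;

        -- Run each version and compare elapsed time and logical reads
        -- in the Messages output.
        EXEC dbo.usp_DevFinancialReport;   -- hypothetical: the dev team's proc
        -- ...then run the ad hoc T-SQL version of the report here...

        SET STATISTICS TIME OFF;
        SET STATISTICS IO OFF;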

      I'd like to thank each of you for replying to my post and providing me with some very useful information.  I believe I am going to visit the other sections that are specific to T-SQL to get some feedback on efficient queries.


    Thanks Again

    Ed

    Kindest Regards,

    Just say No to Facebook!
  • The term "data warehouse" can mean a lot of different things to different people. Some think of OLAP and cubes, etc., but a simple definition might be: a database reorganized to support reporting.

    In your case, where you have a complex legacy database that cannot be radically changed, such a "data warehouse" might be just the ticket.

    Simply offloading the report processing to another server and/or eliminating contention with the transactional applications can provide quite a boost, but the biggest gains are likely to come from reorganization. In other words, the design of the data warehouse is, as usual, the key to good performance.
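
    As a rough illustration of that reorganization (a sketch only; the schema, table, and column names are invented, and the rpt schema is assumed to exist), a flattened reporting table rebuilt periodically on a second server might look like:

        -- Pre-join the transactional data once, so reports stop paying
        -- for the same joins on every run.
        SELECT
            t.TranDate,
            a.AccountName,
            c.CustomerName,
            t.Amount
        INTO rpt.TransactionsFlat
        FROM dbo.Transactions AS t
        JOIN dbo.Accounts     AS a ON a.AccountID  = t.AccountID
        JOIN dbo.Customers    AS c ON c.CustomerID = t.CustomerID;

        -- Cluster on the column most reports filter by.
        CREATE CLUSTERED INDEX IX_TransactionsFlat_TranDate
            ON rpt.TransactionsFlat (TranDate);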

    For some anecdotal tips on what to do (and avoid), see my article at: http://www.sqlsavior.com/8HoursTo2.html

    Good luck!

  • Yes, database design is key to your success, regardless of which design approach you go with.

    If your database has evolved from a 10-year-old DOS database, it's probably time to start from scratch. The design rules have changed significantly since those days, because we have bigger, faster computers. Whereas it may have been optimal back then to have hundreds of small tables due to RAM limitations, it is probably more optimal for you now to have a few large tables.

    Winning Management's approval for something can always be fun. I find the best way to do it is with factual proof. Take a common, slow-performing report that's not too complex. Create a copy of the database and restructure the objects needed for the report. Maybe throw Analysis Services over the top. Create and time the report. Re-query the report a couple of times (benchmark) and show Management the results.
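
    A simple wall-clock harness is enough for that benchmark (the proc name is a placeholder for whatever runs the report; run it a few times so caching evens out):

        DECLARE @start datetime;
        SET @start = GETDATE();

        EXEC dbo.usp_SlowReport;   -- hypothetical: the report being timed

        SELECT DATEDIFF(ms, @start, GETDATE()) AS ElapsedMs;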

    If you can show "Hey, I changed all this and now this report runs in 1/30th of the time it used to", that's a pretty compelling argument for Management to look closer.


    Julian Kuiters
    juliankuiters.id.au
