Daily data but how to store and access?

  • Hello All,

    We have several tables that grow by large amounts of data on a daily basis. The first table grows by approx 2.5million rows per day, the second 0.8million and the third at about 1million.

    We (to date) have been storing these records in a single table (for each set), however this is appearing more and more cumbersome on the system and storage requirements overall.

    There is a clustered primary key on the only columns used to search the data.

    What we're currently *discussing* is whether or not to separate these records into a separate database (for each section) then separating the data into a new daily table each day.

    That part is easy... here's the tricky part.

    Ideally all data needs to be online should it be required. Calls may span a range of dates generally not exceeding 100 days at a time. What I'm trying to establish is should we create these tables in daily lots and if so, how could they be queried quickly and easily?

    Also, keeping in mind that these additions per day can multiply by 100x or 1000x with the futher addition in processing and personal resources.

    Any thoughs would be very welcome!

    Neil.

  • You could always have 2 sets of data.  On one server you would have the active data (let's say the last 4-6 months depending on your requirements).  Then you'd have an archive DB where the rest of the data would be accessible all the time (but I presume much less frequently).  That way you'd have a smaller amount of data to work on at the time.

     

    Also maybe some of us can spot some design improvements if you could post the tables DDL with their indexes.

  • Hello, thank you for your reply.

    Unfortunately, all data needs to be accessed all the time in the quickest time possible. When processing new data, a complete read of previous related totals (back to day 1) is required.

    This was the reason why I was leaning towards distributed partitioned views...

    The DDL is posted below. The DDL that we're looking at for views is the same only with the addition of a constraint clause on the 'Process Number'

    Thanks

    -Neil.

    CREATE TABLE [dbo].[ProcessHistory_00101] (

    [ProcessID] [int] NOT NULL ,

    [VarianceID] [varchar] (3) COLLATE Latin1_General_CI_AS NOT NULL ,

    [ProcessNumber] [int] NOT NULL ,

    [ProcessIDSub] [varchar] (3) COLLATE Latin1_General_CI_AS NOT NULL ,

    [VarianceIDSub] [varchar] (3) COLLATE Latin1_General_CI_AS NOT NULL ,

    [FPSIn] [int] NOT NULL ,

    [FPSOut] [int] NOT NULL ,

    [ValueCurrent] [money] NOT NULL ,

    [ValueIn] [money] NOT NULL ,

    [ValueOut] [money] NOT NULL ,

    [TotalInput1] [money] NOT NULL ,

    [TotalInput2] [money] NOT NULL ,

    [TotalInput3] [numeric](19, 7) NOT NULL ,

    [TotalInput4] [numeric](19, 7) NOT NULL ,

    [TotalInput5] [numeric](19, 7) NOT NULL ,

    [TotalInput6] [numeric](19, 7) NOT NULL ,

    [TotalInput7] [numeric](19, 7) NOT NULL ,

    [TotalInput8] [numeric](19, 7) NOT NULL ,

    [TotalInput9] [numeric](19, 7) NOT NULL ,

    [TotalInput10] [numeric](19, 7) NOT NULL ,

    [TotalInput11] [numeric](19, 7) NOT NULL ,

    [TotalInput12] [numeric](19, 7) NOT NULL ,

    [TotalInput13] [numeric](19, 7) NOT NULL ,

    [TotalInput14] [numeric](19, 7) NOT NULL ,

    [TotalInput15] [numeric](19, 7) NOT NULL ,

    [TotalInput16] [numeric](19, 7) NOT NULL ,

    [TotalInput17] [numeric](19, 7) NOT NULL ,

    [TotalInput18] [numeric](19, 7) NOT NULL ,

    [TotalResult1] AS ([TotalInput1] + [TotalInput2] + [TotalInput3] + [TotalInput5] + [TotalInput6] + [TotalInput8] + [TotalInput9] + [TotalInput11] + [TotalInput12] + [TotalInput13] + [TotalInput14] + [TotalInput15] + [TotalInput16] + [TotalInput17] + [TotalInput18] + [ValueIn] + [ValueOut]) ,

    [TotalResult2] AS (((-[TotalInput4])) + [TotalInput7] + [TotalInput10]) ,

    [ProcessCount] [smallint] NOT NULL ,

    [ProcessAudited] [tinyint] NOT NULL ,

    [ProcessAuditedExtra] [tinyint] NOT NULL

    ) ON [ProcessHistory_B002]

    GO

    ALTER TABLE [dbo].[ProcessHistory_00101] WITH NOCHECK ADD

    CONSTRAINT [PK_ProcessHistory_00101] PRIMARY KEY CLUSTERED

    (

    [ProcessID],

    [ProcessNumber],

    [ProcessIDSub]

    ) ON [ProcessHistory_B002]

    GO

    CREATE INDEX [IX_ProcessHistory_00101_01] ON [dbo].[ProcessHistory_00101]([VarianceID]) WITH FILLFACTOR = 80 ON [ProcessHistory_B002_IX]

    GO

    CREATE INDEX [IX_ProcessHistory_00101_02] ON [dbo].[ProcessHistory_00101]([VarianceIDSub]) WITH FILLFACTOR = 80 ON [ProcessHistory_B002_IX]

    GO


  • Unfortunately, all data needs to be accessed all the time in the quickest time possible. When processing new data, a complete read of previous related totals (back to day 1) is required.


    Then the answer is simple although you will need to seed a table...

    Seed a table (DailyTotals) with the totals for each day from your big table.  This will only be done once ever.  Using a good GROUP BY query, this can be done relatively quickly. 

    After that, the daily process becomes quick and easy...

    1.  Import your new daily data into a separate table (CurDay) making sure that the table is properly indexed to support your queries, of course.

    2.  Process the CurDay table using the greatly reduced and aggragated DailyTotals table.

    3.  Calculate the totals of the CurDay table and insert the summary record into the DailyTotals table.

    4.  Insert the records from CurDay into your big table.

    5.  Truncate the CurDay table.

    The side benfit is that you can easily get totals from any day directly and almost instantaneously.  Totals for periods (months for instance) can be aggragated in a similar high speed fashion because even 10 years of daily totals is only 3652 records.  A grand total shouldn't take but a couple of milliseconds to come by for daily processing.

    If the basis of the big table is by customer and the total must also be by customer, a similar method will work... aggragate the totals, by customer, by month in a table called MonthlyTotals and daily in the DailyTotals table.  At the end of the month, aggragate the DailyTotals table into the MonthlyTotals table.  You'd still save the CurDay records into the big table and truncate the CurDay table on a daily basis.

    The big table would then become nothing more than a record archive in case you ever lost the bubble and needed to regen the totals tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply