Time dimension key as HHMM (text) or HMM (int)

  • Hi All
    Is it worth the extra time to cast my time key from HHMM text to HMM int?
    I have read that indexes will be quicker this way?

    Thanks
    Mattie

  • An integer takes four bytes in SQL Server, while char(4) would also take four bytes.  Since you're never going to have a number larger than 2359, you could use a small integer for your PK and FKs.

    Is your source data coming across as a four character string?  If so, I'd definitely cast to a small integer.  If your DW already has the PK as text, then I don't think I'd bother trying to rework this.

    I hope this helps,
    Rob

  • Personally, I wouldn't use either.  Use a TIME datatype.  It'll save you on a huge number of conversion problems and implicit casts in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    Would you really use a Time datatype for a Time dimension primary key?  I guess it would really depend upon the granularity of your data, but slicing down to milliseconds seems like overkill.

    I see Kimball talking about using an integer as an offset of minutes or seconds (depends how granular you need to get) from midnight.

    http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

    Just curious,
    Rob

  • robert.gerald.taylor - Tuesday, May 30, 2017 11:32 AM

    An integer takes four bytes in SQL Server, while char(4) would also take four bytes.  Since you're never going to have a number larger than 2359, you could use a small integer for your PK and FKs.

    Is your source data coming across as a four character string?  If so, I'd definitely cast to a small integer.  If your DW already has the PK as text, then I don't think I'd bother trying to rework this.

    I hope this helps,
    Rob

    I haven't built the Time dimension yet so I can certainly cast the source data as int, thanks.

  • Jeff Moden - Tuesday, May 30, 2017 12:17 PM

    Personally, I wouldn't use either.  Use a TIME datatype.  It'll save you on a huge number of conversion problems and implicit casts in the future.

    Hi
    I'm only going to be slicing the data by Hour of QuarterHour. I doubt I will even use the actual time!

  • robert.gerald.taylor - Tuesday, May 30, 2017 12:31 PM

    Jeff,

    Would you really use a Time datatype for a Time dimension primary key?  I guess it would really depend upon the granularity of your data, but slicing down to milliseconds seems like overkill.

    I see Kimball talking about using an integer as an offset of minutes or seconds (depends how granular you need to get) from midnight.

    http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

    Just curious,
    Rob

    I can't (won't) speak for Jeff... but... YES, YES, YES! a 1,000 times YES!
    Using an INT data type "may" have made sense back before the TIME data type was introduced but not any more. 
    Depending on the selected resolution the TIME data type can take as little as 3 bytes (as opposed to 4 for INT) and have a 1 second resolution.
    Same goes for dates as well...
    Using the proper data types for storing date/time data not only takes up less space, but allows the use of date/time functions and can save needlessly joining to calendar tables or time dimension tables...

  • Jason A. Long - Tuesday, May 30, 2017 1:09 PM

    robert.gerald.taylor - Tuesday, May 30, 2017 12:31 PM

    Jeff,

    Would you really use a Time datatype for a Time dimension primary key?  I guess it would really depend upon the granularity of your data, but slicing down to milliseconds seems like overkill.

    I see Kimball talking about using an integer as an offset of minutes or seconds (depends how granular you need to get) from midnight.

    http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

    Just curious,
    Rob

    I can't (won't) speak for Jeff... but... YES, YES, YES! a 1,000 times YES!
    Using an INT data type "may" have made sense back before the TIME data type was introduced but not any more. 
    Depending on the selected resolution the TIME data type can take as little as 3 bytes (as opposed to 4 for INT) and have a 1 second resolution.
    Same goes for dates as well...
    Using the proper data types for storing date/time data not only takes up less space, but allows the use of date/time functions and can save needlessly joining to calendar tables or time dimension tables...

    Wow! This is so confusing. I still need date and time dimensions for the slicing by Week / Quester Hour etc. but I should use a date and time column for the PKs?

  • MattieG72 - Tuesday, May 30, 2017 1:29 PM

    Wow! This is so confusing. I still need date and time dimensions for the slicing by Week / Quester Hour etc. but I should use a date and time column for the PKs?

    Yes... The exact same way that you're currently using the INT data type in the Kimball method. 
    There is nothing that would prevent you from using a DATE or TIME data type as a primary key.

  • Something along these lines...

    USE tempdb;
    GO

    CREATE TABLE dbo.dim_Time (
        MilTime TIME(0) NOT NULL,
        DisplayTime VARCHAR(10) NOT NULL,
        CONSTRAINT pk_dimTime_MilTime PRIMARY KEY CLUSTERED (MilTime)
        );

    WITH
        cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
        cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
        cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
        cte_Time (time) AS (
            SELECT TOP 86400
                CAST(DATEADD(ss, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, '00:00:00') AS TIME(0))
            FROM
                cte_n3 a CROSS JOIN cte_n3 b
            )
    INSERT dbo.dim_Time (MilTime, DisplayTime)
    SELECT
        t.time,
        CONVERT(VARCHAR(10), t.time, 109)
    FROM
        cte_Time t;

    SELECT
        t.MilTime,
        t.DisplayTime
    FROM
        dbo.dim_Time t;

  • Jason,
    Thank you for your thoughts -- all of the data warehouses I've worked on have only used date dimensions, so my questions are more theoretical at this point.  I was merely thinking that your time dimension will be quite large (many records, as opposed to size of data type) if you are going down to the millisecond for 24 hours.  If you go to the second, you have 86,400 records (24 X 60 X 60).

    Rob

  • robert.gerald.taylor - Tuesday, May 30, 2017 2:30 PM

    Jason,
    Thank you for your thoughts -- all of the data warehouses I've worked on have only used date dimensions, so my questions are more theoretical at this point.  I was merely thinking that your time dimension will be quite large (many records, as opposed to size of data type) if you are going down to the millisecond for 24 hours.  If you go to the second, you have 86,400 records (24 X 60 X 60).

    Rob

    The main point is that you should should use the proper DATE/TIME data types when storing date/time data and to get away from the idea if "date & time keys" that are stored as INTs or VARCHARs.
    86,400 rows is pretty tiny when compared to other tables, but it's up to you to determine whether or not you need such a table in your schema.

  • Jason A. Long - Tuesday, May 30, 2017 1:52 PM

    MattieG72 - Tuesday, May 30, 2017 1:29 PM

    Wow! This is so confusing. I still need date and time dimensions for the slicing by Week / Quester Hour etc. but I should use a date and time column for the PKs?

    Yes... The exact same way that you're currently using the INT data type in the Kimball method. 
    There is nothing that would prevent you from using a DATE or TIME data type as a primary key.

    Ok, I think I will sleep on that but good to know it's not incorrect. Thanks all.

  • I have a data warehouse with some accumulating snapshot type fact tables, all of which have dates and several of which have times at different levels of granularity, with minute being the smallest.

    First, use a smallint as the PK.  Even with the introduction of the time data type (which is nice).  As mine is down to minutes, key 0 is midnight, with the numbers incrementing by 1 for each minute.  Each row is representing by the number indicating how many minutes past midnight that it is.  That helps with determining time elapsed because I just subtract the keys to get the number of minutes.  The time dimension also has columns for the quarter hours and hours.  So all the midnight to 00:14 entries have a 1 for 12:00-12:14:59, and so on.  I create views on these columns, and these hook to the fact tables where the granularities are for quarter hours and hours.

    Second, keep the dates and times in separate dimensions.  Both time dimensions have a date counterpart.  Combining them would be far too unwieldy.  We had some consultants do that with some work that had to be done while I was away doing military service.  I had to rip that portion of the work up.  Having them separate makes for a very easy pivot like table with days as the columns and time as the rows.

    Been using this system for many years, mostly in SQL 2005.  Am in the middle of upgrading all to SQL 2016.  Love the time type, but will not give up the Kimball system of surrogate keys.  I also use these for my date dimension.  It makes date range calculations easier, too.  Besides, some of the dimensions are 2SCD, and the independent keys are indispensable.

    --Edited for clarity

  • Jason A. Long - Tuesday, May 30, 2017 2:57 PM

    robert.gerald.taylor - Tuesday, May 30, 2017 2:30 PM

    Jason,
    Thank you for your thoughts -- all of the data warehouses I've worked on have only used date dimensions, so my questions are more theoretical at this point.  I was merely thinking that your time dimension will be quite large (many records, as opposed to size of data type) if you are going down to the millisecond for 24 hours.  If you go to the second, you have 86,400 records (24 X 60 X 60).

    Rob

    The main point is that you should should use the proper DATE/TIME data types when storing date/time data and to get away from the idea if "date & time keys" that are stored as INTs or VARCHARs.
    86,400 rows is pretty tiny when compared to other tables, but it's up to you to determine whether or not you need such a table in your schema.

    Jason, I agree with you about using date/time in an OLTP system.  In a star schema DW, I'm of the same opinion as Ron.

    I was just curious as to the arguments for using a Time data type as the primary key for a Time dimension in a data warehouse.  So far, I've seen nothing that sways my opinion in that direction.

    Thanks,
    Rob

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply