two dimension time merge (wrecked)

  • Dear Experts

    I am trying to merge two Dimensions (from a DWH) into a new one. I think I've figured it out basically, but there's a weird special case, which my code won't catch right.

    Below is an excerpt from the real data. Never mind the illogical city names, I just replaced some company terms.

    The "DeptCode" that's causing trouble is 7210 "London"

    The timeline I want to build is

    2016-07-06 00:00:00 - 2017-01-16 23:59:59 7600 -- notice substraction of 1 day for ValidTo

    2017-01-17 00:00:00 - 9999-12-31 00:00:00 7300

    Note sure, if that's even possible or the data's too wrecked.

    As mentioned above, I can handle it for the other patterns by using this code (perhaps there's a more elegant/efficient way to do it):

    Thanks a lot, Roger 🙂

     

    CREATE TABLE [dbo].[TestData](
    [struct_ValidFrom] [datetime2](7) NOT NULL,
    [struct_ValidTo] [datetime2](7) NOT NULL,
    [object_ValidFrom] [datetime2](7) NOT NULL,
    [object_ValidTo] [datetime2](7) NOT NULL,
    [DeptCode] [int] NOT NULL,
    [DeptName] [varchar](80) NULL,
    [DeptCodeParent] [int] NULL
    ) ON [PRIMARY]
    GO

    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7312, 'Tokyo', 7310);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1012, 'Paris', 1010);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7210, 'London', 7600);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1012, 'Paris', 1010);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7312, 'Tokyo', 7310);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '9999-12-31 00:00:00', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 1200, 'Berlin', 720);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2016-07-06 00:00:00', '9999-12-31 00:00:00', 7210, 'London', 7300);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2017-01-16 00:00:00', '2019-01-14 23:59:59', 1130, 'ZĂĽrich', 1100);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2017-01-17 00:00:00', '2019-01-14 23:59:59', '2017-01-16 00:00:00', '2019-01-14 23:59:59', 1114, 'Madrid', 1130);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '2019-10-03 23:59:59', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1130, 'New York', 4600);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-01-15 00:00:00', '2019-10-03 23:59:59', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1114, 'Beijing', 1130);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-10-04 00:00:00', '9999-12-31 00:00:00', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1130, 'New York', 4600);
    insert into dbo.TestData (struct_ValidFrom, struct_ValidTo, object_ValidFrom, object_ValidTo, DeptCode, DeptName, DeptCodeParent) values ('2019-10-04 00:00:00', '9999-12-31 00:00:00', '2019-01-15 00:00:00', '9999-12-31 00:00:00', 1114, 'Beijing', 1130);

    with Departments as
    (
    -- merge periods
    select
    struct_ValidFrom as ValidFrom,
    struct_ValidTo as ValidTo,
    DeptCode,
    DeptName,
    DeptCodeParent
    from dbo.TestData
    union
    select
    object_ValidFrom as ValidFrom,
    object_ValidTo as ValidTo,
    DeptCode,
    DeptName,
    DeptCodeParent
    from dbo.TestData
    ),

    DepartmentsMerged as
    (
    -- merge/reduce timelines when a change is found
    select
    min(ValidFrom) as ValidFrom,
    max(ValidTo) as ValidTo,
    DeptCode,
    DeptName,
    DeptCodeParent
    from Departments
    group by
    DeptCode,
    DeptName,
    DeptCodeParent
    )

    select *
    from DepartmentsMerged
    -- problematic DeptCode:
    -- where DeptCode = 7210
    order by
    DeptCode,
    ValidFrom
  • Is it intentional to have two DeptCode Parent values for London? That seems to be why you're getting both values back. In your sample data, there isn't a date value that would fall into the question parameters you mentioned, so both come back as 9999-12-31.

    Alan H
    MCSE - Data Management and Analytics
    Senior SQL Server DBA

    Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • yep, it's the crap i am receiving  :-/

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply