How to find overlapping records in evolved dimension table

  • How do I find overlapping records in an evolved dimension table with a begin and end date for each record.

    For example, assume my dtbl_employee_evolved table has the following:

    emp_id, evolv_key, lastname, begin_date, end_date

    1 1000 Roberts 01/01/2016 01/15/2016

    1 1001 Smith 01/16/2016 01/31/2016

    2 1002 Edwards 07/01/2016 09/04/2016

    2 1003 James 07/06/2016 08/11/2016

    2 1004 Flanagan 08/12/2016 08/31/2016

    Each change in lastname should record the day the old name changed and the day the new name began.

    There should not be an overlap as an employee cannot have different last names on the same day.

    Employee 2 has overlapping records, if you try to find his last name on July 7 2016, you will get two records.

    2 1002 Edwards 07/01/2016 09/04/2016

    2 1003 James 07/06/2016 08/11/2016

    I need a query that will return all overlapping records. I've tried recursive table expression and cross join unsuccessfully.

  • Something like the following should give you a start. You didn't say what you wanted to do once you found the overlaps.

    SELECT *

    FROM dtbl_employee_evolved AS a

    INNER JOIN dtbl_employee_evolved AS b

    ON a.emp_id = b.emp_id

    AND a.begin_date < b.end_date

    AND a.end_date > b.begin_date

    AND a.evolv_key < b.evolv_key

    Also, Hungarian Notation is frowned upon in modern programming, but it's probably way too late to drop the prefix "dtbl_" from your tables.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Are you storing your begin_dates and end_dates as char or varchar columns?

    I ask because at some point your example data apparently switches from DD/MM/YYYY format to MM/DD/YYYY.

    Unless your data is consistent you have a problem.

    IF your begin_date and end_date columns were date or datetime datatypes, you could do something like this:

    select t1.*

    from sometable t1

    join sometable t2 on t1.emp_id = t2.emp_id

    and t1.evolv_key <> t2.evolv_key -- assumes this column is unique

    and ( t1.end_date between t2.begin_date and t2.end_date

    OR t1.begin_date between t2.begin_date and t2.end_date )

    If it is stored as character data, you could use the SUBSTRING() function to rearrange the data into YYYYMMDD format and use the resulting strings for the BETWEEN compares.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • If you already have entries for each day that should work, but if you don't the overhead is significant, and if that query is used often, if will affect performance.

    If the data is in this format, you can detect overlaps using simple date arithmetic, because an overlap is simply one interval starting after a given interval, but before the given is finished, something like

    select dr1.* from date_ranges dr1

    inner join date_ranges dr2

    on dr2.start > dr1.start -- start after dr1 is started

    and dr2.start < dr1.end -- start before dr1 is finished

    If you need special handling for interval that are wholly within another interval, or you need to merge intervals, i.e.

    PKey Start End Type

    ==== ===== === ====

    01 01/01/2010 20/01/2010 S

    02 15/01/2010 31/01/2010 S

    yielding

    Start End Type

    ===== === ====

    01/01/2010 31/01/2010 S

    you will need more complex calculation.

    In my experience with this kind of problems, once you get how to do the calculation by hand, it's easy to transfer it into SQL 🙂

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply