TSQL Join question

  • I'm trying to figure out a good way to query what in my opinion a less than optimal schema. I created an example that mimics my schema. Basically I have widgets and widget statuses joined by widget id. A widget status record is created every time the widget status changes. What I need is a widget report that lists each widget along with it's most recent status. My solution works, but the way I join the most recent status record seems pretty clunky. Just wondering if there's a cleaner way to write the query without changing the schema of course.

    Thanks!

    declare @widgits table (

    WidgitId int,

    WidgitCode varchar(4),

    WidgitDescription varchar(30)

    )

    declare @WidgitStatus table (

    WidgitStatusId int,

    WidgitId int,

    StatusCode varchar(10),

    StatusDate datetime

    )

    insert into @widgits(WidgitId, WidgitCode, WidgitDescription) values(1, 'w1', 'Widgit 1')

    insert into @widgits(WidgitId, WidgitCode, WidgitDescription) values(2, 'w2', 'Widgit 2')

    insert into @widgits(WidgitId, WidgitCode, WidgitDescription) values(3, 'w3', 'Widgit 3')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(1, 1, 'available', '20150101')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(2, 1, 'low', '20150601')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(3, 1, 'gone', '20151101')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(4, 2, 'available', '20150715')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(5, 2, 'low', '20151101')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(6, 2, 'gone', '20151215')

    insert into @WidgitStatus(WidgitStatusId, WidgitId, StatusCode, StatusDate) values(7, 3, 'available', '20150301')

    ;with WidgitStatus_cte(WidgitId, LatestStatusDate)

    as (

    select WidgitId, max(StatusDate) as LatestStatusDate

    from @WidgitStatus

    group by WidgitId

    )

    --* Widgit status report, list each widgit along with current status (most recent status).

    select w.WidgitCode, w.WidgitDescription, ws.StatusDate, ws.StatusCode

    from @widgits w

    inner join @WidgitStatus ws on ws.WidgitId = w.WidgitId

    inner join WidgitStatus_cte wsc on w.WidgitId = wsc.WidgitId

    where ws.StatusDate = wsc.LatestStatusDate

    .

  • This is query is similar to what you have except that it has 1 less table scan because there is 1 less inner join, which made it about 45ish% faster. Of course, performance will depend greatly upon the indexes for these tables. Give it a shot and see if performance is boosted.

    ;with WidgitStatus_cte

    as (

    select WidgitId, StatusDate as LatestStatusDate, StatusCode,

    Row_Number() over(Partition by WidgitID order by StatusDate Desc) RowNum

    from @WidgitStatus

    )

    select w.WidgitCode, w.WidgitDescription, wsc.LatestStatusDate, wsc.StatusCode

    from @widgits w

    inner join WidgitStatus_cte wsc on w.WidgitId = wsc.WidgitId and wsc.RowNum = 1

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Interesting solution LinksUp. Thank you. I'll try this in the real data and see how it goes. The real data is pretty large, eliminating a table scan should be pretty obvious improvement.

    .

  • I just wanted to add that, if you have any control of the schema (it sounds like you might not). Just changing the DDL might make a big difference. E.g. adding a primary key/clustered index, foreign key and some NOT NULLs. This example uses temp tables so I can include an FK constraint.

    USE tempdb

    GO

    IF OBJECT_ID('dbo.WidgitStatus') IS NOT NULL DROP TABLE dbo.WidgitStatus;

    IF OBJECT_ID('dbo.widgits') IS NOT NULL DROP TABLE dbo.widgits;

    CREATE TABLE dbo.widgits

    (

    WidgitId int primary key,

    WidgitCode varchar(4) NOT NULL,

    WidgitDescription varchar(30) NOT NULL

    )

    CREATE TABLE dbo.WidgitStatus

    (

    WidgitStatusId int,

    WidgitId int foreign key references dbo.widgits(widgitID),

    StatusCode varchar(10) NOT NULL,

    StatusDate datetime NOT NULL

    )

    If your tables are setup like this you can take advantage of a POC index which stands for Partition, Order, Cover. POC indexes are vital for using Window Functions as LinksUp did in his excellent solution.

    CREATE INDEX poc_WidgitStatus ON dbo.WidgitStatus(/*PARTITION*/WidgitID, /*ORDER*/StatusDate Desc) INCLUDE (/*Cover*/StatusCode);

    If you run both queries with Include Execution plan turned on, first without the POC index, then with it, you will see that both queries benefit from the POC index as the sort is removed from both query plans.

    If you have no control over the schema then, sometimes, it's worth doing a SELECT INTO a temp table with the correct constraints and indexes and using those to generate your final result set.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks Alan.B. Completely understood. I believe the real schema has the proper keys/indexes, I was just looking for the best query solution for this scenario. I will double check though.

    .

  • You can also get this done by using CROSS APPLY.

    SELECT * FROM @widgits w CROSS APPLY (SELECT TOP 1 ws.StatusDate, ws.StatusCode

    FROM @WidgitStatus ws WHERE ws.WidgitId = w.WidgitId

    ORDER BY ws.StatusDate DESC) WS(StatusDate, StatusCode)

    [font="Calibri"]Raj[/font]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply