SORT cost in union query

  • I have a query i have been optimizing. Now runs in about 15 minutes but was wondering if there is any way tr educe the SORT cost.

    Currently the high costs left are the Table insert which is 58% and the Sort cost of 36%

    The inner query below is around 400million rows and aggregates to around 15,000,000 rows - all feed back welcome πŸ™‚

    SELECT @1 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -1, GETDATE() -1) as DATE)

    SELECT @2 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -2, GETDATE() -1) as DATE)

    SELECT @3 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -3, GETDATE() -1) as DATE)

    SELECT @4 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -4, GETDATE() -1) as DATE)

    SELECT @5 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -5, GETDATE() -1) as DATE)

    SELECT @6 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -6, GETDATE() -1) as DATE)

    SELECT @7 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -7, GETDATE() -1) as DATE)

    SELECT @8 = DateKey FROM dbo.DimDate WHERE TheDate = CAST(DATEADD(WEEK, -8, GETDATE() -1) as DATE)

    INSERT dbo.MyTable(CustomerKey, StoreKey, LastWeek, LastTwoWeeks, LastThreeWeeks, LastFourWeeks, LastFiveWeeks, LastSixWeeks, LastSevenWeeks, LastEightWeeks)

    SELECT CustomerKey

    , StoreKey

    , SUM(One) AS LastWeek

    , SUM(Two) AS LastTwoWeeks

    , SUM(Three) AS LastThreeWeeks

    , SUM(Four) AS LastFourWeeks

    , SUM(Five) AS LastFiveWeeks

    , SUM(Six) AS LastSixWeeks

    , SUM(Seven) AS LastSevenWeeks

    , SUM(Eight) AS LastEightWeeks

    FROM (

    SELECT CustomerKey

    , StoreKey

    , CASE WHEN StorePurchaseDate > @1 THEN 1 END as One

    , CASE WHEN StorePurchaseDate > @2 THEN 1 END as Two

    , CASE WHEN StorePurchaseDate > @3 THEN 1 END as Three

    , CASE WHEN StorePurchaseDate > @4 THEN 1 END as Four

    , CASE WHEN StorePurchaseDate > @5 THEN 1 END as Five

    , CASE WHEN StorePurchaseDate > @6 THEN 1 END as Six

    , CASE WHEN StorePurchaseDate > @7 THEN 1 END as Seven

    , CASE WHEN StorePurchaseDate > @8 THEN 1 END as Eight

    FROM dbo.FactStore

    UNION ALL

    SELECT CustomerKey

    , StoreKey

    , CASE WHEN OnlinePurchaseDate > @1 THEN 1 END as One

    , CASE WHEN OnlinePurchaseDate > @2 THEN 1 END as Two

    , CASE WHEN OnlinePurchaseDate > @3 THEN 1 END as Three

    , CASE WHEN OnlinePurchaseDate > @4 THEN 1 END as Four

    , CASE WHEN OnlinePurchaseDate > @5 THEN 1 END as Five

    , CASE WHEN OnlinePurchaseDate > @6 THEN 1 END as Six

    , CASE WHEN OnlinePurchaseDate > @7 THEN 1 END as Seven

    , CASE WHEN OnlinePurchaseDate > @8 THEN 1 END as Eight

    FROM dbo.FactOnline

    ) as x

    GROUP BY CustomerKey

    , StoreKey;

    GO

  • There doesn't seem to be any tuning opportunities in the code itself. You're retrieving all the data from two tables in a UNION ALL with no filtering. The optimizer is going to have very few choices in how to deal with that. Then you have the aggregation part of the query. The only way to improve this, based on the limited knowledge I have, would be to restructure your tables to avoid the UNION, possibly, and add a columnstore index to the restructured table for the aggregation. But as long as you don't have any filtering in place, there just isn't much that can be done here.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Thanks for the reply.

    the table will be dropped and recreated form scratch once a week our of BAU and the used only to be absorbed into QlikView.

    There is some conditions

    SELECT CustomerKey

    , StoreKey

    , CASE WHEN StorePurchaseDate > @1 THEN 1 END as One

    , CASE WHEN StorePurchaseDate > @2 THEN 1 END as Two

    , CASE WHEN StorePurchaseDate > @3 THEN 1 END as Three

    , CASE WHEN StorePurchaseDate > @4 THEN 1 END as Four

    , CASE WHEN StorePurchaseDate > @5 THEN 1 END as Five

    , CASE WHEN StorePurchaseDate > @6 THEN 1 END as Six

    , CASE WHEN StorePurchaseDate > @7 THEN 1 END as Seven

    , CASE WHEN StorePurchaseDate > @8 THEN 1 END as Eight

    FROM dbo.FactStore

    WHERE IsReturned = 'N'

    UNION ALL

    SELECT CustomerKey

    , StoreKey

    , CASE WHEN OnlinePurchaseDate > @1 THEN 1 END as One

    , CASE WHEN OnlinePurchaseDate > @2 THEN 1 END as Two

    , CASE WHEN OnlinePurchaseDate > @3 THEN 1 END as Three

    , CASE WHEN OnlinePurchaseDate > @4 THEN 1 END as Four

    , CASE WHEN OnlinePurchaseDate > @5 THEN 1 END as Five

    , CASE WHEN OnlinePurchaseDate > @6 THEN 1 END as Six

    , CASE WHEN OnlinePurchaseDate > @7 THEN 1 END as Seven

    , CASE WHEN OnlinePurchaseDate > @8 THEN 1 END as Eight

    FROM dbo.FactOnline

    didn't think the where clause for the first part of the union all was relevant, sorry for missing it out.

    thnaks

  • Have you tried using appropriate date filters on each table?

    Something like StorePurchaseDate >= @8.

    Posting the actual execution plan of the query with the table insert part commented out might be useful.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Lots of possible issues here:

    1) The SORT isn't because of UNION. You are doing a UNION ALL, which does NOT do the SORT/DISTINCT that a UNION alone does. The SORT is for the GROUP BY. That cannot be avoided unless you make some form of index that can be used to avoid it.

    2) My belief it is is something related to the INSERT itself OR possibly a spill to tempdb for the GROUP BY SORT. Look at query plan details to find out about the spill. Monitor wait stats and especially file IO stalls during the run to find the pain points. Also use sp_whoisactive to see real-time performance issues.

    3) How much memory does server have? How many cores? What is MAXDOP setting and are that many cores being used? What is IO subsystem picture?

    4) You say this is a complete rebuild process? Now another main issue becomes are you getting minimally logged INSERTs? Database recovery model, TABLOCKX, indexing on table being inserted to, etc. all come into play for that.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Some filtering then, but, assuming that's a CHAR(1) field, not much filtering at all, so you're still going to be looking at scans and few options for the optimizer to help you out. Kevin is probably on the right track, with no changes to processing or structure, you're down to the hardware to carry the load.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • thanks, there is a message about spilling into tempdb. This is currently being run on a dev server though which is shared and much different from the production one for the client.

  • There is actually an optimization that can be done to this code. It's called "pre-aggregation" (a term that I go from SQL Server MVP Peter Larsson). If you were to do SUM(CASE) in each of the queries in the FROM clause, less time would be spent in the UNION ALL and a couple of other things. I've used the method quite successfully in other things of this nature.

    And, yeah... it seems counter intuitive but give it a try. Don't forget that missing WHERE clause in that one query. πŸ™‚

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks for that, the sort cost increases when doing the pre-aggregation, however the overall run time does down as well estimate IO etc

  • When you break this down you are creating weekly bandings, the following query reduces the need for the parameters, as this could form either a CTE, Subquery or Table valued function, with a parameter for number of weeks.

    SELECT

    DateKey

    ,(DATEPART(WEEK,CONVERT(DATE,GetDATE()))-DatePart(WEEK,TheDate))+1 Band

    FROM

    DimDate

    where

    TheDate

    Between CONVERT(DATE,DATEAdd(week,-7,GetDATE()))

    AND GetDATE()-1

    This would then form part of a join on the Store/Online PurchaseDate Key, to limit the rows, and in that case an index on that column would possibly be beneficial.

    To replicate the Logic replacing the

    CASE WHEN StorePurchaseDate > @1 THEN 1 END AS ONE

    With

    SUM (CASE Band WHEN 1 THEN 1 ELSE 0 END) AS ONE

    ,SUM (CASE Band WHEN 2 THEN 1 ELSE 0 END) AS TWO

    should replicate it.

    I'm not certain what it is you are counting, but there a possible issue with the numbers, as the current logic counts number of items, assuming each row is a single sale product, what if a customer buys 10 items in one transaction, is that 10 rows or a single row?

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Try this:

    SELECT COALESCE(fs.CustomerKey, fo.CustomerKey) AS CustomerKey

    , COALESCE(fs.StoreKey, fo.StoreKey) AS StoreKey

    , ISNULL(fs.LastWeek, 0) + ISNULL(fo.LastWeek, 0) AS LastWeek

    , ISNULL(fs.LastTwoWeeks, 0) + ISNULL(fo.LastTwoWeeks, 0) AS LastTwoWeeks

    , ISNULL(fs.LastThreeWeeks, 0) + ISNULL(fo.LastThreeWeeks, 0) AS LastThreeWeeks

    , ISNULL(fs.LastFourWeeks, 0) + ISNULL(fo.LastFourWeeks, 0) AS LastFourWeeks

    , ISNULL(fs.LastFiveWeeks, 0) + ISNULL(fo.LastFiveWeeks, 0) AS LastFiveWeeks

    , ISNULL(fs.LastSixWeeks, 0) + ISNULL(fo.LastSixWeeks, 0) AS LastSixWeeks

    , ISNULL(fs.LastSevenWeeks, 0) + ISNULL(fo.LastSevenWeeks, 0) AS LastSevenWeeks

    , ISNULL(fs.LastEightWeeks, 0) + ISNULL(fo.LastEightWeeks, 0) AS LastEightWeeks

    FROM (

    SELECT CustomerKey

    , StoreKey

    , SUM(CASE WHEN StorePurchaseDate > @1 THEN 1 ELSE 0 END) as LastWeek

    , SUM(CASE WHEN StorePurchaseDate > @2 THEN 1 ELSE 0 END) as LastTwoWeeks

    , SUM(CASE WHEN StorePurchaseDate > @3 THEN 1 ELSE 0 END) as LastThreeWeeks

    , SUM(CASE WHEN StorePurchaseDate > @4 THEN 1 ELSE 0 END) as LastFourWeeks

    , SUM(CASE WHEN StorePurchaseDate > @5 THEN 1 ELSE 0 END) as LastFiveWeeks

    , SUM(CASE WHEN StorePurchaseDate > @6 THEN 1 ELSE 0 END) as LastSixWeeks

    , SUM(CASE WHEN StorePurchaseDate > @7 THEN 1 ELSE 0 END) as LastSevenWeeks

    , SUM(CASE WHEN StorePurchaseDate > @8 THEN 1 ELSE 0 END) as LastEightWeeks

    FROM dbo.FactStore

    WHERE

    StorePurchaseDate > @8

    GROUP BY CustomerKey

    , StoreKey

    ) AS fs

    FULL OUTER JOIN (

    SELECT CustomerKey

    , StoreKey

    , SUM(CASE WHEN OnlinePurchaseDate > @1 THEN 1 ELSE 0 END) as LastWeek

    , SUM(CASE WHEN OnlinePurchaseDate > @2 THEN 1 ELSE 0 END) as LastTwoWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @3 THEN 1 ELSE 0 END) as LastThreeWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @4 THEN 1 ELSE 0 END) as LastFourWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @5 THEN 1 ELSE 0 END) as LastFiveWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @6 THEN 1 ELSE 0 END) as LastSixWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @7 THEN 1 ELSE 0 END) as LastSevenWeeks

    , SUM(CASE WHEN OnlinePurchaseDate > @8 THEN 1 ELSE 0 END) as LastEightWeeks

    FROM dbo.FactOnline

    WHERE

    OnlinePurchaseDate > @8

    GROUP BY CustomerKey

    , StoreKey

    ) as fo ON fo.CustomerKey = fs.CustomerKey AND fo.StoreKey = fs.StoreKey

    GROUP BY COALESCE(fs.CustomerKey, fo.CustomerKey)

    , COALESCE(fs.StoreKey, fo.StoreKey)

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Quick question if StorePurchaseDate is not null is the OnlinePurchaseDate null and vice versa? Or do you have an OnlinePurchase flag?

    😎

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply