An Alternative (Better?) Method to UNPIVOT (SQL Spackle)

    You can also do similar with SELECT/UNION ALL in 2005.
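
    A minimal sketch of what the SELECT/UNION ALL approach might look like. The #OrdersDemo table and its rows are made up for illustration, loosely modelled on the article's #Orders example, and nothing here needs anything newer than SQL Server 2005:

    ```sql
    -- Hypothetical demo table (not the article's actual setup script)
    CREATE TABLE #OrdersDemo
        (OrderID int IDENTITY, GiftCard int, TShirt int, Shipping int);

    INSERT INTO #OrdersDemo (GiftCard, TShirt, Shipping)
    SELECT 1, NULL, 3 UNION ALL
    SELECT 2, 5, 4 UNION ALL
    SELECT 1, 3, 10;

    -- One SELECT per source column, stacked with UNION ALL.
    -- Column names of the result come from the first SELECT.
    SELECT OrderID, 'GiftCard' AS ProductName, GiftCard AS ProductQty
    FROM #OrdersDemo
    UNION ALL
    SELECT OrderID, 'TShirt', TShirt
    FROM #OrdersDemo
    UNION ALL
    SELECT OrderID, 'Shipping', Shipping
    FROM #OrdersDemo
    ORDER BY OrderID, ProductName;

    DROP TABLE #OrdersDemo;
    ```

    Unlike UNPIVOT, this keeps the rows where the quantity is NULL unless you filter them out, and it reads the source table once per unpivoted column.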

    --Jeff Moden


  • Jeff Moden (8/2/2012)

    You can also do similar with SELECT/UNION ALL in 2005.

    Basically, your "other UNPIVOT" technique is a brilliant way to use the Table Value Constructor, which is new as of SQL Server 2008.

    Using the same method, in your first example you could replace:

    INSERT INTO #Orders
    SELECT 1, NULL, 3 UNION ALL SELECT 2, 5, 4 UNION ALL SELECT 1, 3, 10

    with the following:

    INSERT INTO #Orders
    VALUES (1, NULL, 3), (2, 5, 4), (1, 3, 10)

    I don't think it's a real gain but maybe it's more readable.
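
    For anyone who hasn't seen the Table Value Constructor used for unpivoting, here is a rough sketch of the CROSS APPLY VALUES pattern under discussion. The #OrdersDemo table is a hypothetical stand-in for the article's #Orders table, and this needs SQL Server 2008 or later:

    ```sql
    -- Hypothetical demo table (not the article's actual setup script)
    CREATE TABLE #OrdersDemo
        (OrderID int IDENTITY, GiftCard int, TShirt int, Shipping int);

    INSERT INTO #OrdersDemo (GiftCard, TShirt, Shipping)
    VALUES (1, NULL, 3), (2, 5, 4), (1, 3, 10);

    -- Each source row is crossed with a three-row constructor,
    -- one row per column being unpivoted.
    SELECT o.OrderID, x.ProductName, x.ProductQty
    FROM #OrdersDemo AS o
    CROSS APPLY (VALUES ('GiftCard', o.GiftCard),
                        ('TShirt',   o.TShirt),
                        ('Shipping', o.Shipping)) AS x (ProductName, ProductQty)
    WHERE x.ProductQty IS NOT NULL;   -- drop NULLs, as UNPIVOT does

    DROP TABLE #OrdersDemo;
    ```

    Leaving off the WHERE clause keeps the NULL quantities in the result.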

  • There is a way to get NULL values back in an unpivot. It's not pretty but it works.

    IF OBJECT_ID('tempdb..#Orders','U') IS NOT NULL
        DROP TABLE #Orders

    -- DDL and sample data for UNPIVOT Example 1
    CREATE TABLE #Orders
        (Orderid int identity, GiftCard int, TShirt int, Shipping int)

    INSERT INTO #Orders
    SELECT 1, NULL, 3 UNION ALL SELECT 2, 5, 4 UNION ALL SELECT 1, 3, 10

    SELECT * FROM #Orders

    DECLARE @Start datetime, @End datetime

    SET @Start = GETDATE()

    -- Traditional UNPIVOT (rows with a NULL quantity are dropped)
    SELECT OrderID, convert(varchar(15), ProductName) [ProductName], ProductQty
    FROM (
        SELECT OrderID, GiftCard, TShirt, Shipping
        FROM #Orders) p
    UNPIVOT
        (ProductQty FOR ProductName IN ([GiftCard], [TShirt], [Shipping])) as unpvt

    SET @End = GETDATE()

    SELECT DATEDIFF(ms,@Start,@End) AS ElapsedTime

    -- Set a placeholder for NULL values.
    -- Let's assume the integers are always greater than zero.
    -- If this is the case, we can use 0 to represent a NULL.
    DECLARE @NULL int

    SET @NULL = 0

    SET @Start = GETDATE()

    -- UNPIVOT over the placeholder, then turn the placeholder back into NULL
    SELECT OrderID, convert(varchar(15), ProductName) [ProductName], NULLIF(ProductQty,@NULL) AS ProductQty
    FROM (
        SELECT OrderID
        , COALESCE(GiftCard,@NULL) AS GiftCard
        , COALESCE(TShirt,@NULL) AS TShirt
        , COALESCE(Shipping,@NULL) AS Shipping
        FROM #Orders) p
    UNPIVOT
        (ProductQty FOR ProductName IN ([GiftCard], [TShirt], [Shipping])) as unpvt

    SET @End = GETDATE()

    SELECT DATEDIFF(ms,@Start,@End) AS ElapsedTime

    Like I said, it's not pretty, but it will return your NULL values and, in this case, it is faster than the traditional method on my laptop.

  • Good to know ... I was unawares.

    Here's the last example done with pivot/unpivot ... a bit more code, performance not as good ... but much more maintainable/flexible (IMO) given the problem setup ... generalizes the computations and can adjust for the source month columns coming in and out of use. In such a real world situation Users would likely accept a performance goal of tolerable ...

    Those of you who continue to profess a belief in the Users will receive the standard substandard training which will result in your eventual elimination. Tron (1982)

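    ;WITH [CTE_DETAIL]([ID],[ProductLine],[JanRev],[JanExp],[FebRev],[FebExp],[MarRev],[MarExp],[AprRev],[AprExp],[MayRev],[MayExp],[JunRev],[JunExp],[JulRev],[JulExp],[AugRev],[AugExp],[SepRev],[SepExp],[OctRev],[OctExp],[NovRev],[NovExp],[DecRev],[DecExp])
    AS (
    SELECT [ID],
    [ProductLine],
    --$0 AS
    [JanRev],
    --$0 AS
    [JanExp],
    --$0 AS
    [FebRev],
    --$0 AS
    [FebExp],
    --$0 AS
    [MarRev],
    --$0 AS
    [MarExp],
    $0 AS
    [AprRev],
    $0 AS
    [AprExp],
    $0 AS
    [MayRev],
    $0 AS
    [MayExp],
    $0 AS
    [JunRev],
    $0 AS
    [JunExp],
    $0 AS
    [JulRev],
    $0 AS
    [JulExp],
    $0 AS
    [AugRev],
    $0 AS
    [AugExp],
    $0 AS
    [SepRev],
    $0 AS
    [SepExp],
    $0 AS
    [OctRev],
    $0 AS
    [OctExp],
    $0 AS
    [NovRev],
    $0 AS
    [NovExp],
    $0 AS
    [DecRev],
    $0 AS
    [DecExp]
    FROM #ProfitLoss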





    SELECT [ID],



    MONTH(LEFT([PERIOD_ITEM], 3) + '1, 2000') AS [iMONTH],



    FROM (

    SELECT [ID],




    [JanRev]-[JanExp] AS [JanMargin],



    [FebRev]-[FebExp] AS [FebMargin],



    [MarRev]-[MarExp] AS [MarMargin],



    [AprRev]-[AprExp] AS [AprMargin],



    [MayRev]-[MayExp] AS [MayMargin],



    [JunRev]-[JunExp] AS [JunMargin],



    [JulRev]-[JulExp] AS [JulMargin],



    [AugRev]-[AugExp] AS [AugMargin],



    [SepRev]-[SepExp] AS [SepMargin],



    [OctRev]-[OctExp] AS [OctMargin],



    [NovRev]-[NovExp] AS [NovMargin],



    [DecRev]-[DecExp] AS [DecMargin]

    [ $ O/(U) Prev Mo],

    [ % O/(U) Prev Mo]




    SELECT A.[ID],





    COALESCE(A.[AMOUNT] - B.[AMOUNT], $0) AS [ $ O/(U) Prev Mo],

    ROUND(100*COALESCE((A.[AMOUNT] - B.[AMOUNT])/NULLIF(B.[AMOUNT],$0),$0), 1) AS [ % O/(U) Prev Mo]



    ON A.[ID] = B.[ID]

    AND A.[sACCT] = B.[sACCT]

    AND A.[iMONTH] = B.[iMONTH] + 1


    SELECT [ID],



    MAX([Jan]) AS [Jan],

    MAX([Feb]) AS [Feb],

    MAX([Mar]) AS [Mar],

    MAX([Apr]) AS [Apr],

    MAX([May]) AS [May],

    MAX([Jun]) AS [Jun],

    MAX([Jul]) AS [Jul],

    MAX([Aug]) AS [Aug],

    MAX([Sep]) AS [Sep],

    MAX([Oct]) AS [Oct],

    MAX([Nov]) AS [Nov],

    MAX([Dec]) AS [Dec]

    FROM (

    SELECT [ID],



    CASE [sACCT]

    WHEN 'Rev' THEN 'Revenues'

    WHEN 'Exp' THEN 'Expenses'

    WHEN 'Margin' THEN 'Margin'


    END AS [LineItem],


    FROM (SELECT [ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo] FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([AMOUNT]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]


    SELECT [ID],



    ' $ O/(U) Prev Mo' AS [LineItem],


    FROM (SELECT [ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo] FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([ $ O/(U) Prev Mo]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]


    SELECT [ID],



    ' % O/(U) Prev Mo' AS [LineItem],


    FROM (SELECT [ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo] FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([ % O/(U) Prev Mo]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]

    ) A

    GROUP BY [ID],[ProductLine],[LineItem],[sACCT]




    CASE [sACCT]

    WHEN 'Rev' THEN 10

    WHEN 'Exp' THEN 20

    WHEN 'Mar' THEN 30

    ELSE 1000


    CASE [LineItem]

    WHEN 'Revenues' THEN 10

    WHEN 'Expenses' THEN 10

    WHEN 'Margin' THEN 10

    WHEN ' $ O/(U) Prev Mo' THEN 20

    WHEN ' % O/(U) Prev Mo' THEN 30

    ELSE 1000


  • Before I get into thanking everybody for their comments, last weekend I looked into the performance gains achieved by the "other UNPIVOT" approach because I believe there's no such thing as a free lunch. I got some interesting results that I thought worthy of an update.

    I reran the single UNPIVOT example with larger record sets and averaged results over 5 runs appear below.

    I have highlighted in red the slower of the two approaches in each case. In this test harness, I assigned the query results to local variables to suppress their display.

    From the results, we see that CROSS APPLY VALUES (CAV) consistently beats UNPIVOT on elapsed time up through 5,000,000 rows. Two specific points, though:

    1. There seems to be a breakeven point where CPU time for CAV begins to exceed that of UNPIVOT (at perhaps 800K rows).

    2. The CAV results that show elapsed times lower than CPU times (starting at around 500K rows) seemed odd to me, until I realized that I'm running these tests on a Core i5 machine, which would imply that SQL Server is parallelizing the query. I tested this theory by adding OPTION (MAXDOP 1) and found that the CPU/elapsed time results were more normal again (i.e., elapsed time slightly greater than CPU time).

    However, with OPTION (MAXDOP 1) I got these results at 5,000,000 rows (single run; UNPIVOT first, then CAV):

    (5000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4072 ms, elapsed time = 4304 ms.

    SQL Server Execution Times:

    CPU time = 2652 ms, elapsed time = 2688 ms.

    So, in other words, CPU time for CAV dropped from about 6989 ms (this was the 75% sparseness case) to 2652 ms. I am not sure why CPU time for UNPIVOT was also reduced (4539 ms to 4072 ms), but obviously it's a smaller drop. In any event, CAV now beats UNPIVOT on CPU time as well and also runs in less elapsed time! Seems to be a double benefit.

    While this does seem to confirm my theory about parallelism, I'm not sure exactly what to make of the overall drop in CPU usage. Perhaps with lower parallelism the CPU needs to do less work?
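
    For anyone who wants to repeat the comparison, a harness along these lines should do it. This is only a sketch: #OrdersDemo is a hypothetical three-row stand-in (so its timings will be close to zero), whereas the real tests used far larger row counts.

    ```sql
    -- Hypothetical stand-in for the test table
    CREATE TABLE #OrdersDemo
        (OrderID int IDENTITY, GiftCard int, TShirt int, Shipping int);

    INSERT INTO #OrdersDemo (GiftCard, TShirt, Shipping)
    VALUES (1, NULL, 3), (2, 5, 4), (1, 3, 10);

    -- Local variables absorb the results so display time is not measured
    DECLARE @OrderID int, @ProductName varchar(15), @ProductQty int;

    SET STATISTICS TIME ON;

    -- UNPIVOT, forced to a serial plan
    SELECT @OrderID = OrderID, @ProductName = ProductName, @ProductQty = ProductQty
    FROM (SELECT OrderID, GiftCard, TShirt, Shipping FROM #OrdersDemo) AS p
    UNPIVOT (ProductQty FOR ProductName IN ([GiftCard], [TShirt], [Shipping])) AS unpvt
    OPTION (MAXDOP 1);

    -- CROSS APPLY VALUES, forced to a serial plan
    SELECT @OrderID = o.OrderID, @ProductName = x.ProductName, @ProductQty = x.ProductQty
    FROM #OrdersDemo AS o
    CROSS APPLY (VALUES ('GiftCard', o.GiftCard),
                        ('TShirt',   o.TShirt),
                        ('Shipping', o.Shipping)) AS x (ProductName, ProductQty)
    WHERE x.ProductQty IS NOT NULL
    OPTION (MAXDOP 1);

    SET STATISTICS TIME OFF;

    DROP TABLE #OrdersDemo;
    ```

    Removing the two OPTION (MAXDOP 1) hints lets the optimizer choose a parallel plan again, which is where elapsed time can come in under CPU time.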

    As usual, the answer to "which approach should I use in my query?" is "it depends!"

