Multi Column Sort

  • Christian Buettner (4/7/2009)


    Hi Chris,

    Very fancy stuff that you posted. Wasn't even aware you could do such fancy stuff in an ORDER BY:-).

    Actually I did want to tell you that your queries did not work, but then I realized that mine did not either. Reason: The data in table #test is flawed.

    SELECT A.id, b.id, A.col2, B.col2, a.col3, B.col3 FROM #test A, #test b

    WHERE A.col2 > b.col2 AND a.col3 < B.col3

    No matter how you sort, either col2 or col3 will remain unsorted (could apply to other combinations as well).

    Neither was I mate until I saw something really wacky in an order by here on the forum some months ago!

    Here's a data set which passes your test:

    USE Utilities

    GO

    DROP TABLE #qtest

    DROP TABLE #test

    CREATE TABLE #qtest (id int, col1 numeric(6,2), col2 numeric(6,2), col3 numeric(6,2), col4 numeric(6,2), col5 numeric(6,2), col6 numeric(6,2), col7 numeric(6,2), col8 numeric(6,2))

    INSERT INTO #qtest (id, col1, col2, col3, col4, col5, col6, col7, col8)

    SELECT 1, 2.35, 3.01, 3.49, 4.25, 4.79, 5.36, 5.82, 6.31

    UNION SELECT 2, null, 3.59, 4.32, null, 5.21, null, null, 7.12

    UNION SELECT 3, NULL, null, null, 5.23, 5.41, 6.33, 6.89, 7.99

    UNION SELECT 4, null, 5.46, 5.31, 5.92, null, 6.87, 7.34, null

    UNION SELECT 5, 5.12, 5.66, 6.31, 6.92, null, 7.87, 8.34, null

    --- 25,000 rows

    SELECT ROW_NUMBER() OVER (ORDER BY Batch, id) AS [id],

    col1, col2, col3, col4, col5, col6, col7, col8

    INTO #test

    FROM (SELECT CAST(1 AS INT) AS [Batch], *

    FROM #qtest

    UNION ALL

    SELECT 1+(number/4), id, col1+number, col2+number, col3+number, col4+number, col5+number, col6+number, col7+number, col8+number

    FROM #qtest, Utilities.dbo.Numbers

    WHERE number%4 = 0 AND number <= (20000-4) -- 25,000

    ) d

    /****** Object: Index [seq] Script Date: 04/04/2009 14:02:11 ******/

    CREATE CLUSTERED INDEX [seqm] ON #test

    ( [col1] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    SELECT A.id, b.id, A.col2, B.col2, a.col3, B.col3 FROM #test A, #test b

    WHERE A.col2 > b.col2 AND a.col3 < B.col3

    I'll run everything through later this morning.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You're right. I was in a rush yesterday, and forgot to take into account the change from null to zero would affect the join math.

    So far as I can tell, this version gives correct sorting results:

    update #test

    set col1 = isnull(col1,0),

    col2 = isnull(col2,0),

    col3 = isnull(col3,0),

    col4 = isnull(col4,0),

    col5 = isnull(col5,0),

    col6 = isnull(col6,0),

    col7 = isnull(col7,0),

    col8 = isnull(col8,0)

    ;with CTE (CTEID, Seq1) as

    (select t1.id, count(*)

    from #test t1

    inner join #test t2

    on t1.col1 >= t2.col1

    and t1.col2 >= t2.col2

    and t1.col3 >= t2.col3

    and t1.col4 >= t2.col4

    and t1.col5 >= t2.col5

    and t1.col6 >= t2.col6

    and t1.col7 >= t2.col7

    and t1.col8 >= t2.col8

    group by t1.id)

    select #test.*

    from #test

    left outer join CTE

    on id = cteid

    order by seq1;

    On 39968 rows (generated using 10k rows from the Numbers table), it took 4:27 to finish on my machine.

    The original spec doesn't care about the null vs zero thing. As need be, you could add an insert into a temp table, do the update to zero on that, sort, then return the final result with a select that either uses the original data or a NullIf on the temp table. You'll get the right output with any of those.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Looks like it needs a little more tweaking - first column is correct sequential order:

    1 2.35 3.01 3.49 4.25 4.79 5.36 5.82 6.31

    2 0.00 3.59 4.32 0.00 5.21 0.00 0.00 7.12

    3 0.00 0.00 0.00 5.23 5.41 6.33 6.89 7.99

    4 0.00 5.46 5.31 5.92 0.00 6.87 7.34 0.00

    9 0.00 9.46 9.31 9.92 0.00 10.87 11.34 0.00

    5 5.12 5.66 6.31 6.92 0.00 7.87 8.34 0.00

    7 0.00 7.59 8.32 0.00 9.21 0.00 0.00 11.12

    8 0.00 0.00 0.00 9.23 9.41 10.33 10.89 11.99

    13 0.00 0.00 0.00 13.23 13.41 14.33 14.89 15.99

    It's mighty fast though, a little over one minute on this lappy.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • On the same dataset, without updating to zero, the complex Case statement took 8:57 to run.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/7/2009)


    On the same dataset, without updating to zero, the complex Case statement took 8:57 to run.

    I'm missing something and can't figure out what it is! To simplify, here's a dataset generator for 15 rows:

    USE Utilities

    GO

    DROP TABLE #qtest

    DROP TABLE #test

    CREATE TABLE #qtest (id int, col1 numeric(6,2), col2 numeric(6,2), col3 numeric(6,2), col4 numeric(6,2), col5 numeric(6,2), col6 numeric(6,2), col7 numeric(6,2), col8 numeric(6,2))

    INSERT INTO #qtest (id, col1, col2, col3, col4, col5, col6, col7, col8)

    SELECT 1, 2.35, 3.01, 3.49, 4.25, 4.79, 5.36, 5.82, 6.31

    UNION SELECT 2, null, 3.59, 4.32, null, 5.21, null, null, 7.12

    UNION SELECT 3, NULL, null, null, 5.23, 5.41, 6.33, 6.89, 7.99

    UNION SELECT 4, null, 5.31, 5.46, 5.92, null, 6.87, 7.34, null

    UNION SELECT 5, 5.12, 5.66, 6.31, 6.92, null, 7.87, 8.34, null

    --- 25,000 rows

    SELECT ROW_NUMBER() OVER (ORDER BY Batch, id) AS [id],

    col1, col2, col3, col4, col5, col6, col7, col8

    INTO #test

    FROM (SELECT CAST(1 AS INT) AS [Batch], *

    FROM #qtest

    UNION ALL

    SELECT 1+(number/4), id, col1+number, col2+number, col3+number, col4+number, col5+number, col6+number, col7+number, col8+number

    FROM #qtest, Utilities.dbo.Numbers

    WHERE number%4 = 0 AND number < 12 --<= (20000-4) -- 25,000

    ) d

    /****** Object: Index [seq] Script Date: 04/04/2009 14:02:11 ******/

    CREATE CLUSTERED INDEX [seqm] ON #test

    ( [col1] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    Here's the data it generates, which looks ok by eye:

    1 2.35 3.01 3.49 4.25 4.79 5.36 5.82 6.31

    2 0.00 3.59 4.32 0.00 5.21 0.00 0.00 7.12

    3 0.00 0.00 0.00 5.23 5.41 6.33 6.89 7.99

    4 0.00 5.31 5.46 5.92 0.00 6.87 7.34 0.00

    5 5.12 5.66 6.31 6.92 0.00 7.87 8.34 0.00

    6 6.35 7.01 7.49 8.25 8.79 9.36 9.82 10.31

    7 0.00 7.59 8.32 0.00 9.21 0.00 0.00 11.12

    8 0.00 0.00 0.00 9.23 9.41 10.33 10.89 11.99

    9 0.00 9.31 9.46 9.92 0.00 10.87 11.34 0.00

    10 9.12 9.66 10.31 10.92 0.00 11.87 12.34 0.00

    11 10.35 11.01 11.49 12.25 12.79 13.36 13.82 14.31

    12 0.00 11.59 12.32 0.00 13.21 0.00 0.00 15.12

    13 0.00 0.00 0.00 13.23 13.41 14.33 14.89 15.99

    14 0.00 13.31 13.46 13.92 0.00 14.87 15.34 0.00

    15 13.12 13.66 14.31 14.92 0.00 15.87 16.34 0.00

    Here's the result of GSquared's query:

    1 2.35 3.01 3.49 4.25 4.79 5.36 5.82 6.31

    2 0.00 3.59 4.32 0.00 5.21 0.00 0.00 7.12

    3 0.00 0.00 0.00 5.23 5.41 6.33 6.89 7.99

    4 0.00 5.31 5.46 5.92 0.00 6.87 7.34 0.00

    5 5.12 5.66 6.31 6.92 0.00 7.87 8.34 0.00

    7 0.00 7.59 8.32 0.00 9.21 0.00 0.00 11.12

    8 0.00 0.00 0.00 9.23 9.41 10.33 10.89 11.99

    9 0.00 9.31 9.46 9.92 0.00 10.87 11.34 0.00

    12 0.00 11.59 12.32 0.00 13.21 0.00 0.00 15.12

    13 0.00 0.00 0.00 13.23 13.41 14.33 14.89 15.99

    14 0.00 13.31 13.46 13.92 0.00 14.87 15.34 0.00

    10 9.12 9.66 10.31 10.92 0.00 11.87 12.34 0.00

    6 6.35 7.01 7.49 8.25 8.79 9.36 9.82 10.31

    15 13.12 13.66 14.31 14.92 0.00 15.87 16.34 0.00

    11 10.35 11.01 11.49 12.25 12.79 13.36 13.82 14.31

    :ermm:

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • No, you're not missing anything. That solution just plain doesn't work. I missed a bit when looking at it. That's what I get for rushing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Guys, It looks like an interesting problem for me and I would like to try my hands on it. But the problem is that I still not completely understood the requirement. What I understood is that the OP wants an ordering of columns from top to bottom moving from column 1 to column 8. Am I correct??

    And I am assuming that column values in a row is incremental from left to right??

    --Ramesh


  • I am going in circles in circles in this thread...:blink:

  • Ramesh (4/7/2009)


    Guys, It looks like an interesting problem for me and I would like to try my hands on it. But the problem is that I still not completely understood the requirement. What I understood is that the OP wants an ordering of columns from top to bottom moving from column 1 to column 8. Am I correct??

    And I am assuming that column values in a row is incremental from left to right??

    That sounds about right, Ramesh. The sample data I posted earlier today will give you a visual.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (4/7/2009)


    Ramesh (4/7/2009)


    Guys, It looks like an interesting problem for me and I would like to try my hands on it. But the problem is that I still not completely understood the requirement. What I understood is that the OP wants an ordering of columns from top to bottom moving from column 1 to column 8. Am I correct??

    And I am assuming that column values in a row is incremental from left to right??

    That sounds about right, Ramesh. The sample data I posted earlier today will give you a visual.

    Thanks Chris for the sample data.

    I've understood the complexity in it and now its my turn to post my first solution (and hopefully the last as well). It took me an hour and a hell lot of brain hammering to product this code.

    IF ( OBJECT_ID( 'tempdb..#test' ) IS NOT NULL )

    DROP TABLE #test

    CREATE TABLE #test

    (

    id INT NOT NULL PRIMARY KEY CLUSTERED,

    col1 NUMERIC(18,2) NULL,

    col2 NUMERIC(18,2) NULL,

    col3 NUMERIC(18,2) NULL,

    col4 NUMERIC(18,2) NULL,

    col5 NUMERIC(18,2) NULL,

    col6 NUMERIC(18,2) NULL,

    col7 NUMERIC(18,2) NULL,

    col8 NUMERIC(18,2) NULL

    )

    ; WITH qtest

    AS

    (

    SELECT 1 AS id, 2.35 AS col1, 3.01 AS col2, 3.49 AS col3, 4.25 AS col4, 4.79 AS col5, 5.36 AS col6, 5.82 AS col7, 6.31 AS col8

    UNION ALL

    SELECT 2, NULL, 3.59, 4.32, NULL, 5.21, NULL, NULL, 7.12

    UNION ALL

    SELECT 3, NULL, NULL, NULL, 5.23, 5.41, 6.33, 6.89, 7.99

    UNION ALL

    SELECT 4, NULL, 5.31, 5.46, 5.92, NULL, 6.87, 7.34, NULL

    UNION ALL

    SELECT 5, 5.12, 5.66, 6.31, 6.92, NULL, 7.87, 8.34, NULL

    )

    INSERT #test( id, col1, col2, col3, col4, col5, col6, col7, col8 )

    SELECT ROW_NUMBER() OVER( ORDER BY Batch, id ) AS [id], col1, col2, col3, col4, col5, col6, col7, col8

    FROM (

    SELECT CAST( 1 AS INT ) AS [Batch], *

    FROM qtest

    UNION ALL

    SELECT 1 + ( Number / 4 ), id, col1 + Number, col2 + Number, col3 + Number, col4 + Number,

    col5 + Number, col6 + Number, col7 + Number, col8 + Number

    FROM qtest, dbo.Numbers

    WHERE Number % 4 = 0 AND Number < 12 ---<= (20000-4) -- 25,000

    ) d

    -- Expected order of output

    SELECT * FROM #test ORDER BY id

    -- Unpivot the columns into rows, so that each id will have value of each column in terms of rows

    -- note the unpivot operation skips the null valued columns from the resultset

    ; WITH UnpivotTest

    AS

    (

    SELECT *

    FROM #test t

    UNPIVOT

    (

    OrderVal FOR OrderCol IN( [col1], [col2], [col3], [col4], [col5], [col6], [col7], [col8] )

    ) P

    ),

    -- set the row number of each row partition by each column, this will get you the position of an item (id)

    -- in each column, after that we'll get these values for the first 6 rows

    -- id, col1, col2, col3, col4

    -- 1, 1, 1, 1, 1

    -- 2, -, 2, 2, -

    -- 3, -, -, -, 2

    -- 4, -, 3, 3, 3

    -- 5, 2, 4, 4, 4

    -- 6, 3, 5, 5, 5

    -- After that we are taking the max. of each id and count for each id

    -- and ordering the final resultset by id and then by count

    VerticalOrderedTest

    AS

    (

    SELECT id, MAX( rn ) AS rn, COUNT( * ) AS cnt

    FROM (

    SELECT ROW_NUMBER() OVER( PARTITION BY OrderCol ORDER BY ISNULL( NULLIF( OrderVal, 0 ), 9999 ) ) AS rn, id

    FROM UnpivotTest

    ) t

    GROUP BY id

    )

    SELECT v.id, t.col1, t.col2, t.col3, t.col4, t.col5, t.col6, t.col7, t.col8

    FROM VerticalOrderedTest v

    INNER JOIN #test t ON v.id = t.id

    ORDER BY v.rn, v.cnt

    --Ramesh


  • Can I play too?

    DECLARE @Sample TABLE

    (

    id int primary key clustered,

    col1 numeric(6, 2),

    col2 numeric(6, 2),

    col3 numeric(6, 2),

    col4 numeric(6, 2),

    col5 numeric(6,2),

    col6 numeric(6,2),

    col7 numeric(6,2),

    col8 numeric(6,2)

    )

    INSERT @Sample

    SELECT 1, 2.35, 3.01, 3.49, 4.25, 4.79, 5.36, 5.82, 6.31 UNION ALL

    SELECT 2, NULL, 3.59, 4.32, NULL, 5.21, NULL, NULL, 7.12 UNION ALL

    SELECT 3, NULL, NULL, NULL, 5.23, 5.41, 6.33, 6.89, 7.99 UNION ALL

    SELECT 4, NULL, 5.31, 5.46, 5.92, NULL, 6.87, 7.34, NULL UNION ALL

    SELECT 5, 5.12, 5.66, 6.31, 6.92, NULL, 7.87, 8.34, NULL

    -- Before

    SELECT *

    FROM @Sample

    -- After

    SELECT p.ID,

    s.col1,

    s.col2,

    s.col3,

    s.col4,

    s.col5,

    s.col6,

    s.col7,

    s.col8

    FROM (

    SELECT u.ID,

    u.theValue,

    ROW_NUMBER() OVER (PARTITION BY u.ID ORDER BY u.theValue) AS recID

    FROM @Sample AS s

    UNPIVOT (

    theValue

    FOR theCol IN (s.col1, s.col2, s.col3, s.col4, s.col5, s.col6, s.col7, s.col8)

    ) AS u

    ) AS d

    PIVOT (

    MAX(d.theValue)

    FOR d.recID IN ([1], [2], [3], [4], [5], [6], [7], [8])

    ) AS p

    INNER JOIN @Sample AS s ON s.ID = p.ID

    ORDER BY p.[1],

    p.[2],

    p.[3],

    p.[4],

    p.[5],

    p.[6],

    p.[7],

    p.[8]

    EDIT: Added SQL-script as attachment.


    N 56°04'39.16"
    E 12°55'05.25"

  • Very elegant! Makes my CASEy stuff look like a dustbin full of spaghetti!

    Who's gonna test with 40,000 rows then? 😀

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Peso,

    I think the results of your query seems to be incorrect. Here are the results...

    -- Expected Results

    id col1 col2 col3 col4 col5 col6 col7 col8

    1 2.35 3.01 3.49 4.25 4.79 5.36 5.82 6.31

    2 NULL 3.59 4.32 NULL 5.21 NULL NULL 7.12

    3 NULL NULL NULL 5.23 5.41 6.33 6.89 7.99

    4 NULL 5.31 5.46 5.92 NULL 6.87 7.34 NULL

    5 5.12 5.66 6.31 6.92 NULL 7.87 8.34 NULL

    -- Actual Results

    ID col1 col2 col3 col4 col5 col6 col7 col8

    1 2.35 3.01 3.49 4.25 4.79 5.36 5.82 6.31

    2 NULL 3.59 4.32 NULL 5.21 NULL NULL 7.12

    5 5.12 5.66 6.31 6.92 NULL 7.87 8.34 NULL

    3 NULL NULL NULL 5.23 5.41 6.33 6.89 7.99

    4 NULL 5.31 5.46 5.92 NULL 6.87 7.34 NULL

    --Ramesh


  • Chris Morris (4/8/2009)


    Very elegant! Makes my CASEy stuff look like a dustbin full of spaghetti!

    Who's gonna test with 40,000 rows then? 😀

    Fortunately, I had enough time to test for 5000 rows, but not enough patience:w00t: to test for 40,000 rows.

    Here are the results

    --Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Table '#test'. Scan count 2, logical reads 136, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Ramesh' Version : 876 ms

    --Warning: Null value is eliminated by an aggregate or other SET operation.

    --Table '#test'. Scan count 3752, logical reads 255136, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --ChrisM' Version 3 ("condensed" version 2) : 36376 ms

    Test code attached...

    --Ramesh


  • Using Ramesh's test cases, I get these timings:

    Ramesh' Version : 526 ms

    ChrisM' Version 3 ("condensed" version 2) : 23480 ms

    Peso : 236 ms


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 15 posts - 91 through 105 (of 180 total)

You must be logged in to reply to this topic. Login to reply