Multi Column Sort

  • john.arnott (4/3/2009)


    Chris Morris (4/3/2009)


    GSquared (4/3/2009)


    Anyone else have any ideas on this?

    There were two other ideas which I've not had time to follow up yet. The first was to derive an average slope for each row, as value over column number, then calculate the missing col1 value from any values on the same row to permit ordering by col1 alone.......

    Clever, Chris. You couldn't use col1 alone, actually. For collisions on col1, you'd want to use the slope as a secondary order key. Only thing left, perhaps, would be to examine the standard deviation of the domain values as a quantifiable indication of confidence in the resulting row order.

    You're right John, thanks for pointing this out. I went for the easier option 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Assuming that the zero values really are nulls in the database then I suggest the following:-

    select

    coalesce( col1, 0 ) col1,

    coalesce( col2, 0 ) col2,

    coalesce( col3, 0 ) col3

    from data

    order by

    coalesce( col1, col2, col3, 0 ),

    coalesce( col2, col3, 0),

    coalesce( col3, 0 )

  • Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/6/2009)


    Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.

    If that was exactly as posted, then it was 39,968 rows - it takes five or six minutes on this lappy.

    (39968 row(s) affected) / 00:05:07 (00:06:30 without clustered index)

    Changing the upper bound of the tally number filter back to your original value will do the trick:

    where number between 10 and 1000

    The whole batch including creating the sample data runs in 00:00:04 on this lappy. I've got another method which runs slightly slower than this but it's even more ugly, man 😉

    If I get time later I'll run Chris B's, yours and my solutions on here for comparison. I'm sure there's a fourth solution out there which would combine the elegance of yours and Chris B's solutions with the benefit of a little more performance.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sweet. This has certainly been an interesting problem to follow.

    Anyone heard from the OP lately?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (4/6/2009)


    Sweet. This has certainly been an interesting problem to follow.

    Anyone heard from the OP lately?

    Nope. Bet he's busy with his nose in a book :hehe:

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Bob Hovious (4/6/2009)


    This has certainly been an interesting problem to follow.

    Hey Bob, I'm still messing about with derivatives of this - have you got a "final version" that you'd like to have included? I'd be interested and grateful.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • No, I haven't looked at this since I realized that my shift-left version was not the ultimate answer. You guys had already made considerable progress, so I've just been sitting on the bench rooting from the sidelines. However it still keeps tickling the back of my brain. If an elegant solution should suddenly occur to me, I'll post it up here.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Hi

    This is very intresting question, but kind of list in somany replies...could any please tell me the solution/algorithm.

  • Christian Buettner (4/4/2009)


    jsanborn (4/2/2009)


    Since this was posted for a community of database professionals, I didn't think a great deal of detailed explanation was needed just like I didn't think it necessary to explain what a database or a table is.

    That's one thing thats been preying on my mind for a while now:

    Yes, this is a forum for database professionals. But I hope you have realized, that your "problem" actually is not a database problem, nor a relational problem. Ordering rows is far from relational. And the fact that SQL allows you to order rows doesn't make it a database problem. I would rather put it into application development, since it more reflects array sorting.

    The problem you posted is not easy to solve in SQL. You want to have columns sorted individually, but the sorting in each column is non-deterministic because of the nulls. Therefore you can only "compare" the full rows against each other. Unfortunately(really?), ORDER BY in SQL doesn't allow you to compare each two rows indivdually (meaning here: apply different column comparison for each pair of rows, depending on the horizontal position of nulls), therefore you have to somehow "rank" the full rows first (like I did with the COUNT) and then use the rank to do the sorting.

    If I have not overlooked something, we still do not know what your intention was regarding this question or whether it was actually solved. It would be nice if you could shed some light here.

    I agree absolutely that if this is a "real" problem that the original poster has, he/she should solve it with a tool more appropriate to the situation. SQL is very powerful and, as demonstrated by the many alternative ideas presented here, could probably be used to provide results in the order specified. But then, of course, something else would be done for the ordered data to provide any value. Graphs? Some sort of statistical analysis? Who knows? The ordering would probably best be left to an application outside of SQL.

    If this were a forum of bicycling enthusiasts, we may have threads on subjects like derailleur design, tire inflation for different conditions, or whatever. If someone posted a question, "How do I get a load of lumber home?", there may be several solutions proffered -- attach a trailer to your bike, arrange slings on either side of the bike to carry balanced loads, etc. The best answer would actually be "have the lumber yard deliver, or obtain (borrow or rent) a truck" .

  • I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.

  • jsanborn (4/6/2009)


    I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.

    As they say here "it depends". The solutions provided by ChrisB and GSquared are elegant and compact, certainly not cumbersome, but are not particularly performant. Tailoring the solution more tightly to actual data would provide considerable scope for performance enhancement but at the expense of simplicity.

    More than one solution has been proposed and proven and one or two additional possibilites have been discussed - if this was a challenge, as some have suggested, then the challenge has been met.

    It's not necessarily the end of the thread when the OP bows out. Members occasionally retain an interest. For those few who do, here are some timings from three or four variants on successful solutions, including those provided by ChrisB and GSquared:

    drop table #test

    create table #test (

    ID int identity, Col1 float, Col2 float, Col3 float, Col4 float, Col5 float, Col6 float, Col7 float, Col8 float);

    --

    insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select 2.35,3.01,3.49,4.25,4.79,5.36,5.82,6.31 union all

    select NULL,3.59,4.32,NULL,5.21,NULL,NULL,7.12 union all

    select NULL,NULL,NULL,5.23,5.41,6.33,6.89,7.99 union all

    select 4.12,4.66,5.31,5.92,NULL,6.87,7.34,NULL;

    --

    insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select Col1+number, Col2+number, Col3+number, Col4+number, Col5+number, Col6+number, Col7+number, Col8+number

    from dbo.Numbers

    cross join #test

    where number between 10 and 10000;

    --

    CREATE CLUSTERED INDEX [seqm] ON #test

    ([col1] ASC

    )WITH (PAD_INDEX = OFF) ON [PRIMARY]

    GO

    -- GSquared final version =====================

    declare @max-2 float;

    select @max-2 = 1.79E+308;

    ;with CTE (CTEID, Seq1) as

    (select t1.id, count(*)

    from #test t1

    inner join #test t2

    on isnull(t1.col1,@Max) > isnull(t2.col1,0)

    and isnull(t1.col2,@Max) > isnull(t2.col2,0)

    and isnull(t1.col3,@Max) > isnull(t2.col3,0)

    and isnull(t1.col4,@Max) > isnull(t2.col4,0)

    and isnull(t1.col5,@Max) > isnull(t2.col5,0)

    and isnull(t1.col6,@Max) > isnull(t2.col6,0)

    and isnull(t1.col7,@Max) > isnull(t2.col7,0)

    and isnull(t1.col8,@Max) > isnull(t2.col8,0)

    group by t1.id)

    select #test.*

    from #test

    left outer join CTE

    on id = cteid

    order by seq1;

    -- (39968 row(s) affected) / 00:13:19

    -- (3968 row(s) affected) / 00:00:13

    --==============================================

    -- ChrisB

    declare @max-2 float;

    select @max-2 = 1.79E+308;

    SELECT *

    , (SELECT COUNT(*) FROM #test

    WHERE ISNULL(col1,0) <= ISNULL(al.col1,@Max)

    AND ISNULL(col2,0) <= ISNULL(al.col2,@Max)

    AND ISNULL(col3,0) <= ISNULL(al.col3,@Max)

    AND ISNULL(col4,0) <= ISNULL(al.col4,@Max)

    AND ISNULL(col5,0) <= ISNULL(al.col5,@Max)

    AND ISNULL(col6,0) <= ISNULL(al.col6,@Max)

    AND ISNULL(col7,0) <= ISNULL(al.col7,@Max)

    AND ISNULL(col8,0) <= ISNULL(al.col8,@Max)) AS Ordr

    FROM #test al

    ORDER BY Ordr;

    -- (39968 row(s) affected) / 00:11:36

    -- (3968 row(s) affected) / 00:00:13

    --======================================

    -- ChrisM version 1

    SELECT c.*

    FROM #test c

    ORDER BY

    CASE WHEN NOT (c.col1 IS NULL) THEN c.col1 ELSE

    (SELECT MAX(col1) FROM #test WHERE (col1 is NOT null) AND (col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL) THEN ISNULL(c.col2, 0) ELSE

    (SELECT MAX(col2) FROM #test WHERE (col2 is NOT null) AND (col1 < c.col1 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL) THEN ISNULL(c.col3, 0) ELSE

    (SELECT MAX(col3) FROM #test WHERE (col3 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL) THEN ISNULL(c.col4, 0) ELSE

    (SELECT MAX(col4) FROM #test WHERE (col4 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL) THEN ISNULL(c.col5, 0) ELSE

    (SELECT MAX(col5) FROM #test WHERE (col5 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL) THEN ISNULL(c.col6, 0) ELSE

    (SELECT MAX(col6) FROM #test WHERE (col6 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL) THEN ISNULL(c.col7, 0) ELSE

    (SELECT MAX(col7) FROM #test WHERE (col7 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL AND c.col8 IS NULL) THEN ISNULL(c.col8, 0) ELSE

    (SELECT MAX(col8) FROM #test WHERE (col8 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7)) END

    -- (39968 row(s) affected) / 00:05:02

    -- (3968 row(s) affected) / 00:00:03

    --======================================

    -- ChrisM version 2

    SELECT Seq, ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8 FROM (

    SELECT STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq, *

    FROM #test t

    WHERE col1 IS NOT NULL

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) FROM #test WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col2,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE col1 IS NULL AND col2 IS NOT null

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) FROM #test WHERE (col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col3,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2) IS NULL AND col3 IS NOT null

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) FROM #test WHERE (col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col4,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2, col3) IS NULL AND col4 is NOT NULL

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) FROM #test WHERE (col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col5,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2, col3, col4) IS NULL AND col5 IS NOT NULL

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) FROM #test WHERE (col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col6,8,2)+STR(0,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2, col3, col4, col5) IS NULL AND col6 is NOT null

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) FROM #test WHERE (col7 <= t.col7 OR col8 <= t.col8)) +

    STR(col7,8,2)+STR(0,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2, col3, col4, col5, col6) IS NULL AND col7 is NOT null -- 00:00:00 (no rows returned)

    UNION ALL

    SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) FROM #test WHERE (col8 <= t.col8)) +

    STR(col8,8,2), *

    FROM #test t

    WHERE COALESCE(col1, col2, col3, col4, col5, col6, col7) IS NULL AND col8 is NOT null

    ) d

    ORDER BY Seq

    -- (39968 row(s) affected) / 00:02:26

    -- (3968 row(s) affected) / 00:00:03

    --======================================

    -- ChrisM version 3 ("condensed" version 2)

    SELECT ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8

    FROM (SELECT *, STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq

    FROM #test t

    WHERE t.col1 IS NOT NULL

    UNION ALL

    SELECT *, (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) + STR(MAX(col8),8,2)

    FROM #test

    WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8))

    FROM #test t

    WHERE t.col1 IS NULL

    ) d

    ORDER BY Seq

    -- (39968 row(s) affected) / 00:06:11

    -- (3968 row(s) affected) / 00:00:08

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If you do this:

    update #test

    set col1 = isnull(col1,0),

    col2 = isnull(col2,0),

    col3 = isnull(col3,0),

    col4 = isnull(col4,0),

    col5 = isnull(col5,0),

    col6 = isnull(col6,0),

    col7 = isnull(col7,0),

    col8 = isnull(col8,0)

    ;with CTE (CTEID, Seq1) as

    (select t1.id, count(*)

    from #test t1

    inner join #test t2

    on t1.col1 > t2.col1

    and t1.col2 > t2.col2

    and t1.col3 > t2.col3

    and t1.col4 > t2.col4

    and t1.col5 > t2.col5

    and t1.col6 > t2.col6

    and t1.col7 > t2.col7

    and t1.col8 > t2.col8

    group by t1.id)

    select #test.*

    from #test

    left outer join CTE

    on id = cteid

    order by seq1;

    It takes under 1 second, for the same dataset. That's the real solution.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/6/2009)


    If you do this:

    update #test

    set col1 = isnull(col1,0),

    col2 = isnull(col2,0),

    col3 = isnull(col3,0),

    col4 = isnull(col4,0),

    col5 = isnull(col5,0),

    col6 = isnull(col6,0),

    col7 = isnull(col7,0),

    col8 = isnull(col8,0)

    ;with CTE (CTEID, Seq1) as

    (select t1.id, count(*)

    from #test t1

    inner join #test t2

    on t1.col1 > t2.col1

    and t1.col2 > t2.col2

    and t1.col3 > t2.col3

    and t1.col4 > t2.col4

    and t1.col5 > t2.col5

    and t1.col6 > t2.col6

    and t1.col7 > t2.col7

    and t1.col8 > t2.col8

    group by t1.id)

    select #test.*

    from #test

    left outer join CTE

    on id = cteid

    order by seq1;

    It takes under 1 second, for the same dataset. That's the real solution.

    Hi GSquared,

    I cannot get your solution to work.

    If I allow 10000 rows (from the pivot table), the query runs forever.

    If I allow only 1000 rows, I get incorrect results.

    I also doubt that the code reflects what needs to be done.

    By setting the NULL columns to 0, you remove their "sorting ambivalence", meaning they could sort before or after specific numbers in the same column, depending on other columns that are not null.

    Maybe I am missing something, if so, I apologize.

    Best Regards,

    Chris Büttner

  • Hi Chris,

    Very fancy stuff that you posted. Wasn't even aware you could do such fancy stuff in an ORDER BY:-).

    Actually I did want to tell you that your queries did not work, but then I realized that mine did not either. Reason: The data in table #test is flawed.

    SELECT A.id, b.id, A.col2, B.col2, a.col3, B.col3 FROM #test A, #test b

    WHERE A.col2 > b.col2 AND a.col3 < B.col3

    No matter how you sort, either col2 or col3 will remain unsorted (could apply to other combinations as well).

    Best Regards,

    Chris Büttner

Viewing 15 posts - 76 through 90 (of 180 total)

You must be logged in to reply to this topic. Login to reply