Multi Column Sort

  • Actually, sir, your first response to someone that actually tried to help is the one that I found offensive and put me off even thinking about trying to help you (and that says a lot in and of itself). It was arrogance on your part and as far as I'm concerned, you owe an apology.

    Second, your question was also vague in my opinion. If you are looking for help, the MORE information you provide up front, the better help you will receive. Don't assume that we will all just immediately understand your problem with no explaination.

    The first this you really should do now, is read the first article I have referenced below in my signature block regarding "Asking for assistance".

  • This code generates some sample data; 250,000 rows of six data columns plus an id column which holds the correct order of the data set as described so far by the OP. It's easily extensible, useful when there are still unanswered questions such as datatype of the numbers, number of columns, number of rows - all of which are relevant to the solution.

    DROP TABLE #test

    CREATE TABLE #test (id int, col1 numeric(6,2), col2 numeric(6,2), col3 numeric(6,2), col4 numeric(6,2), col5 numeric(6,2), col6 numeric(6,2), col7 numeric(6,2), col8 numeric(6,2))

    INSERT INTO #test (id, col1, col2, col3, col4, col5, col6, col7, col8)

    SELECT 1, 2.35, 3.01, 3.49, 4.25, 4.79, 5.36, 5.82, 6.31

    UNION SELECT 2, null, 3.59, 4.32, null, 5.21, null, null, 7.12

    UNION SELECT 3, NULL, null, null, 5.23, 5.41, 6.33, 6.89, 7.99

    UNION SELECT 4, null, 5.46, 5.31, 5.92, null, 6.87, 7.34, null

    UNION SELECT 5, 5.12, 5.66, 6.31, 6.92, null, 7.87, 8.34, null

    -- 250,000 rows

    SELECT ROW_NUMBER() OVER (ORDER BY Batch, id) AS [id],

    col1, col2, col3, col4, col5, col6, col7, col8

    FROM (SELECT CAST(1 AS INT) AS [Batch], *

    FROM #test

    UNION ALL

    SELECT 1+(number/4), id, col1+number, col2+number, col3+number, col4+number, col5+number, col6+number, col7+number, col8+number

    FROM #test, Numbers

    WHERE number%4 = 0 AND number <= (200000-4)

    ) d

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • This whole thing is quite obviously just a riddle. I'm just waiting for the next set of examples, which will have something like:

    Col1 Col2

    1 5

    2 4

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • This solution isn't quite as performant as Chris', but it also isn't dependent on the max value being 9, which that one seems to assume.

    ;with CTE (CTEID, Seq1, Seq2) as

    (select t1.id, count(*), case when t2.id is null then 0 else 1 end

    from #T t1

    left outer join #T t2

    on (t1.col1 > t2.col1 or t1.col1 is null or t2.col1 is null)

    and (t1.col2 > t2.col2 or t1.col2 is null or t2.col2 is null)

    and (t1.col3 > t2.col3 or t1.col3 is null or t2.col3 is null)

    and (t1.col4 > t2.col4 or t1.col4 is null or t2.col4 is null)

    and (t1.col5 > t2.col5 or t1.col5 is null or t2.col5 is null)

    and (t1.col6 > t2.col6 or t1.col6 is null or t2.col6 is null)

    and (t1.col7 > t2.col7 or t1.col7 is null or t2.col7 is null)

    and (t1.col8 > t2.col8 or t1.col8 is null or t2.col8 is null)

    group by t1.id, case when t2.id is null then 0 else 1 end)

    select #T.*

    from #T

    inner join CTE

    on id = cteid

    order by seq1, seq2;

    It, of coure, requires three full table scans, instead of the two for Chris', so on bigger sets, it will be slower. But, since this is just an ever-evolving riddle, I'm just happy with an answer that solves all of the given examples so far.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared, feel free to increase the 9 to int(max), bigint(max) or anything else.

    I just chose 9 because I felt like this 🙂

    Best Regards,

    Chris Büttner

  • Yeah, I realized that about the 9. Was mainly pointing it out because, otherwise, the OP is going to claim your solution can't possibly work, because of it. Was just pre-empting that.

    Here are two versions that I just did test runs on.

    The test data:

    create table #T (

    ID int identity primary key,

    Col1 float,

    Col2 float,

    Col3 float,

    Col4 float,

    Col5 float,

    Col6 float,

    Col7 float,

    Col8 float);

    insert into #T (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select 2.35,3.01,3.49,4.25,4.79,5.36,5.82,6.31 union all

    select NULL,3.59,4.32,NULL,5.21,NULL,NULL,7.12 union all

    select NULL,NULL,NULL,5.23,5.41,6.33,6.89,7.99 union all

    select 4.12,4.66,5.31,5.92,NULL,6.87,7.34,NULL;

    insert into #T (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select Col1+number, Col2+number, Col3+number, Col4+number, Col5+number, Col6+number, Col7+number, Col8+number

    from dbo.Numbers

    cross join #T

    where number between 10 and 1000;

    This produces 3968 rows of data in #T.

    My solution:

    ;with CTE (CTEID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Seq1, Seq2) as

    (select t1.id, t1.Col1, t1.Col2, t1.Col3, t1.Col4, t1.Col5, t1.Col6, t1.Col7, t1.Col8,

    count(*), case when t2.id is null then 0 else 1 end

    from #T t1

    left outer join #T t2

    on (t1.col1 > t2.col1 or t1.col1 is null or t2.col1 is null)

    and (t1.col2 > t2.col2 or t1.col2 is null or t2.col2 is null)

    and (t1.col3 > t2.col3 or t1.col3 is null or t2.col3 is null)

    and (t1.col4 > t2.col4 or t1.col4 is null or t2.col4 is null)

    and (t1.col5 > t2.col5 or t1.col5 is null or t2.col5 is null)

    and (t1.col6 > t2.col6 or t1.col6 is null or t2.col6 is null)

    and (t1.col7 > t2.col7 or t1.col7 is null or t2.col7 is null)

    and (t1.col8 > t2.col8 or t1.col8 is null or t2.col8 is null)

    group by t1.id, t1.Col1, t1.Col2, t1.Col3, t1.Col4, t1.Col5, t1.Col6, t1.Col7, t1.Col8,

    case when t2.id is null then 0 else 1 end)

    select CTEID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8

    from CTE

    order by seq1, seq2;

    Chris':

    declare @max-2 float;

    select @max-2 = 1.79E+308;

    SELECT *

    , (SELECT COUNT(*) FROM #T

    WHERE ISNULL(col1,0) <= ISNULL(al.col1,@Max)

    AND ISNULL(col2,0) <= ISNULL(al.col2,@Max)

    AND ISNULL(col3,0) <= ISNULL(al.col3,@Max)

    AND ISNULL(col4,0) <= ISNULL(al.col4,@Max)

    AND ISNULL(col5,0) <= ISNULL(al.col5,@Max)

    AND ISNULL(col6,0) <= ISNULL(al.col6,@Max)

    AND ISNULL(col7,0) <= ISNULL(al.col7,@Max)

    AND ISNULL(col8,0) <= ISNULL(al.col8,@Max)) AS Ordr

    FROM #T al

    ORDER BY Ordr;

    On my desktop machine, my version took 30 seconds to run, and the same on subsequent runs on the same data (caching validation). Chris', modified with the @max-2 value, took 13 seconds on each run. They have remarkably similar execution plans, but that version is definitely superior.

    By visual check, both seem to produce correct results, at least as specced at this point in time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Another refinement:

    declare @max-2 float;

    select @max-2 = 1.79E+308;

    ;with CTE (CTEID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Seq1, Seq2) as

    (select t1.id, t1.Col1, t1.Col2, t1.Col3, t1.Col4, t1.Col5, t1.Col6, t1.Col7, t1.Col8,

    count(*), case when t2.id is null then 0 else 1 end

    from #T t1

    left outer join #T t2

    on isnull(t1.col1,@Max) > isnull(t2.col1,0)

    and isnull(t1.col2,@Max) > isnull(t2.col2,0)

    and isnull(t1.col3,@Max) > isnull(t2.col3,0)

    and isnull(t1.col4,@Max) > isnull(t2.col4,0)

    and isnull(t1.col5,@Max) > isnull(t2.col5,0)

    and isnull(t1.col6,@Max) > isnull(t2.col6,0)

    and isnull(t1.col7,@Max) > isnull(t2.col7,0)

    and isnull(t1.col8,@Max) > isnull(t2.col8,0)

    group by t1.id, t1.Col1, t1.Col2, t1.Col3, t1.Col4, t1.Col5, t1.Col6, t1.Col7, t1.Col8,

    case when t2.id is null then 0 else 1 end)

    select CTEID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8

    from CTE

    order by seq1, seq2;

    24 seconds on the same data as the other two. Chris' is still the champ.

    Another version:

    declare @max-2 float;

    select @max-2 = 1.79E+308;

    ;with CTE (CTEID, Seq1) as

    (select t1.id, count(*)

    from #T t1

    inner join #T t2

    on isnull(t1.col1,@Max) > isnull(t2.col1,0)

    and isnull(t1.col2,@Max) > isnull(t2.col2,0)

    and isnull(t1.col3,@Max) > isnull(t2.col3,0)

    and isnull(t1.col4,@Max) > isnull(t2.col4,0)

    and isnull(t1.col5,@Max) > isnull(t2.col5,0)

    and isnull(t1.col6,@Max) > isnull(t2.col6,0)

    and isnull(t1.col7,@Max) > isnull(t2.col7,0)

    and isnull(t1.col8,@Max) > isnull(t2.col8,0)

    group by t1.id)

    select #T.*

    from #T

    left outer join CTE

    on id = cteid

    order by seq1;

    Same run-time as Chris', 13-14 seconds. It's also darn near the same code. Just an evolution of what I originally wrote.

    Anyone else have any ideas on this?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/3/2009)


    Anyone else have any ideas on this?

    There were two other ideas which I've not had time to follow up yet. The first was to derive an average slope for each row, as value over column number, then calculate the missing col1 value from any values on the same row to permit ordering by col1 alone. This is still practical but would require two passes, the first to get an approximate order (or relative "vertical position" of NULL col1 rows), and a second pass using the slope from a nearby row as a baseline to refine the calculated missing value of col1 - this would at least help with skewed data. I've got quite close with this approach.

    The second idea was to stringify row_number of the non-null col1 values then somehow slot in the rest using non-null col2 as a reference series, then col3 etc, obtaining the low col1 row_number and suffixing the stringified row_number of the new test column. This is still mist.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Both "Ping Pong Ball" and "Elephant" have three vowels.

    Both "Ping Pong Ball" and "Cricket" contain the letter "I".

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (4/3/2009)


    Both "Ping Pong Ball" and "Elephant" have three vowels.

    Both "Ping Pong Ball" and "Cricket" contain the letter "I".

    Elephant and cricket each have one bowel. Dumbo and Lester could both fly - eventually.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You know, I'm sure that I've seen this guy here before under a different name. And no, not FJ or one of the other regular hacks. No this guy did the same whole "how can I be insulting your intelligence when I'm asking you for help" schtick. That is such a rare and lame equivocational evasion that I suspected at the time that it was a put on.

    Hmm, how can I find that thread again...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Chris Morris (4/3/2009)


    GSquared (4/3/2009)


    Anyone else have any ideas on this?

    There were two other ideas which I've not had time to follow up yet. The first was to derive an average slope for each row, as value over column number, then calculate the missing col1 value from any values on the same row to permit ordering by col1 alone.......

    Clever, Chris. You couldn't use col1 alone, actually. For collisions on col1, you'd want to use the slope as a secondary order key. Only thing left, perhaps, would be to examine the standard deviation of the domain values as a quantifiable indication of confidence in the resulting row order.

  • jsanborn (4/2/2009)


    Since this was posted for a community of database professionals, I didn't think a great deal of detailed explanation was needed just like I didn't think it necessary to explain what a database or a table is.

    That's one thing thats been preying on my mind for a while now:

    Yes, this is a forum for database professionals. But I hope you have realized, that your "problem" actually is not a database problem, nor a relational problem. Ordering rows is far from relational. And the fact that SQL allows you to order rows doesn't make it a database problem. I would rather put it into application development, since it more reflects array sorting.

    The problem you posted is not easy to solve in SQL. You want to have columns sorted individually, but the sorting in each column is non-deterministic because of the nulls. Therefore you can only "compare" the full rows against each other. Unfortunately(really?), ORDER BY in SQL doesn't allow you to compare each two rows indivdually (meaning here: apply different column comparison for each pair of rows, depending on the horizontal position of nulls), therefore you have to somehow "rank" the full rows first (like I did with the COUNT) and then use the rank to do the sorting.

    If I have not overlooked something, we still do not know what your intention was regarding this question or whether it was actually solved. It would be nice if you could shed some light here.

    Best Regards,

    Chris Büttner

  • GSquared (4/3/2009)


    Anyone else have any ideas on this?

    Nope, as I just wrote in another post, solving this with SQL is probably more or less torture.

    And I think we have already provided too much input, compared to the output from the OP.

    We more or less need a row-comparison operator that works in the intended way (compare col1 if both not null; otherwise compare col2 if both not null; otherwise...).

    But we don't have that one in our SQL toolset.

    We could build one (UDT is in my mind right now), but then we are just better off with sorting within the application. We don't have to force this into SQL.

    Best Regards,

    Chris Büttner

  • jsanborn (4/2/2009)


    using a case statement to include multiple columns simultaneously when zero is encountered will not work.

    Not so. Here's a solution using just that:

    drop table #test

    create table #test (

    ID int identity, Col1 float, Col2 float, Col3 float, Col4 float, Col5 float, Col6 float, Col7 float, Col8 float);

    --

    insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select 2.35,3.01,3.49,4.25,4.79,5.36,5.82,6.31 union all

    select NULL,3.59,4.32,NULL,5.21,NULL,NULL,7.12 union all

    select NULL,NULL,NULL,5.23,5.41,6.33,6.89,7.99 union all

    select 4.12,4.66,5.31,5.92,NULL,6.87,7.34,NULL;

    --

    insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

    select Col1+number, Col2+number, Col3+number, Col4+number, Col5+number, Col6+number, Col7+number, Col8+number

    from dbo.Numbers

    cross join #test

    where number between 10 and 10000;

    --

    CREATE CLUSTERED INDEX [seqm] ON #test

    ([col1] ASC

    )WITH (PAD_INDEX = OFF) ON [PRIMARY]

    GO

    --

    SELECT c.*

    FROM #test c

    ORDER BY

    CASE WHEN NOT (c.col1 IS NULL) THEN c.col1 ELSE

    (SELECT MAX(col1) FROM #test WHERE (col1 is NOT null) AND (col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL) THEN ISNULL(c.col2, 0) ELSE

    (SELECT MAX(col2) FROM #test WHERE (col2 is NOT null) AND (col1 < c.col1 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL) THEN ISNULL(c.col3, 0) ELSE

    (SELECT MAX(col3) FROM #test WHERE (col3 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL) THEN ISNULL(c.col4, 0) ELSE

    (SELECT MAX(col4) FROM #test WHERE (col4 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL) THEN ISNULL(c.col5, 0) ELSE

    (SELECT MAX(col5) FROM #test WHERE (col5 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL) THEN ISNULL(c.col6, 0) ELSE

    (SELECT MAX(col6) FROM #test WHERE (col6 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col7 < c.col7 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL) THEN ISNULL(c.col7, 0) ELSE

    (SELECT MAX(col7) FROM #test WHERE (col7 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col8 < c.col8)) END,

    CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL AND c.col8 IS NULL) THEN ISNULL(c.col8, 0) ELSE

    (SELECT MAX(col8) FROM #test WHERE (col8 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7)) END

    -- (39968 row(s) affected) / 00:05:07 (00:06:30 without clustered index)

    -- (3968 row(s) affected) / 00:00:03 The query returns results from GSquared's test data in about 3 seconds on this laptop. Here's a book recommendation for jsanborn:-P

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 61 through 75 (of 180 total)

You must be logged in to reply to this topic. Login to reply