Pivot & Order By

  • I've this pivot query where I want to apply an order by clause. The order by works fine if the pivot query is not an inner query but the moment I make it an inner query the order by doesn't take effect. Here is the query:

    drop table #table1

    drop table #table2

    CREATE TABLE #table1 (id int, nm varchar(50))

    insert into #table1

    select 1, 'Type 1'

    union all

    select 2, 'Type 2'

    create table #table2 (myno varchar(25), mydate datetime, mystatus varchar(1), mytype int)

    insert into #table2

    select 34001, getdate(),'W',1

    union all

    select 34001, getdate(),'Y',2

    union all

    select 34002, getdate(),'G',1

    union all

    select 34003, getdate(),'G',1

    union all

    select 34004, getdate(),'W',1

    select * from #table2

    select * from(

    select top 100 percent row_number() over (order by myno) [rownum], mydate, myno, [Type 1], [Type 2]

    from

    (

    select mydate, myno, myname,mystatus

    from (select mydate, myno, t1.nm [myname],mystatus

    from #table2 t2

    inner join #table1 t1 on t1.id = t2.mytype ) as a

    ) as source

    pivot (max(mystatus) for [myname] in ([Type 1], [Type 2])) as pvt

    order by [Type 1]

    ) as z where z.rownum between 0 and 10

    The reason for putting the pivot in inner query is to apply the where clause. It doesn't work if I make it part of the pivot query.

    Any ideas? Thanks.

  • Thank you for providing the test data.... I may be missing something about what you want to do, but it seems that just moving the "order by" to the end of the query may work. Is this what you're trying for?

    Results of original query:rownum mydate myno Type 1 Type 2

    -------------------- ----------------------- ------------------------- ------ ------

    1 2009-05-28 18:06:25.123 34001 W Y

    2 2009-05-28 18:06:25.123 34002 G NULL

    3 2009-05-28 18:06:25.123 34003 G NULL

    4 2009-05-28 18:06:25.123 34004 W NULL

    After moving the "order by".....pivot (max(mystatus) for [myname] in ([Type 1], [Type 2])) as pvt

    --order by [Type 1]

    ) as z where z.rownum between 0 and 10

    order by [Type 1]

    .... the results are in order by [Type 1]:rownum mydate myno Type 1 Type 2

    -------------------- ----------------------- ------------------------- ------ ------

    2 2009-05-28 18:09:04.063 34002 G NULL

    3 2009-05-28 18:09:04.063 34003 G NULL

    4 2009-05-28 18:09:04.063 34004 W NULL

    1 2009-05-28 18:09:04.063 34001 W Y

  • john.arnott (5/28/2009)


    I may be missing something about what you want to do, but it seems that just moving the "order by" to the end of the query may work.

    I think I had tried various of those options before posting here and I am sure SQL Studio was playing games with me when it didn't return the results in the order I wanted!! :hehe:

    Bottomline is your suggestion works (and that's exactly what I was looking for). Thank you very much for looking through the obvious.

    eDIT: Doesn't this forum have any kind of $rep system or mark a particular post as answer?

  • Doesn't this forum have any kind of $rep system or mark a particular post as answer?

    I've not seen anything like that. Usually, as in this case, the original poster will confirm that they've got things worked out. And actually, often different people will have offered multiple suggestions and we end up with a dialog that incrementally improves the "solution", reaching (or not) an "Answer" by consensus. Sometimes, someone will ask about a specific, narrow piece of what they've done, and the community will respond with ideas about the overall situation.

    For example, you shouldn't be surprised if someone were to see this thread and suggest that an approach other than using the SQL Server "PIVOT" operator may be appropriate in many cases. Oh, heck, why don't I be that person? You'll want to take a look at this two-part article on the subject:http://qa.sqlservercentral.com/articles/T-SQL/63681/

  • john.arnott (5/29/2009)


    Usually, as in this case, the original poster will confirm that they've got things worked out. And actually, often different people will have offered multiple suggestions and we end up with a dialog that incrementally improves the "solution", reaching (or not) an "Answer" by consensus.

    Well I agree with the "Answer"-ed part and I too don't specifically look for the exact solution but the leads that thread can provide.

    For example, you shouldn't be surprised if someone were to see this thread and suggest that an approach other than using the SQL Server "PIVOT" operator may be appropriate in many cases. Oh, heck, why don't I be that person? You'll want to take a look at this two-part article on the subject:http://qa.sqlservercentral.com/articles/T-SQL/63681/

    Thank you for pointing that out, and a surprising find on Pivot performance. But the only thing I'm doing in this query (as you must have observed) is just Pivoting the values in source table, no aggregation of any sort (except for the MAX function). The source table data will already be filtered out before PIVOT function and I assume the performance should not deteriorate.

    But in case there are surprise findings during the stress testing phase, I know where I should be profiling. 🙂

  • john.arnott (5/28/2009)


    I may be missing something about what you want to do, but it seems that just moving the "order by" to the end of the query may work.

    After going through the rigors of entire day when it was turn of this query to be tested I realized why I posted it. Happy that I didn't overlook such simple permutation and frustrated that I wasted my day and didn't realize it (sounds so idiotic :crying:)

    So here is the real issue. If we elaborate on the test data for table2select 34001, getdate(),'W',1

    union all

    select 34002, getdate(),'W',1

    union all

    select 34003, getdate(),'W',1

    union all

    select 34004, getdate(),'W',1

    union all

    select 34005, getdate(),'W',1

    union all

    select 34006, getdate(),'W',1

    union all

    select 34004, getdate(),'W',1

    union all

    select 34007, getdate(),'W',1

    union all

    select 34008, getdate(),'W',1

    union all

    select 34009, getdate(),'W',1

    union all

    select 34010, getdate(),'Y',2

    union all

    select 34010, getdate(),'G',1

    union all

    select 34011, getdate(),'G',1

    union all

    select 34012, getdate(),'W',1and run the same query for rownum between 0 & 5; the expected result is that all G's should come before W's. But that's not what happens. The solution we have at hand first applies the where condition and then orders the result.

    What I want is the where condition on the ordered result?

  • Which five rows would you want to be returned? Mocked-up desired results should clarify what you're trying to get here.

    You've given them an order-by for the row_number() function that's based solely on "myno", then taken the first five. When I run this after inserting all the rows to #table2, both from the original post and the addendum, the results are all for the first five values of "myno".

    Do you just want all the type Gs first even while limiting the output to just the first 5 or 10? For that, you could change the row_number() call to use "Type 1" as its primary sort key:select top 100 percent row_number() over (order by [Type 1], myno) [rownum], mydate, myno, [Type 1], [Type 2]

    By the way, I'm guessing that the "top 100 percent" is in there just to control test volumes.

  • Thank you for your patience and responses, John.

    You guessed right that I expect to see G's first. And if I put Type 1 in the row_number() order by clause then I get what I am looking for (and it works well for other pivot columns as well if I replace the column name there).

    But this has me a bit confused on how the order by works in Row_Number() and where am I fundamentally wrong.

    My assumption was that a result set will be created with row numbers assigned by column "myno". And then I am ordering the result set by another column ([Type 1] here) value. And the thing is it works as expected if I do not have the "where rownum between.." clause. The code below gives me G's as the top rows. select * from(

    select row_number() over (order by myno) [rownum], mydate, myno, [Type 1], [Type 2]

    from

    (

    select mydate, myno, myname,mystatus

    from (select mydate, myno, t1.nm [myname],mystatus

    from #table2 t2

    inner join #table1 t1 on t1.id = t2.mytype ) as a

    ) as source

    pivot (max(mystatus) for [myname] in ([Type 1], [Type 2])) as pvt

    ) as z

    --where z.rownum between 0 and 10

    order by [Type 1]My initial approach was to get this ordered list as a derived table and then apply the where clause on top of it.select * from (select row_num() over ............ as z order by [Type 1]) as zz where rownum between 1 and 5When I run the query in this structure it doesn't return me G's as the top row.

    And I am unable to understand why would that be!

    By the way, I'm guessing that the "top 100 percent" is in there just to control test volumes.

    That's a carry over from my initial query when I was putting the order by clause as part of an inline query (like the 2nd scenario I've mentioned).

    I hope I am able to make myself clear.

    Thanks.

  • My assumption was that a result set will be created with row numbers assigned by column "myno". And then I am ordering the result set by another column ([Type 1] here) value. And the thing is it works as expected if I do not have the "where rownum between.." clause. The code below gives me G's as the top rows.

    The "rownum" column is indeed determined by the ROWNUMBER() function and its values are simply incremented in the order you specify, which is only by "myno". Combining the input rows that you created in the two posts gives us a #table2 with 19 rows in it. Because of duplicate "myno" values, the PIVOT gives just 12 rows, one for each unique "myno", always giving precedence to the "W" rows in #table2 where two rows have the same "myno" because "W" is greater than "G" and is therefore the max(mystatus). Your test data had all W status for each of the first ten myno values (or had a G that was then superseded by a W), so the BETWEEN 0 and 10 kept the query from seeing any of the later rows that only had a "G" status.

    Including the status in the definition of rownum ensures that the "G" rows later in #table2 that were not overridden by "W" rows will still be considered for inclusion. You may still want to consider whether you will indeed have duplicate "myno" values and if so, whether showing only the "W" version satisfies your ultimate requirements.

  • Thanks for the explanation. I get it now.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply