How does ORDER BY work?

  • Please help me (very experienced SQL Server user) to find an answer to my following question in BOL:

    Why "--F..." is not at the top of returned row set? Please see the image [ORDER BY init.jpg] Some more details on the 2nd image.

    Once again, I want to understand where in BOL I con learn when and what forces SQL Server engine to put "--..." in the middle of values beginning with letters.

    select 'N''a'' > N''-''', case when N'a' > N'-' then 'true' else 'false' end

    union

    select 'N''a'' < N''-''', case when N'a' < N'-' then 'true' else 'false' end

    union

    select 'N''Z'' > N''-''', case when N'a' > N'-' then 'true' else 'false' end

    union

    select 'N''Z'' < N''-''', case when N'a' < N'-' then 'true' else 'false' end

    ----------- -----

    N'a' < N'-' false

    N'a' > N'-' true

    N'Z' < N'-' false

    N'Z' > N'-' true

  • Can't give you a BOL reference on it, because Microsoft doesn't actually define the sort orders for the Unicode character set. That's done by a standards body. So, Microsoft doesn't document it. It might help to read this article: http://msdn.microsoft.com/en-us/library/ms143515.aspx

    Edit: Note that the article I linked is in BOL, if that matters here.

    Try these two tests, you'll see what's going on:

    CREATE TABLE #T

    (Col1 VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS,

    Col2 VARCHAR(50) COLLATE SQL_Latin1_General_Cp437_BIN,

    Col3 VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI) ;

    INSERT INTO #T

    (Col1, Col2, Col3)

    VALUES (N'Device Abbr', N'Device Abbr', N'Device Abbr'),

    (N'--FORM--', N'--FORM--', N'--FORM--'),

    (N'Frame72', N'Frame72', N'Frame72') ;

    SELECT *

    FROM #T

    ORDER BY Col1 ;

    SELECT *

    FROM #T

    ORDER BY Col2 ;

    SELECT *

    FROM #T

    ORDER BY Col3 ;

    That one uses VarChar, instead of NVarChar, and sorts the way you are expecting it to, since it follows the ASCII sort order (as per the article I linked).

    The following uses NVarChar, like your table:

    CREATE TABLE #T

    (Col1 NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS,

    Col2 NVARCHAR(50) COLLATE SQL_Latin1_General_Cp437_BIN,

    Col3 NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI) ;

    INSERT INTO #T

    (Col1, Col2, Col3)

    VALUES (N'Device Abbr', N'Device Abbr', N'Device Abbr'),

    (N'--FORM--', N'--FORM--', N'--FORM--'),

    (N'Frame72', N'Frame72', N'Frame72') ;

    SELECT *

    FROM #T

    ORDER BY Col1 ;

    SELECT *

    FROM #T

    ORDER BY Col2 ;

    SELECT *

    FROM #T

    ORDER BY Col3 ;

    All the sorts except the Binary collation sort the way your data is sorting. That's because Unicode sort orders, except Binary ones, are based on the Unicode code pages, not on what you would expect based on ASCII experience.

    If you take that same test (the Unicode/NVarChar one), and change the first query to:

    SELECT *

    FROM #T

    ORDER BY Col1 COLLATE SQL_Latin1_General_Cp437_BIN ;

    You will get the sort order you want that way.

    So, either switch to ASCII data (if you can), or switch to a binary collation (if you can), or use a binary collation in the Order By clause (if you can't do either of the other two).

    Data on the Unicode standard can be found here: http://www.unicode.org/standard/standard.html

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Here is the reason described:

    http://support.microsoft.com/kb/322112

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Fantastic! How could I not know I have been balancing so close to my maxim "Oh, this is for others, for me Unicode is about the same as ASCII; longer? not a big deal." failure for years!

    Thanks, GSquared!

    Spasibo, Ghenya!

    BTW, by no means am I a rookie. I'd been working with Sybase SQL Server before it became MS in... well, wikipedia knows.

  • You're welcome.

    It's one of those seriously non-intuitive things that can bite anyone, rookie or no.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply