Difference between ROW_NUMBER() and ORDER BY...DESC

  • Hi guys

    This is just for interest sake from my side. I have the following code entered into Adventureworks2014:

    SELECT p.FirstName

    ,p.LastName

    ,ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'

    ,s.SalesYTD

    ,a.PostalCode

    FROM [Sales].[SalesPerson] s

    INNER JOIN [Person].[Person] p

    ON s.BusinessEntityID = p.BusinessEntityID

    INNER JOIN [Person].[BusinessEntityAddress] ba

    ON p.BusinessEntityID = ba.BusinessEntityID

    INNER JOIN [Person].[Address] a

    ON a.AddressID = ba.AddressID

    WHERE s.TerritoryID IS NOT NULL

    --ORDER BY s.SalesYTD DESC

    AND get the same output if I uncomment the ORDER BY s.SalesYTD DESC line and comment out ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'. I hardly ever use the ROW_NUMBER() function, so is there a difference using this function and ORDER BY...DESC?

    Like I said, this is not something I am working with, but would just like to learn more about the rank functions, as I never get to use it in my working environment.

  • Hi frdrckmitchell7,

    row_number() is not actually meant to sort the result set. In this case you got lucky and you get the same result.

    But try this:

    SELECT p.FirstName

    ,p.LastName

    ,ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'

    ,s.SalesYTD

    ,a.PostalCode

    FROM [Sales].[SalesPerson] s

    INNER JOIN [Person].[Person] p

    ON s.BusinessEntityID = p.BusinessEntityID

    INNER JOIN [Person].[BusinessEntityAddress] ba

    ON p.BusinessEntityID = ba.BusinessEntityID

    INNER JOIN [Person].[Address] a

    ON a.AddressID = ba.AddressID

    WHERE s.TerritoryID IS NOT NULL

    ORDER BY a.PostalCode DESC

    you'll see that the result set will be the same, but the order will be different.

    If you use the ROW_NUMBER() function, usually, you want to define a partition as well, so, let's try this query:

    select Row_Number() over(partition by City order by AddressId asc), *

    from [Person].[Address]

    What am I asking to sql server? I'm asking it to get all the rows in the table person.address, have a look at the result and give a number to the rows according to the city: if all the city in the table are different, all the rows will have number 1. If SQL Server find that two rows have the same city, it will number them according to AddressId: the row with the lower addressid will have the number 1, the one with addressid a little higher will have number 2 and so on.

    Have a nice day! 🙂

  • Thanks!

    Go figure - trying to learn something and you get lucky (with the results that is)

  • Good point to remember is that without a final ORDER BY clause, the optimizer may present rows to you in a manner you didn't anticipate. It just goes with what is the least work.

    When you used the Row_Number() function it had to sort the rows the produce the result, so it just gave you the output after the sort. Try doing TWO Row_Number() functions on different columns. Do the first one on your primary key and the second on some random column.

    Some of the reasons behind the requirement of an ORDER BY to control output sequence are:

    The optimizer may choose to a non-clustered index which is sorted in an order different than that of the clustered index. It does this because reducing i/o on the front end is often most efficient.

    The optimizer may start it's query against rows already in memory. (Try loading a large table sometime and then immediately ask for the TOP(10) rows without an ORDER BY.)

    The optimizer may create a parallel plan.

    But no matter what happens, if the rows are in the wrong sequence before they display, the optimizer will add a sort near the end of the execution plan. (Or near the beginning if you read plans from left to right. 😉

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks for the advice. Really appreciate it.

  • That said - in the case you're showing, whether or not the actual output is ordered the same or not, the ROW_NUMER() is essentially ranking your sales folks by sales year to date.#1 would be the person with the highest sales, incrementing down from there. Even if you order the list by last name - that row number would still show how they fared compared to the rest of the sales team.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yeah, I saw that, thanks.

  • Everyone has touched on this but the most important thing to understand is: the only way to guarantee that results are returned in a specific order is with an ORDER BY statement. ORDER BY in an OVER clause does not count. Although ROW_NUMBER() OVER (ORDER BY x) and an ORDER BY x (outside of the SELECT) will sort the rows by x, only the second ORDER BY is guaranteed to sort the rows by x. Take these two queries for example:

    -- 5 rows of sample data

    IF OBJECT_ID('tempdb..#x') IS NOT NULL DROP TABLE #x;

    SELECT TOP (5) col1 = CAST(newid() as varchar(40)) INTO #x FROM sys.all_columns;

    SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) FROM #x ORDER BY col1;

    SELECT col1 FROM #x ORDER BY col1;

    In this case both queries return the rows sorted by col1. Microsoft SQL Server only guarantees that the second one will be sorted by col1. If the second query did not return the rows ordered by col1 you could report this to Microsoft as a defect; this is not the case with the first query.

    These queries, too, will sort the result set by col1 but this is not absolutely, positively guaranteed.

    SELECT DISTINCT col1 FROM #x;

    SELECT col1, COUNT(col1) FROM #x GROUP BY col1;

    SELECT col1, nt = NTILE(3) OVER (ORDER BY col1) FROM #x;

    SELECT col1, xx = MAX(col1) OVER (PARTITION BY col1) FROM #x;

    Again, the only way to guarantee that results are returned in a specific order is with an ORDER BY statement.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply