Difference between ROW_NUMBER() and ORDER BY...DESC

Question

Post reply

Difference between ROW_NUMBER() and ORDER BY...DESC

frdrckmitchell7

Ten Centuries

Points: 1351
More actions
August 31, 2016 at 2:18 am

#312943

Hi guys

This is just for interest sake from my side. I have the following code entered into Adventureworks2014:

SELECT p.FirstName

,p.LastName

,ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'

,s.SalesYTD

,a.PostalCode

FROM [Sales].[SalesPerson] s

INNER JOIN [Person].[Person] p

ON s.BusinessEntityID = p.BusinessEntityID

INNER JOIN [Person].[BusinessEntityAddress] ba

ON p.BusinessEntityID = ba.BusinessEntityID

INNER JOIN [Person].[Address] a

ON a.AddressID = ba.AddressID

WHERE s.TerritoryID IS NOT NULL

--ORDER BY s.SalesYTD DESC

AND get the same output if I uncomment the ORDER BY s.SalesYTD DESC line and comment out ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'. I hardly ever use the ROW_NUMBER() function, so is there a difference using this function and ORDER BY...DESC?

Like I said, this is not something I am working with, but would just like to learn more about the rank functions, as I never get to use it in my working environment.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

dario.quinto SSC Journeyman Points: 90 More actions · Answer 1

Hi frdrckmitchell7,

row_number() is not actually meant to sort the result set. In this case you got lucky and you get the same result.

But try this:

SELECT p.FirstName

,p.LastName

,ROW_NUMBER() OVER(ORDER BY s.SalesYTD DESC) AS 'RowNumber'

,s.SalesYTD

,a.PostalCode

FROM [Sales].[SalesPerson] s

INNER JOIN [Person].[Person] p

ON s.BusinessEntityID = p.BusinessEntityID

INNER JOIN [Person].[BusinessEntityAddress] ba

ON p.BusinessEntityID = ba.BusinessEntityID

INNER JOIN [Person].[Address] a

ON a.AddressID = ba.AddressID

WHERE s.TerritoryID IS NOT NULL

ORDER BY a.PostalCode DESC

you'll see that the result set will be the same, but the order will be different.

If you use the ROW_NUMBER() function, usually, you want to define a partition as well, so, let's try this query:

select Row_Number() over(partition by City order by AddressId asc), *

from [Person].[Address]

What am I asking to sql server? I'm asking it to get all the rows in the table person.address, have a look at the result and give a number to the rows according to the city: if all the city in the table are different, all the rows will have number 1. If SQL Server find that two rows have the same city, it will number them according to AddressId: the row with the lower addressid will have the number 1, the one with addressid a little higher will have number 2 and so on.

Have a nice day! 🙂

frdrckmitchell7 Ten Centuries Points: 1351 More actions · Answer 2

frdrckmitchell7

Ten Centuries

Points: 1351

August 31, 2016 at 3:14 am

#1898480

Thanks!

Go figure - trying to learn something and you get lucky (with the results that is)

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 3

Good point to remember is that without a final ORDER BY clause, the optimizer may present rows to you in a manner you didn't anticipate. It just goes with what is the least work.

When you used the Row_Number() function it had to sort the rows the produce the result, so it just gave you the output after the sort. Try doing TWO Row_Number() functions on different columns. Do the first one on your primary key and the second on some random column.

Some of the reasons behind the requirement of an ORDER BY to control output sequence are:

The optimizer may choose to a non-clustered index which is sorted in an order different than that of the clustered index. It does this because reducing i/o on the front end is often most efficient.

The optimizer may start it's query against rows already in memory. (Try loading a large table sometime and then immediately ask for the TOP(10) rows without an ORDER BY.)

The optimizer may create a parallel plan.

But no matter what happens, if the rows are in the wrong sequence before they display, the optimizer will add a sort near the end of the execution plan. (Or near the beginning if you read plans from left to right. 😉

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

frdrckmitchell7 Ten Centuries Points: 1351 More actions · Answer 4

frdrckmitchell7

Ten Centuries

Points: 1351

August 31, 2016 at 7:39 am

#1898545

Thanks for the advice. Really appreciate it.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

That said - in the case you're showing, whether or not the actual output is ordered the same or not, the ROW_NUMER() is essentially ranking your sales folks by sales year to date.#1 would be the person with the highest sales, incrementing down from there. Even if you order the list by last name - that row number would still show how they fared compared to the rest of the sales team.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

frdrckmitchell7 Ten Centuries Points: 1351 More actions · Answer 6

frdrckmitchell7

Ten Centuries

Points: 1351

August 31, 2016 at 8:09 am

#1898556

Yeah, I saw that, thanks.

Alan Burstein SSC Guru Points: 61136 More actions · Answer 7

Everyone has touched on this but the most important thing to understand is: the only way to guarantee that results are returned in a specific order is with an ORDER BY statement. ORDER BY in an OVER clause does not count. Although ROW_NUMBER() OVER (ORDER BY x) and an ORDER BY x (outside of the SELECT) will sort the rows by x, only the second ORDER BY is guaranteed to sort the rows by x. Take these two queries for example:

-- 5 rows of sample data



IF OBJECT_ID('tempdb..#x') IS NOT NULL DROP TABLE #x;

SELECT TOP (5) col1 = CAST(newid() as varchar(40)) INTO #x FROM sys.all_columns;

SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) FROM #x ORDER BY col1;

SELECT col1 FROM #x ORDER BY col1;

In this case both queries return the rows sorted by col1. Microsoft SQL Server only guarantees that the second one will be sorted by col1. If the second query did not return the rows ordered by col1 you could report this to Microsoft as a defect; this is not the case with the first query.

These queries, too, will sort the result set by col1 but this is not absolutely, positively guaranteed.

SELECT DISTINCT col1 FROM #x;



SELECT col1, COUNT(col1) FROM #x GROUP BY col1;

SELECT col1, nt = NTILE(3) OVER (ORDER BY col1) FROM #x;

SELECT col1, xx = MAX(col1) OVER (PARTITION BY col1) FROM #x;

Again, the only way to guarantee that results are returned in a specific order is with an ORDER BY statement.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001