OUTPUT clause returning the wrong row?

Question

Post reply

OUTPUT clause returning the wrong row?

g.britton

SSChampion

Points: 13862
More actions
November 6, 2014 at 7:44 am

#298210

I'm looking at various methods for deleting duplicate rows. Among the alternatives, one works just fine but gives me results that make me go, "huh?".

Consider this script:

declare @t table (a int, b int, c int, d int, e int) insert into @t (a, b, c, d, e) values (1, 2, 3, 4, 5), (3, 4, 2, 3, 4), (1, 2, 3, 4, 5) select a,b,c,d,e, rn = row_number() over ( partition by a,b,c,d,e order by a,b,c,d,e) from @t; with cte as ( select a,b,c,d,e, rn = row_number() over ( partition by a,b,c,d,e order by a,b,c,d,e) from @t ) delete from cte output deleted.* where rn > 1;

The code works -- that is, the duplicate row is deleted. However the output clause returns:

a b c d e rn

1 2 3 4 5 1

So....why? Why does the output clause show that the row with rn=1 was deleted, when the where clause stipulates rn > 1?

Gerald Britton, Pluralsight courses

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 245813 More actions · Answer 1

I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number? You could not use this information to relate back to the original data and match the duplicate rows one to one.

If you add another dupe row, you'll see that the deleted row numbers are 1 and 2. So it seems that the SQL engine removes the correct rows and then somehow loses track of the row which remains. Looks a bit strange, but not really a problem.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

g.britton SSChampion Points: 13862 More actions · Answer 2

Phil Parkin (11/6/2014)

I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number?

The row number is generated by the ROW_NUMBER() function and the sort order is specified. AFAIK there is no "real" row number in a clustered index either. The closest you get is the CI key which maps to RID which is just a file/page/rowid.

Even if I change the table definition to:



declare @t table (id int identity(1,1) primary key, a int, b int, c int, d int, e int)

to make it a CI, the output clause still returns incorrect results.

Gerald Britton, Pluralsight courses

Phil Parkin SSC Guru Points: 245813 More actions · Answer 3

g.britton (11/6/2014)

Phil Parkin (11/6/2014)

I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number?

The row number is generated by the ROW_NUMBER() function and the sort order is specified. AFAIK there is no "real" row number in a clustered index either. The closest you get is the CI key which maps to RID which is just a file/page/rowid.

Even if I change the table definition to:

declare @t table (id int identity(1,1) primary key, a int, b int, c int, d int, e int)

to make it a CI, the output clause still returns incorrect results.

Well, using your revised table definition, the delete should be done like this:



delete t

output deleted.*

from @t t

join cte on t.Id = cte.id

where rn > 1;

And everything works.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

ScottPletcher SSC Guru Points: 100262 More actions · Answer 4

My guess is it's just some quirk with the output clause and ROW_NUMBER(), because the expected row gets kept even when the OUTPUT seems to indicate otherwise:



declare @t table (a int, b int, c int, d int, e int, f int)

insert into @t (a, b, c, d, e, f) values

	(1, 2, 3, 4, 5, 99),

	(3, 4, 2, 3, 4, 98),

	(1, 2, 3, 4, 5, 96),

	(1, 2, 3, 4, 5, 97)

select a,b,c,d,e, rn = row_number() over (

					partition by a,b,c,d,e

					order by a,b,c,d,e)

from @t;

with cte as (

	select a,b,c,d,e, rn = row_number() over (

							partition by a,b,c,d,e

							order by a,b,c,d,e,f)

		from @t

)

delete

from cte

output deleted.*

where rn > 1;

select * from @t

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

g.britton SSChampion Points: 13862 More actions · Answer 5

g.britton

SSChampion

Points: 13862

November 6, 2014 at 9:53 am

#1757051

well, isn't that interesting!

Don't know about the "should be done" bit though. My version doesn't need a self-join. It's just weird that SQL loses track of what it just deleted.

Gerald Britton, Pluralsight courses

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

g.britton (11/6/2014)

well, isn't that interesting!

Don't know about the "should be done" bit though. My version doesn't need a self-join. It's just weird that SQL loses track of what it just deleted.

You're missing something.

The issue has to do with using the row_number() against the DELETED "magic table". If you were to add in the ID number like you mentioned and control which one to delete you will see it's wiping out the expected row. That said the ROW_NUMBER value isn't a physical value in the table, so DELETED doesn't have it to return. As a result the query attempt to provide based on what it knows from the CTE, but only find one row (in the delete), ergo RN is 1.

Confirm the results this way:



declare @t table (a int, b int, c int, d int, e int, id int identity(1,1))--<-- adding in the identity

insert into @t (a, b, c, d, e) values

	(1, 2, 3, 4, 5),

	(3, 4, 2, 3, 4),

	(1, 2, 3, 4, 5)

select a,b,c,d,e,ID, row_number() over (

					partition by a,b,c,d,e

					order by a,b,c,d,e, ID) as RN  --<-- making SURE that the "second" duplicate gets deleted

from @t;

with cte as (

	select a,b,c,d,e,ID, row_number() over (

							partition by a,b,c,d,e

							order by a,b,c,d,e, ID) as RN

		from @t

)

delete

from cte

output deleted.*

where rn > 1;

You will see that ID = 3 is the one that is gone. The ID is reliable since it's a physical column in the table which is what exists in the DELETED pseudo table

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 7

Quick thought, there are actually two row_number instances in the plan, one used for the delete and another for the output. As only one row is outputted, the row number is of course 1.

😎

[Table Scan]-->[Sort]--[Segment]-->[Sequence Project]-->[Filter]-->[Table Delete]-->[Sort]-->[Segment]-->[Sequence Project]-->[DELETE]

|<---------- 1st row number -------->| delete |<--------- 2nd row number ---------->| output

g.britton SSChampion Points: 13862 More actions · Answer 8

g.britton

SSChampion

Points: 13862

November 6, 2014 at 10:53 am

#1757059

Good observation, though the output becomes misleading. I never doubted that the "right" row was deleted. I just found the results from the output clause to be confusing. I've lowered that to "misleading" 🙂

Gerald Britton, Pluralsight courses

Gazareth One Orange Chip Points: 27737 More actions · Answer 9

Gazareth

One Orange Chip

Points: 27737

November 6, 2014 at 11:19 am

#1757065

I actually encountered this exact behaviour last weekend. Got me a bit worried!

My explanation would be that the CTE is never materialised as such, so the rn value is never stored, and the ROW_NUMBER() clause is re-evaluated in the deleted table.

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 10

Gazareth (11/6/2014)

I actually encountered this exact behaviour last weekend. Got me a bit worried!

My explanation would be that the CTE is never materialised as such, so the rn value is never stored, and the ROW_NUMBER() clause is re-evaluated in the deleted table.

That's in a way correct, there are two row number (segment-sequence project) operators in the execution plan, one for the delete and one for the output.

😎