Increment query based off two tables

Question

Post reply

Increment query based off two tables

John215

SSC Eights!

Points: 871
More actions
February 17, 2014 at 9:33 am

#285792

Hello!

I have two tables with this info:

TABLE 1

COL1 COL2 COL3

AAA BBB CCC

QQQ WWW EEE

AAA SSS DDD

WWW EEE RRR

BBB BBB BBB

TABLE 2

COL1 COL2 COL3 COL4

b b b 343

a a a 344

c c c 345

d d d 346

e e e 347

I want to insert TABLE 1 into TABLE 2 with a query that will auto increment to COL4 looking like this:

COL1 COL2 COL3 COL4

b b b 343

a a a 344

c c c 345

d d d 346

e e e 347

AAA BBB CCC 348

QQQ WWW EEE 349

AAA SSS DDD 350

WWW EEE RRR 351

BBB BBB BBB 352

I know this can be done easily by just altering the column to have an auto-increment datatype, but I currently cannot do that at this moment. Any help on this would be awesome.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Russel Loski SSCertifiable Points: 7054 More actions · Answer 1

imba215 (2/17/2014)

Hello!

I have two tables with this info:

TABLE 1

COL1 COL2 COL3

AAA BBB CCC

QQQ WWW EEE

AAA SSS DDD

WWW EEE RRR

BBB BBB BBB

TABLE 2

COL1 COL2 COL3 COL4

b b b 343

a a a 344

c c c 345

d d d 346

e e e 347

I want to insert TABLE 1 into TABLE 2 with a query that will auto increment to COL4 looking like this:

COL1 COL2 COL3 COL4

b b b 343

a a a 344

c c c 345

d d d 346

e e e 347

AAA BBB CCC 348

QQQ WWW EEE 349

AAA SSS DDD 350

WWW EEE RRR 351

BBB BBB BBB 352

I know this can be done easily by just altering the column to have an auto-increment datatype, but I currently cannot do that at this moment. Any help on this would be awesome.

Insert into [Table 1] (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3

, (Select max(Col4) from [Table 1]) + ROW_NUMBER() over (Order by [Col1])

from [Table 2]

Russel Loski, MCSE Business Intelligence, Data Platform

Sean Lange SSC Guru Points: 286562 More actions · Answer 2

Something like this?



Insert TableB (col1, col2, col3, col4)

select col1, col2, col3, ROW_NUMBER() over (order by newid()) + (select MAX(col4) from tableb)

from tablea

--edit--

Looks like Russel was typing almost the exact same thing as I was...only faster. 😉

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

John215 SSC Eights! Points: 871 More actions · Answer 3

John215

SSC Eights!

Points: 871

February 17, 2014 at 2:42 pm

#1689664

Thank you so much!

irobertson Hall of Fame Points: 3695 More actions · Answer 4

Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery:

declare @MaxId int



select @MaxId = max(col4) from #TableB

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId

from #TableA

But, testing says otherwise. The variable based approach is significantly worse. I wonder why this is. It's interesting. I guess I'm (simplistically perhaps) expecting the correlated query to have to execute for each row of the main query. But it clearly doesn't.

Test script if you're interested:

-- create some test tables and data



create table #TableA (Col1 varchar(10),Col2 varchar(10),Col3 varchar(10))

create table #TableB (Col1 varchar(10),Col2 varchar(10),Col3 varchar(10),Col4 int)

declare @i int

set @i = 1

-- there are better ways to do this, but this is easy to understand

while @i <=100000

begin

    insert #TableB values ('a','b','c',@i)

    -- only add rows to A for every 10th row added to B

    if @i % 10 = 0

    begin

        insert #TableA values ('AAA','BBB','CCC')

    end

    set @i = @i + 1

end

select count(*) from #TableA

-- 10000

select count(*) from #TableB

-- 100000

set statistics io on

-----------------------

-- test without indexes

-----------------------

-- DON'T DO THIS ON A PRODUCTION SYSTEM

dbcc dropcleanbuffers

dbcc freeproccache

-- correlated subquery version

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 10349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

-- clear the newly added data

delete #TableB where Col4 > 10000000

dbcc dropcleanbuffers

dbcc freeproccache

-- pre calculated id version

declare @MaxId int

select @MaxId = max(col4) from #TableB

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableB'. Scan count 0, logical reads 10037, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

set statistics io off

-- clear the newly added data

delete #TableB where Col4 > 100000

------------------

-- test with index

------------------

create index ix1 on #TableB(col4)

dbcc dropcleanbuffers

dbcc freeproccache

-- correlated subquery version

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 30194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

-- clear the newly added data

delete #TableB where Col4 > 100000

dbcc dropcleanbuffers

dbcc freeproccache

-- pre calculated id version

declare @MaxId int

select @MaxId = max(col4) from #TableB

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableB'. Scan count 0, logical reads 30191, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

----------------------------

-- test with clustered index

----------------------------

delete #TableB where Col4 > 100000

drop index #TableB.ix1

create clustered index ix1 on #TableB(col4)

dbcc dropcleanbuffers

dbcc freeproccache

-- correlated subquery version

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 21551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

-- clear the newly added data

delete #TableB where Col4 > 100000

-- pre calculated id version

declare @MaxId int

select @MaxId = max(col4) from #TableB

insert #TableB (Col1, Col2, Col3, Col4)

select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId

from #TableA

/*

Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableB'. Scan count 0, logical reads 21548, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

*/

set statistics io off

------------

-- test time

------------

-- run inserts 100 times

create table #results (i int, t char(1), s datetime, e datetime, diff int)

declare @S datetime

    , @e datetime

declare @i int

set @i = 1

while @i <= 100

begin

    declare @MaxId int

    set @S = getdate()   

    -- variable version

    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)

    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId

    from #TableA

    set @e = getdate() 

    insert #results select @i,'V', @S, @e, datediff(ms,@s,@e)

    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers

    dbcc freeproccache

    set @S = getdate()

    -- correlated subquery version

    insert #TableB (Col1, Col2, Col3, Col4)

    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)

    from #TableA

    set @e = getdate() 

    insert #results select @i,'C', @S, @e, datediff(ms,@s,@e)

    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers

    dbcc freeproccache

    set @i = @i + 1

end

-- show the result

select * from #results

Dwain Camps SSC Guru Points: 86913 More actions · Answer 5

irobertson (2/20/2014)

Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery:

declare @MaxId int
select @MaxId = max(col4) from #TableB insert #TableB (Col1, Col2, Col3, Col4) select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
from #TableA

But, testing says otherwise. The variable based approach is significantly worse. I wonder why this is. It's interesting. I guess I'm (simplistically perhaps) expecting the correlated query to have to execute for each row of the main query. But it clearly doesn't.

I have seen cases where putting the aggregate into a local variable instead of doing it in the main query does indeed help. It depends on whether the optimizer is smart enough to realize that's something it only needs to do once.

I would not have even considered a loop for this, mainly because of my mantra (below).

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St