Data load from one table to another

  • I have 2 tables (more than 10 million rows each).

    table1 has 3 columns

    a - varchar, b - varchar, c - float

    there is no primary key or index

    there is data in columns a and b; column c is NULL in every row

    table2 has the same 3 columns

    a - varchar, b - varchar, c - varchar

    there is no primary key or index

    there is data in columns a, b and c

    I have a simple procedure that copies the c value of table2 into column c of table1.

    The procedure looks like this

    ------------------------------------------------------------

    create procedure my_procedure as
    begin
     declare @a nvarchar(255), @b nvarchar(255)
     declare @c nvarchar(255)

     declare c1 cursor dynamic
     for select a, b, c from table2

     open c1
     fetch next from c1 into @a, @b, @c

     while (@@fetch_status = 0)
     begin
      update table1
      set c = @c
      where a = @a and
            b = @b

      -- check the UPDATE immediately; @@error is reset by every statement
      if @@error <> 0
       print 'exception 1 ' + cast(@@error as varchar(10))

      print @c
      print @a

      fetch next from c1 into @a, @b, @c
     end

     close c1
     deallocate c1
    end

    ------------------------------------------------------------

    It takes around 17 hours to update 2000 records. Is there a way to optimize this procedure code? (I don't have the rights to add keys, indexes, etc. to the tables; only code changes are possible.)

    It's actually a transfer of the column c data from table2 into table1.


    Regards,

    Srinidhi Rao

  • Hi Srinidhi,

    You are already receiving responses in the original topic where you asked the same question.

    http://qa.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=326295&p=2

    And I agree with Gila that there is little use in trying to rewrite the code unless your table with 10M rows is properly indexed. Gila has already posted a set-based solution for your update (based on the info you supplied) which will run quickly if the tables are indexed.

    You said you can't change indexes - but you can at least look at them, I suppose... If necessary indexes are missing, contact the person who can create them and have them added.
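    For reference, a set-based update along those lines might look like the sketch below. This is an illustration based on the column names in the original post, not Gila's exact code, and it assumes the (a, b) combination identifies matching rows:

    ------------------------------------------------------------

    -- Sketch of a set-based equivalent of the cursor
    update t1
    set t1.c = t2.c
    from table1 t1
    join table2 t2
      on t1.a = t2.a
     and t1.b = t2.b

    ------------------------------------------------------------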

  • Why do you need to use a cursor?

    The big 'C' is a big no-no in SQL Server.

    Here is a solution that takes 2 minutes and 23 seconds for 1 million records. The time taken will grow worse than linearly for 10 such sets, because the scan size increases with every insert.

    I can visualize situations where indexing is not feasible, for instance when the same set of tables must simultaneously serve updates and sequential read access.

    Other than that, I am not sure why your DBA is not indexing the tables. Please send him Steve Jones' editorial from yesterday's newsletter.

    --Step 1

    --======

    if exists(select * from sysobjects where name = 't1')

    drop table t1

    create table t1(f1 varchar(10), f2 varchar(10), f3 float)

    if exists(select * from sysobjects where name = 't2')

    drop table t2

    create table t2(f1 varchar(10), f2 varchar(10), f3 varchar(6))

    --Step 2

    --======

    declare @count int

    set @count = 1

    while @count <= 1000000

    begin

        insert into t1 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),null)

        set @count = @count + 1

    end

    --Step 3

    --======

    declare @count int

    set @count = 1

    while @count <= 1000000

    begin

        insert into t2 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),'99.999')

        set @count = @count + 1

    end

    --Step 4

    --======

    --time taken for 1 million records - 2 minutes and 23 seconds

    update t1

    set t1.f3 = cast(t2.f3 as float)

    from t2

    where t1.f1 = t2.f1

    and t1.f2 = t2.f2
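    For the full 10M+ row tables, one further option (my suggestion, not something timed above) is to run the update in batches so each transaction stays small and the log doesn't balloon. A sketch, relying on t1.f3 being NULL until a row has been updated:

    -- Batched variant (untested sketch); SET ROWCOUNT caps each pass
    declare @rc int
    set @rc = 1

    set rowcount 10000

    while @rc > 0
    begin
        update t1
        set t1.f3 = cast(t2.f3 as float)
        from t2
        where t1.f1 = t2.f1
        and t1.f2 = t2.f2
        and t1.f3 is null

        -- capture immediately; almost any statement resets @@rowcount
        set @rc = @@rowcount
    end

    set rowcount 0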

    You can also think of using a query to retrieve all records of interest; see the sample below.

    select t1.a, t1.b, t2.c

    from table1 t1 left join table2 t2 on

    t1.a = t2.a and t1.b = t2.b

    (Use a left join if there is no 1:1 match between the tables; otherwise a plain inner join will be effective.)

    Use BCP to bulk copy the results of the query out to a file.

    Truncate table1 and then use BULK INSERT to load the contents of the file back into table1, as sketched below.
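    A sketch of that round trip; the database name, server name, file path and delimiter below are placeholders, so adjust them to your environment:

    -- From the command line (character mode, trusted connection):
    --   bcp "select t1.a, t1.b, t2.c from mydb..table1 t1 left join
    --        mydb..table2 t2 on t1.a = t2.a and t1.b = t2.b"
    --       queryout c:\temp\table1.dat -c -t"|" -S MYSERVER -T

    truncate table table1

    bulk insert table1
    from 'c:\temp\table1.dat'
    with (fieldterminator = '|', rowterminator = '\n', tablock)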

    All the best..

    Ignas Chilewa

    Dublin, OH

    Now that I have created a clustered index, the performance of my procedure should increase. But I am still afraid to run the procedure, as it took 17 hours to update 2000 matching records. Let me explain the problem very clearly.

    There are 2 tables:

    Table1

    ---------------------------------------------------------
    a (varchar)  |  b (varchar)  |  c (varchar)  |  d (float)
    ---------------------------------------------------------
    has value    |  has value    |  null         |  has value
    ---------------------------------------------------------

    There are around 13 million records in this table.

    The combination of a and b is unique.

    Table2

    ------------------------------------------------
    a (varchar)  |  b (varchar)  |  d (float)
    ------------------------------------------------
    has value    |  has value    |  has value
    ------------------------------------------------

    There are around 13 million records in this table.

    The combination of a and b is unique.

    I have created a clustered index on column a of both tables (note: column a has more repeating values).

    I have to transfer the values in column d of Table2 to column c of Table1.
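    Given that schema, a set-based version of the transfer might look like the sketch below (an illustration, not tested against these tables; the CONVERT is needed because Table1.c is varchar while Table2.d is float):

    update t1
    set t1.c = convert(varchar(30), t2.d)
    from Table1 t1
    join Table2 t2
      on t1.a = t2.a
     and t1.b = t2.b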

     


    Regards,

    Srinidhi Rao

  • this should still work

    --Step 4

    --======

    --time taken for 1 million records - 2 minutes and 23 seconds

    update t1

    set t1.f3 = cast(t2.f3 as float)

    from t2

    where t1.f1 = t2.f1

    and t1.f2 = t2.f2

  • "But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "

    You're saying that there are only about 2000 common rows among the millions of records?

    This should be faster...

    First, create a composite index on a and b for both tables.

    ------------

    -- Assuming b has fewer repeating values (reverse the columns otherwise)

    Create index IDX_1 on Table1(b,a)

    GO

    Create index IDX_1 on Table2(b,a)

    GO

    -- Use a #Temp table to store the matching rows (about 2000 only);
    -- SELECT ... INTO creates the temp table on the fly

    Select T1.a, T1.b
    Into #Temp

    From Table1 T1

    Join Table2 T2

    ON T1.a = T2.a and T1.b = T2.b

    -- use #Temp to limit the selection to be updated

    Update T1

    Set T1.c = T2.d

    From Table1 T1

    Join Table2 T2

    ON T1.a = T2.a and T1.b = T2.b

    Join #Temp T 

    ON T1.a = T.a and T1.b = T.b

    GO

    drop table #Temp

     

    -- Note: I'm assuming that the execution plan for Jambu's statement will use a Hash Match algorithm, since columns a and b are varchars. This will fill up your memory, thereby resulting in the slow response time.
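    If that is what's happening, a join hint can force a nested-loops plan instead; this is an untested suggestion, and it only makes sense once the supporting indexes on a and b exist, otherwise it will be very slow:

    Update T1
    Set T1.c = T2.d
    From Table1 T1
    Join Table2 T2
      ON T1.a = T2.a and T1.b = T2.b
    option (loop join)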

     


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

  • actually

    "But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "

    I stopped executing the procedure partway through!

    Thanks!!!

    Creating the index has definitely increased the speed... now the update is happening in minutes!

    Now that I have created the clustered index, I will let you know the exact time taken ASAP.

    FYI, I created index INDX1 only on column a.

    Will creating index INDX2 on column b improve the performance further?

     

     


    Regards,

    Srinidhi Rao

    Yes, indexing the second column will increase speed.

    In fact, I concatenated both columns into a new column in the table and used that column for the comparison; it was way faster than the two-column index.

     

    I also removed the explicit conversion, because I saw that an implicit conversion was already happening for f3.
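    A sketch of that concatenated-column idea, using the t1/t2 test tables from earlier (my reconstruction, not the exact code; the separator avoids false matches such as 'ab'+'c' vs 'a'+'bc', and the combined length must stay under the 900-byte index key limit):

    -- Add a computed column on both tables and index it
    alter table t1 add f1f2 as (f1 + '|' + f2)
    alter table t2 add f1f2 as (f1 + '|' + f2)

    create index IX_t1_f1f2 on t1(f1f2)
    create index IX_t2_f1f2 on t2(f1f2)

    -- No explicit CAST: assigning varchar f3 to float f3 converts implicitly
    update t1
    set t1.f3 = t2.f3
    from t2
    where t1.f1f2 = t2.f1f2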

    With a clustered index on column a, it took 14 hours to update around 1.5 million rows.

    This message appears when I create an index on (a,b) (a has more repeating values):

    "Warning! The maximum key length is 900 bytes. The index 'IDX_1' has maximum length of 1020 bytes. For some combination of large values, the insert/update operation will fail."

    I have now created an index on b and a clustered index on a on both tables.
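    Since (a, b) together can exceed the 900-byte index key limit, one possible workaround (an untested suggestion, not from the posts above) is to index a CHECKSUM of the two columns and join on that, re-checking the real columns to weed out hash collisions:

    -- 4-byte checksum of (a, b), indexable regardless of the columns' size
    alter table Table1 add ab_hash as checksum(a, b)
    alter table Table2 add ab_hash as checksum(a, b)

    create index IX_Table1_hash on Table1(ab_hash)
    create index IX_Table2_hash on Table2(ab_hash)

    update t1
    set t1.c = convert(varchar(30), t2.d)
    from Table1 t1
    join Table2 t2
      on t1.ab_hash = t2.ab_hash  -- fast, index-supported probe
     and t1.a = t2.a              -- exact re-check: CHECKSUM can collide
     and t1.b = t2.b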


    Regards,

    Srinidhi Rao

  • Srinidhi,

    Sorry to get into sermon mode. But...

    The job at hand is one of huge volume. To get to a viable solution, we need to understand the data inside and out. Until your last post I had no clue that the two key fields together could be 1020 bytes; your first post declares 255-byte variables, and I assumed that was their size.

    The SQL Server index key limit is 900 bytes: you cannot take a varchar(901) and declare it as a PK, or try to index such a column. Even now it is not clear what the actual size of your varchar fields is. Is it more than 900 bytes? In that case even your latest solution could fail, because there is no guarantee that such large values won't come in at some point in time. Since you did not get the warning when you indexed each field individually, I am assuming the field lengths are less than or equal to 900 bytes.

    Some of the possible alternatives I would look at are:

    1. Can I use the tables as they are, rather than updating every time?

    2. If all the data is needed for queries, can we look at horizontal partitioning as an alternative? SQL 2000 views help/optimize in a nice way for horizontal partitioning (see the sketch after this list).

    3. Can I use some kind of distributed mechanism?
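    A sketch of that SQL 2000 partitioned-view idea; the table names, ranges and column sizes below are illustrative, not taken from this thread:

    -- Member tables each hold one range of column a; the CHECK
    -- constraints let the optimizer skip partitions that can't match
    create table table1_a_to_m (
        a varchar(255) not null check (a < 'N'),
        b varchar(255) not null,
        c varchar(30) null,
        d float null,
        primary key (a, b)
    )

    create table table1_n_to_z (
        a varchar(255) not null check (a >= 'N'),
        b varchar(255) not null,
        c varchar(30) null,
        d float null,
        primary key (a, b)
    )
    go

    -- Queries against the view touch only the member tables whose
    -- CHECK constraint can satisfy the predicate
    create view table1_all
    as
    select a, b, c, d from table1_a_to_m
    union all
    select a, b, c, d from table1_n_to_z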

    If this is a real-world issue you are facing in your company, please try to discuss it with some DB gurus who have experience handling such voluminous data. This does not seem to be a situation where we can apply some ad hoc solution; it could prove disastrous. These are the kinds of situations (where each iteration of a test could cost you a day) that Steve says could cost us our jobs as DBAs.

    It is a nice problem you brought up. To me it looks like an application for mainframes. Are you on a mainframe by any chance? I am not even sure if SQL Server has a mainframe version.

    I wish I could help you more, but it is taking so much time to test each iteration, and I have my own job to do. Keep updating your experiences. It is an excellent problem.

     

     

    Your problem may not be just in SQL Server. It shouldn't take 14 hours for only 1.5 million rows. Can you give us your server specifications, please? I'm thinking you might be low on resources (e.g. memory).

    My post on using a temp table should help out. It lets you use tempdb to store the common values instead of keeping them in memory when SQL Server performs the join operation.


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

    I agree with Ronald San Juan.

    There is something weird going on in your environment.

    Hey, it took 40 minutes to update! I have created a clustered index on column a and a nonclustered index on column b, and the procedure is just:

    create procedure my_procedure as
    update t1
    set t1.c = t2.c
    from table1 t1
    join table2 t2
      on t1.a = t2.a
     and t1.b = t2.b


    Regards,

    Srinidhi Rao
