Data load from one table to another

  • I have 2 tables (more than 10 million rows each).

    table1 has 3 columns

    a - varchar, b - varchar, c - float

    there is no primary key or index

    there is data in columns a and b; column c is NULL in every row

    table2 has the same 3 columns

    a - varchar, b - varchar, c - varchar

    there is no primary key or index

    there is data in columns a, b and c

    I have a simple procedure that copies the c value of table2 into column c of table1.

    The procedure looks like this

    ------------------------------------------------------------

    create procedure my_procedure as
    begin
     declare @a nvarchar(255), @b nvarchar(255)
     declare @c nvarchar(255)

     declare c1 cursor dynamic
     for select a, b, c from table2

     open c1
     fetch next from c1 into @a, @b, @c

     while (@@fetch_status = 0)
     begin
      update table1
      set c = @c
      where a = @a and
            b = @b

      -- check the UPDATE immediately; @@error is reset by every statement
      if @@error <> 0
       print 'exception 1 ' + cast(@@error as varchar(10))

      print @c
      print @a

      fetch next from c1 into @a, @b, @c
     end

     close c1
     deallocate c1
    end

    ------------------------------------------------------------

    It takes around 17 hours to update 2000 records. Is there a way to optimize this procedure code? (I don't have the rights to add keys, indexes, etc. to the tables; only code changes are possible.)

    It's actually a transfer of the column c data from table2 into table1.


    Regards,

    Srinidhi Rao

  • Hi Srinidhi,

    You are already receiving responses in the original topic where you asked the same question.

    http://qa.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=326295&p=2

    And I agree with Gila that there is little use in trying to rewrite the code unless your table with 10M rows is properly indexed. Gila has already posted a set-based solution for your update (based on the info you supplied) which will run quickly if the tables are indexed.

    You said you can't change indexes - but you can at least look at them, I suppose... If necessary indexes are missing, contact the person who can create them and have them added.
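    For reference, a set-based update along those lines might look like the sketch below. This is an illustration based on the column names in the original post, not Gila's exact code, and it assumes the (a, b) combination identifies matching rows:

    ------------------------------------------------------------

    -- Sketch of a set-based equivalent of the cursor
    update t1
    set t1.c = t2.c
    from table1 t1
    join table2 t2
      on t1.a = t2.a
     and t1.b = t2.b

    ------------------------------------------------------------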

  • Why do you need to use a cursor?

    The big 'C' is a big no-no in SQL Server.

    Here is a solution that takes 2 minutes and 23 seconds for 1 million records. The time taken will grow worse than linearly for 10 such sets, because the scan size increases with every insert.

    I can visualize situations where indexing is not feasible, for instance when the same set of tables must simultaneously serve updates and sequential read access.

    Other than that, I am not sure why your DBA is not indexing the tables. Please send him Steve Jones' editorial from yesterday's newsletter.

    --Step 1

    --======

    if exists(select * from sysobjects where name = 't1')

    drop table t1

    create table t1(f1 varchar(10), f2 varchar(10), f3 float)

    if exists(select * from sysobjects where name = 't2')

    drop table t2

    create table t2(f1 varchar(10), f2 varchar(10), f3 varchar(6))

    --Step 2

    --======

    declare @count int

    set @count = 1

    while @count <= 1000000

    begin

        insert into t1 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),null)

        set @count = @count + 1

    end

    --Step 3

    --======

    declare @count int

    set @count = 1

    while @count <= 1000000

    begin

        insert into t2 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),'99.999')

        set @count = @count + 1

    end

    --Step 4

    --======

    --time taken for 1 million records - 2 minutes and 23 seconds

    update t1

    set t1.f3 = cast(t2.f3 as float)

    from t2

    where t1.f1 = t2.f1

    and t1.f2 = t2.f2
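    For the full 10M+ row tables, one further option (my suggestion, not something timed above) is to run the update in batches so each transaction stays small and the log doesn't balloon. A sketch, relying on t1.f3 being NULL until a row has been updated:

    -- Batched variant (untested sketch); SET ROWCOUNT caps each pass
    declare @rc int
    set @rc = 1

    set rowcount 10000

    while @rc > 0
    begin
        update t1
        set t1.f3 = cast(t2.f3 as float)
        from t2
        where t1.f1 = t2.f1
        and t1.f2 = t2.f2
        and t1.f3 is null

        -- capture immediately; almost any statement resets @@rowcount
        set @rc = @@rowcount
    end

    set rowcount 0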

    You can also think of using a query to retrieve all records of interest; see the sample below.

    select t1.a, t1.b, t2.c

    from table1 t1 left join table2 t2 on

    t1.a = t2.a and t1.b = t2.b

    (Use a left join if there is no 1:1 match between the tables; otherwise a plain inner join will be effective.)

    Use BCP to bulk copy the results of the query out to a file.

    Truncate table1 and then use BULK INSERT to load the contents of the file back into table1, as sketched below.
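    A sketch of that round trip; the database name, server name, file path and delimiter below are placeholders, so adjust them to your environment:

    -- From the command line (character mode, trusted connection):
    --   bcp "select t1.a, t1.b, t2.c from mydb..table1 t1 left join
    --        mydb..table2 t2 on t1.a = t2.a and t1.b = t2.b"
    --       queryout c:\temp\table1.dat -c -t"|" -S MYSERVER -T

    truncate table table1

    bulk insert table1
    from 'c:\temp\table1.dat'
    with (fieldterminator = '|', rowterminator = '\n', tablock)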

    All the best..

    Ignas Chilewa

    Dublin, OH

    Now that I have created a clustered index, the performance of my procedure should increase. But I am still afraid to run the procedure, as it took 17 hours to update 2000 matching records. Let me explain the problem very clearly.

    There are 2 tables:

    Table1

    ---------------------------------------------------------
    a (varchar)  |  b (varchar)  |  c (varchar)  |  d (float)
    ---------------------------------------------------------
    has value    |  has value    |  null         |  has value
    ---------------------------------------------------------

    There are around 13 million records in this table.

    The combination of a and b is unique.

    Table2

    ------------------------------------------------
    a (varchar)  |  b (varchar)  |  d (float)
    ------------------------------------------------
    has value    |  has value    |  has value
    ------------------------------------------------

    There are around 13 million records in this table.

    The combination of a and b is unique.

    I have created a clustered index on column a of both tables (note: column a has more repeating values).

    I have to transfer the values in column d of Table2 to column c of Table1.
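    Given that schema, a set-based version of the transfer might look like the sketch below (an illustration, not tested against these tables; the CONVERT is needed because Table1.c is varchar while Table2.d is float):

    update t1
    set t1.c = convert(varchar(30), t2.d)
    from Table1 t1
    join Table2 t2
      on t1.a = t2.a
     and t1.b = t2.b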

     


    Regards,

    Srinidhi Rao

  • this should still work

    --Step 4

    --======

    --time taken for 1 million records - 2 minutes and 23 seconds

    update t1

    set t1.f3 = cast(t2.f3 as float)

    from t2

    where t1.f1 = t2.f1

    and t1.f2 = t2.f2

  • "But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "

    You're saying that there are only about 2000 common rows among the millions of records?

    This should be faster...

    First, create a composite index on a and b for both tables.

    ------------

    -- Assuming b has fewer repeating values (reverse the columns otherwise)

    Create index IDX_1 on Table1(b,a)

    GO

    Create index IDX_1 on Table2(b,a)

    GO

    -- Use a #Temp table to store the matching rows (about 2000 only);
    -- SELECT ... INTO creates the temp table on the fly

    Select T1.a, T1.b
    Into #Temp

    From Table1 T1

    Join Table2 T2

    ON T1.a = T2.a and T1.b = T2.b

    -- use #Temp to limit the selection to be updated

    Update T1

    Set T1.c = T2.d

    From Table1 T1

    Join Table2 T2

    ON T1.a = T2.a and T1.b = T2.b

    Join #Temp T 

    ON T1.a = T.a and T1.b = T.b

    GO

    drop table #Temp

     

    -- Note: I'm assuming that the execution plan for Jambu's statement will use a Hash Match algorithm, since columns a and b are varchars. This will fill up your memory, thereby resulting in the slow response time.
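    If that is what's happening, a join hint can force a nested-loops plan instead; this is an untested suggestion, and it only makes sense once the supporting indexes on a and b exist, otherwise it will be very slow:

    Update T1
    Set T1.c = T2.d
    From Table1 T1
    Join Table2 T2
      ON T1.a = T2.a and T1.b = T2.b
    option (loop join)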

     


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

  • actually

    "But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "

    I stopped executing the procedure partway through!

    Thanks!!!

    Creating the index has definitely increased the speed... now the update is happening in minutes!

    Now that I have created the clustered index, I will let you know the exact time taken ASAP.

    FYI, I created index INDX1 only on column a.

    Will creating index INDX2 on column b improve the performance further?

     

     


    Regards,

    Srinidhi Rao

    Yes, indexing the second column will increase speed.

    In fact, I concatenated both columns into a new column in the table and used that column for the comparison; it was way faster than the two-column index.

     

    I also removed the explicit conversion, because I saw that an implicit conversion was already happening for f3.
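    A sketch of that concatenated-column idea, using the t1/t2 test tables from earlier (my reconstruction, not the exact code; the separator avoids false matches such as 'ab'+'c' vs 'a'+'bc', and the combined length must stay under the 900-byte index key limit):

    -- Add a computed column on both tables and index it
    alter table t1 add f1f2 as (f1 + '|' + f2)
    alter table t2 add f1f2 as (f1 + '|' + f2)

    create index IX_t1_f1f2 on t1(f1f2)
    create index IX_t2_f1f2 on t2(f1f2)

    -- No explicit CAST: assigning varchar f3 to float f3 converts implicitly
    update t1
    set t1.f3 = t2.f3
    from t2
    where t1.f1f2 = t2.f1f2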

    With a clustered index on column a, it took 14 hours to update around 1.5 million rows.

    This message appears when I create an index on (a,b) (a has more repeating values):

    "Warning! The maximum key length is 900 bytes. The index 'IDX_1' has maximum length of 1020 bytes. For some combination of large values, the insert/update operation will fail."

    I have now created an index on b and a clustered index on a on both tables.
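    Since (a, b) together can exceed the 900-byte index key limit, one possible workaround (an untested suggestion, not from the posts above) is to index a CHECKSUM of the two columns and join on that, re-checking the real columns to weed out hash collisions:

    -- 4-byte checksum of (a, b), indexable regardless of the columns' size
    alter table Table1 add ab_hash as checksum(a, b)
    alter table Table2 add ab_hash as checksum(a, b)

    create index IX_Table1_hash on Table1(ab_hash)
    create index IX_Table2_hash on Table2(ab_hash)

    update t1
    set t1.c = convert(varchar(30), t2.d)
    from Table1 t1
    join Table2 t2
      on t1.ab_hash = t2.ab_hash  -- fast, index-supported probe
     and t1.a = t2.a              -- exact re-check: CHECKSUM can collide
     and t1.b = t2.b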


    Regards,

    Srinidhi Rao

  • Srinidhi,

    Sorry to get into sermon mode. But...

    The job at hand is one of huge volume. To get to a viable solution, we need to understand the data inside and out. Until your last post I had no clue that the two key fields together could be 1020 bytes; your first post declares 255-byte variables, and I assumed that was their size.

    The SQL Server index key limit is 900 bytes: you cannot take a varchar(901) and declare it as a PK, or try to index such a column. Even now it is not clear what the actual size of your varchar fields is. Is it more than 900 bytes? In that case even your latest solution could fail, because there is no guarantee that such large values won't come in at some point in time. Since you did not get the warning when you indexed each field individually, I am assuming the field lengths are less than or equal to 900 bytes.

    Some of the possible alternatives I would look at are:

    1. Can I use the tables as they are, rather than updating every time?

    2. If all the data is needed for queries, can we look at horizontal partitioning as an alternative? SQL 2000 views help/optimize in a nice way for horizontal partitioning (see the sketch after this list).

    3. Can I use some kind of distributed mechanism?
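    A sketch of that SQL 2000 partitioned-view idea; the table names, ranges and column sizes below are illustrative, not taken from this thread:

    -- Member tables each hold one range of column a; the CHECK
    -- constraints let the optimizer skip partitions that can't match
    create table table1_a_to_m (
        a varchar(255) not null check (a < 'N'),
        b varchar(255) not null,
        c varchar(30) null,
        d float null,
        primary key (a, b)
    )

    create table table1_n_to_z (
        a varchar(255) not null check (a >= 'N'),
        b varchar(255) not null,
        c varchar(30) null,
        d float null,
        primary key (a, b)
    )
    go

    -- Queries against the view touch only the member tables whose
    -- CHECK constraint can satisfy the predicate
    create view table1_all
    as
    select a, b, c, d from table1_a_to_m
    union all
    select a, b, c, d from table1_n_to_z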

    If this is a real-world issue you are facing in your company, please try to discuss it with some DB gurus who have experience handling such voluminous data. This does not seem to be a situation where we can apply some ad hoc solution; it could prove disastrous. These are the kinds of situations (where each iteration of a test could cost you a day) that Steve says could cost us our jobs as DBAs.

    It is a nice problem you brought up. To me it looks like an application for mainframes. Are you on a mainframe by any chance? I am not even sure if SQL Server has a mainframe version.

    I wish I could help you more, but it is taking so much time to test each iteration, and I have my own job to do. Keep updating your experiences. It is an excellent problem.

     

     

    Your problem may not be just in SQL Server. It shouldn't take 14 hours for only 1.5 million rows. Can you give us your server specifications, please? I'm thinking you might be low on resources (e.g. memory).

    My post on using a temp table should help out. It lets you use tempdb to store the common values instead of keeping them in memory when SQL Server performs the join operation.


    Ronald San Juan | SQL DBA
    ID 710124 ~ Code "Northwind"

    I agree with Ronald San Juan.

    There is something weird going on in your environment.

    Hey, it took 40 minutes to update! I have created a clustered index on column a and a nonclustered index on column b, and the procedure is just:

    create procedure my_procedure as
    update t1
    set t1.c = t2.c
    from table1 t1
    join table2 t2
      on t1.a = t2.a
     and t1.b = t2.b


    Regards,

    Srinidhi Rao
