Move large data from view to permanent table

  • Hello,

    I have been attempting to move a large amount of data from a view into a permanent table.  The permanent table currently has NO indexes on it.  I have been running the following:

    insert into perm_table

    select * from myView

     

    My problem is that I run a job that does this overnight, and after 11 hours, the tempDB eats up all available space on the server (FYI... the table is about 60 GB, and the tempDB file grows to about 180 GB).  Nothing else is happening overnight on the server to increase the size of tempDB.

    So, I have 2 questions:

    1)  Why does this insert cause tempDB to grow so big?

    2)  Based on my space constraints, is BCP really the only way to go? 

    My end result is to get this data moved to the permanent table in the quickest time possible, and without running out of space on the server.

    Any help that you all can provide would be GREATLY appreciated.

  • Here are my suggestions:

    1. Put an index on your original table.

    2. I would advise a divide-and-conquer strategy, so you don't use up all of your tempdb on the sort.

    3. With an index, it should be faster to return the data for your query.

    4. Sometimes all it takes is a little patience to get it done.  (You could write a loop or cursor to go through batches of 500,000 rows at a time, for example.)

    Good Luck

     

    mom

     

  • Thank you for the feedback.  The data I am trying to move is not in a table, but in a view.  I am looking into the possibility of indexing the view, but that is something new to me.

     

    Even if I use the divide and conquer strategy, do you agree BCP would be faster and have the added benefit of not using any tempDB space?

  • I am not sure.  If you BCP out, you have to BCP in, unless you do a bulk insert.  I have never tried bulk insert because I haven't had a situation in which I needed one.
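
    For reference, a round trip might look something like this (just a sketch - the server name, database name, and file path are assumptions):

    -- Export the view in native format from a command prompt:
    --   bcp "SELECT * FROM MyDB.dbo.myView" queryout C:\temp\myview.bcp -n -T -S MyServer
    -- Then load the file back with BULK INSERT (the T-SQL cousin of bcp ... in):
    BULK INSERT dbo.perm_table
    FROM 'C:\temp\myview.bcp'
    WITH (DATAFILETYPE = 'native', -- matches bcp's -n (native) flag
          TABLOCK);                -- helps get minimal logging into an empty heap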

     

    mom

  • One simple solution is to create indexes on your view, so the view will be materialized (a physical copy of the data). SQL Server still maintains the data in the view when data is inserted/deleted/updated in its base tables. Another benefit is that you do not need to change your SPs and queries that use the view (the view name does not change).
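
    The general pattern looks like this (only a sketch - the view body and column names are assumptions, and the view has to meet the indexed-view rules, e.g. SCHEMABINDING, two-part names, no outer joins):

    -- The view must be defined WITH SCHEMABINDING and use two-part names:
    CREATE VIEW dbo.myView
    WITH SCHEMABINDING
    AS
    SELECT sls.SaleID, sls.ProdID, sls.Qty
    FROM dbo.SalesHistory sls;
    GO
    -- The first index must be unique and clustered; it materializes the view:
    CREATE UNIQUE CLUSTERED INDEX IX_myView ON dbo.myView (SaleID);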

     

  • You need to break the insert into chunks.  Is there something in the view that could be used like a key (doesn't have to be unique), preferably a field that is a clustered key or index on some of the underlying tables?  You could use something like this:

    declare @minkey int, @maxkey int  -- assumes an integer key; use whatever type matches
    set @minkey = -1                  -- assuming -1 is smaller than any possible value
    -- [key] stands in for your batching column; bracketed because KEY is a reserved word

    while 1 = 1
    begin
        -- Find the upper boundary of the next batch of up to 100,000 rows
        select @maxkey = max([key]) from (
            select top 100000 [key] from myview
            where [key] > @minkey order by [key]) x

        -- MAX over an empty set returns NULL: nothing left to copy
        if @maxkey is null break

        insert into perm_table
        select * from myview
        where [key] > @minkey and [key] <= @maxkey

        set @minkey = @maxkey
    end

    It doesn't matter that there is no index on perm_table; an index would only slow down the insert.  You might look at the execution plan for the view, however.  The growth in tempdb suggests it is doing a lot of work building hash tables and doing intermediate sorts.
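
    To see the estimated plan without running the whole insert, something like this works:

    -- Ask SQL Server for the plan instead of executing the statement:
    SET SHOWPLAN_ALL ON;
    GO
    SELECT * FROM myView;
    GO
    SET SHOWPLAN_ALL OFF;
    GO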

  • Thanks for all of the feedback.  I could not index the view because it broke a few of the "rules" surrounding indexed views (some columns couldn't be schema bound, there were left outer joins, etc.).

     

    I decided to create a number of views that "chunk" the data, and then I created a .bat file that performed the BCP out, BCP in, and deletion of the .bcp file.  It worked well, and I believe it was faster, and tempDB took much less of a hit.

  • Your view references real tables in its SELECT statement.  Look at the JOINs in your view and be sure to have an index on both sides of each JOIN, i.e.,

    SELECT sls.ProdID, prd.ProdName, sls.Qty, ..., ..., and so on..

    FROM dbo.SalesHistory sls

    JOIN dbo.Products prd

     ON prd.ProdID = sls.ProdID

    Make sure you have a ProdID index on Products and SalesHistory.  You also need to look at your WHERE clause.  If you're filtering for a certain SalesDate, then index that column as well.
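
    For example (a sketch - the index names are made up, and the tables follow the hypothetical schema above):

    -- Support both sides of the JOIN:
    CREATE INDEX IX_SalesHistory_ProdID ON dbo.SalesHistory (ProdID);
    CREATE INDEX IX_Products_ProdID ON dbo.Products (ProdID);
    -- Support a WHERE filter such as a date range:
    CREATE INDEX IX_SalesHistory_SalesDate ON dbo.SalesHistory (SalesDate);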

    I'd also look for and avoid any inline functions in your select statement - I'd opt for rewriting the code to get the same results as part of the select statement itself.  For instance, instead of:

    SELECT ProdName = fnGetProdNameEvenIfItTakesLonger(sls.ProdID), sls.Qty, ..., ...,

    FROM dbo.SalesHistory sls
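
    ...the lookup could be folded into the join instead (a sketch, assuming the hypothetical fnGetProdNameEvenIfItTakesLonger just returns the product name from dbo.Products):

    -- Set-based join instead of a row-by-row function call:
    SELECT prd.ProdName, sls.Qty
    FROM dbo.SalesHistory sls
    JOIN dbo.Products prd
     ON prd.ProdID = sls.ProdID;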

    If you STILL have problems, then I'd try batching the data as suggested... or perhaps running the query multiple times for different date ranges, say 6 or 12 months at a time.  But as suggested already, do not index the target table until you're done populating it.  Too much overhead will be spent updating the indexes during each INSERT.  Wait till you're done and suffer through it once.

    Anyone else?  Comments, questions or snide remarks?

    - Mark
