Update Process Very Slow with big table and few updated records

  • OK - I am new to SQL Server and have been in Access for years.

    Tables - Invoice Header - about 8 million records.  One field is an open/closed flag - smallint.  Open = 1, closed = 0.  I moved from bit to smallint because Access uses -1 for true while SQL Server uses 1.

    Other table is the "open invoices" table which has about 175,000 records.

    This is not a transactional database, but one for analysis.  Nightly transfer of data via text files. 

    So new invoices are moved into the database and they are by definition "open" because they were invoiced that night.  Then during the day customers pay so some of the other 8 million invoices move from open to closed. 

    What I had been doing in Access was one process to close all invoices.  Then the next process was to set to open only those records that matched the OpenInvoice table.  This didn't take too long in Access, but Access only had a subset of the 8 million invoices.  This is why we are moving to SQL Server: so we can see and query all invoices at once, not one year at a time.

    A similar process in SQL Server is taking about 17 minutes.  Is there anything I can do to speed it up?

    I had an index on the field "Open", but after reading this week's bulletin I saw that SQL Server generally won't use an index on a column that is nowhere near unique.  And in this case there are some 7 million records marked closed and, in comparison, only a handful marked open.  I was hoping this would take a few seconds, not several minutes.

    Simplified table structure - Invoice Header

    Invoice - 11 characters

    OpenFlag - Small Int

    Simplified table structure - Invoice Open

    Invoice - 11 characters 
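    As a point of reference, the simplified structure described above could be sketched in T-SQL like this (constraint names and the clustered primary key are assumptions; the InvoiceOpen Type column is the one discussed later in this thread):

    ```sql
    -- Sketch of the simplified schema from the post; only the table
    -- and column names are from the original, the rest is assumed.
    CREATE TABLE dbo.InvoiceHeader (
        Invoice  char(11) NOT NULL
            CONSTRAINT PK_InvoiceHeader PRIMARY KEY CLUSTERED,
        [Open]   smallint NOT NULL  -- 1 = open, 0 = closed
    );

    CREATE TABLE dbo.InvoiceOpen (
        Invoice  char(11) NOT NULL,
        [Type]   char(1)  NOT NULL  -- 'I' = invoice, 'D' = debit, 'C' = credit
    );
    ```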

     

  • Do you have any clustered indexes on the tables to prevent fragmentation?

    Does the UPDATE statement join on the "Invoice" column? If so, is that column indexed in either table?

    Also, post the SQL you're using for the UPDATE. The issue may lie in the syntax if it was translated from MS-Access.

     

  • Thank you so much for being there for me.

    Statements to set all invoices to zero (closed):

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 0

    WHERE (((dbo.InvoiceHeader.[Open])=1))

    This took 6.46 minutes.  When I removed the WHERE clause, thinking it was doing nothing, I was at 27 minutes and gave up on it.  So the WHERE clause is doing something.

    Statements to reset open invoices:

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 1
    FROM dbo.InvoiceHeader INNER JOIN dbo.InvoiceOpen
      ON dbo.InvoiceHeader.Invoice = dbo.InvoiceOpen.Invoice
    WHERE (((dbo.InvoiceOpen.[Type])='I'))

    This is running at 18 minutes and still counting, double what it was before I deleted some indexes and ran a reindexing process.  I had an index on the Open field and just deleted it, and it seems the updates are slower without it.

    The Type could also be 'D' or 'C' for debits and credits, which are not part of what we are doing.  I will try deleting those records before the update so I don't need the WHERE clause.

    Each table has a clustered index. 

     

     

     

  • Sorry - last part of question.  INVOICE is indexed in both tables.

  • I have a possible suggestion for a change in your process, but I need to have one question answered first: Is it possible for a CLOSED invoice to be REOPENED?

     

  • Thank you.  Yes a suggested alternative to the process is most welcome.

    No, an invoice is never reopened.  If it is paid with a bounced check, etc. (it happens), they do something kludgy in AR.

  • I changed the below update:

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 1
    FROM dbo.InvoiceHeader INNER JOIN dbo.InvoiceOpen
      ON dbo.InvoiceHeader.Invoice = dbo.InvoiceOpen.Invoice
    WHERE (((dbo.InvoiceOpen.[Type])='I'))

    I eliminated the InvoiceOpen Type rows so I didn't need the WHERE clause, and the update completed in 7.11 minutes, which maybe isn't so horrible considering it runs in the middle of the night.  But any suggestions for improvement would be most welcome.

    My observations: an index on the Open flag does make a difference, though I don't know why.  And changing just the open invoices to closed is much faster than closing all of them.

    I feel like I am getting there. 

  • The index might be useful if only an index scan is required (without a bookmark lookup).  It still has to scan the same number of rows, but with only 2-3 columns instead of the whole row, that's much less data to read through.

     

    That's why index scans are not always such a bad thing... especially in the case of covering indexes.
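    As a sketch of that idea (index names are assumptions; the filtered-index variant requires SQL Server 2008 or later):

    ```sql
    -- A narrow covering index: scanning it reads only the Open flag
    -- and the Invoice key instead of the whole 8-million-row table.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_Open
        ON dbo.InvoiceHeader ([Open])
        INCLUDE (Invoice);

    -- On SQL Server 2008+, a filtered index keeps only the handful
    -- of open rows, so low selectivity on [Open] stops mattering.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_OpenOnly
        ON dbo.InvoiceHeader (Invoice)
        WHERE [Open] = 1;
    ```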

  • I'm surprised this hasn't been asked already, but why are you setting all open invoices to closed and then resetting many of them back to open? Just set the ones to closed that need to be closed.

    Break the process up into multiple parts and then perform one update statement:

    Declare @InvoiceIDs Table (Invoice char(11) not null primary key)
    -- the post says Invoice is 11 characters; an int or other
    -- small unique key would be even better if you have one

    Insert Into @InvoiceIDs
    Select Invoice
    From dbo.InvoiceHeader As IH with(nolock)
    Where [Open] = 1
    And Not Exists (Select 1 From dbo.InvoiceOpen
        Where [Type] = 'I'
        And Invoice = IH.Invoice)

    Update IH
    Set IH.[Open] = 0
    From dbo.InvoiceHeader As IH
    Inner Join @InvoiceIDs tIH On tIH.Invoice = IH.Invoice

    Suggested indexes:

    InvoiceHeader.[Open]

    InvoiceHeader.Invoice

    InvoiceHeader.[Open], InvoiceHeader.Invoice

    InvoiceOpen.[Type], InvoiceOpen.Invoice
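    Those suggestions could be created along these lines (index names are assumptions; note the single-column indexes on [Open] and Invoice are redundant if you keep the composite ones, since each is the leading column of a composite):

    ```sql
    -- Composite index supporting the Open-flag filter plus the join key.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_Open_Invoice
        ON dbo.InvoiceHeader ([Open], Invoice);

    -- Composite index supporting the Type = 'I' filter plus the join key.
    CREATE NONCLUSTERED INDEX IX_InvoiceOpen_Type_Invoice
        ON dbo.InvoiceOpen ([Type], Invoice);
    ```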

     


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

  • If you can't solve the problem with indexes, or the two-step process Robert proposed, you might consider using a loop that updates a limited number of rows in each cycle.  Very large updates put a lot of strain on the server: a long-running transaction holds a lot of locks and a lot of uncommitted data in the transaction log.  The definition of "very large" is fuzzy and depends on your hardware, data, and other factors, but I would say a duration of an hour or more probably qualifies.

    Note that breaking a single transaction into a series of smaller transactions can produce inconsistent results if the data is very dynamic, but you say this is static reporting data so it should be safe. 
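    A minimal sketch of that batched approach, assuming the table and column names from this thread (the batch size of 10,000 is an arbitrary starting point to tune):

    ```sql
    -- Close open invoices in batches so no single transaction holds
    -- locks on millions of rows or bloats the transaction log.
    DECLARE @BatchSize int;
    SET @BatchSize = 10000;

    WHILE 1 = 1
    BEGIN
        UPDATE TOP (@BatchSize) IH
        SET IH.[Open] = 0
        FROM dbo.InvoiceHeader AS IH
        WHERE IH.[Open] = 1
          AND NOT EXISTS (SELECT 1 FROM dbo.InvoiceOpen AS IO
                          WHERE IO.[Type] = 'I'
                            AND IO.Invoice = IH.Invoice);

        -- a short batch means no qualifying rows remain
        IF @@ROWCOUNT < @BatchSize BREAK;
    END
    ```

    Each pass commits on its own, so locks are released and the log can truncate (or be backed up) between batches.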
