Update Process Very Slow with big table and few updated records

  • OK - I am new to SQL Server and have been in Access for years.

    Tables - Invoice Header - about 8 million records.  One field is an open/closed flag - smallint.  Open = 1, closed = 0.  I moved from bit to smallint because Access uses -1 for true while SQL Server uses 1.

    Other table is the "open invoices" table which has about 175,000 records.

    This is not a transactional database, but one for analysis.  Nightly transfer of data via text files. 

    So new invoices are moved into the database and they are by definition "open" because they were invoiced that night.  Then during the day customers pay so some of the other 8 million invoices move from open to closed. 

    What I had been doing in Access was one process to close all invoices.  Then the next process was to set to open only those records that matched the OpenInvoice table.  This didn't take too long in Access, but Access only had a subset of the 8 million invoices.  This is why we are moving to SQL Server: so we can see and query all invoices at once, not one year at a time.

    A similar process in SQL Server is taking about 17 minutes.  Is there anything I can do to speed it up?

    I had an index on the field "Open", but after reading this week's bulletin I saw that SQL Server generally won't use an index on a column that is nowhere near unique.  And in this case there are some 7 million records marked closed and, in comparison, only a handful marked open.  I was hoping this would take a few seconds, not several minutes.

    Simplified table structure - Invoice Header

    Invoice - 11 characters

    OpenFlag - Small Int

    Simplified table structure - Invoice Open

    Invoice - 11 characters 
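    As a point of reference, the simplified structure described above could be sketched in T-SQL like this (constraint names and the clustered primary key are assumptions; the InvoiceOpen Type column is the one discussed later in this thread):

    ```sql
    -- Sketch of the simplified schema from the post; only the table
    -- and column names are from the original, the rest is assumed.
    CREATE TABLE dbo.InvoiceHeader (
        Invoice  char(11) NOT NULL
            CONSTRAINT PK_InvoiceHeader PRIMARY KEY CLUSTERED,
        [Open]   smallint NOT NULL  -- 1 = open, 0 = closed
    );

    CREATE TABLE dbo.InvoiceOpen (
        Invoice  char(11) NOT NULL,
        [Type]   char(1)  NOT NULL  -- 'I' = invoice, 'D' = debit, 'C' = credit
    );
    ```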

     

  • Do you have any clustered indexes on the tables to prevent fragmentation?

    Does the UPDATE statement join on the "Invoice" column? If so, is that column indexed in either table?

    Also, post the SQL you're using for the UPDATE. The issue may lie in the syntax if it was translated from MS-Access.

     

  • Thank you so much for being there for me.

    Statements to set all invoices to zero (closed):

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 0

    WHERE (((dbo.InvoiceHeader.[Open])=1))

    This took 6.46 minutes.  When I removed the WHERE clause, thinking it was doing nothing, I was at 27 minutes and gave up on it.  So the WHERE clause is doing something.

    Statements to reset open invoices:

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 1
    FROM dbo.InvoiceHeader INNER JOIN dbo.InvoiceOpen
      ON dbo.InvoiceHeader.Invoice = dbo.InvoiceOpen.Invoice
    WHERE (((dbo.InvoiceOpen.[Type])='I'))

    This is running at 18 minutes and still counting, double what it was before I deleted some indexes and ran a reindexing process.  I had an index on the Open field and just deleted it, and it seems the updates are slower without it.

    The Type could also be 'D' or 'C' for debits and credits, which are not part of what we are doing.  I will try deleting those records before the update so I don't need the WHERE clause.

    Each table has a clustered index. 

     

     

     

  • Sorry - last part of question.  INVOICE is indexed in both tables.

  • I have a possible suggestion for a change in your process, but I need to have one question answered first: Is it possible for a CLOSED invoice to be REOPENED?

     

  • Thank you.  Yes a suggested alternative to the process is most welcome.

    No, an invoice is never reopened.  If it is paid with a bounced check, etc. (it happens), they do something kludgy in AR.

  • I changed the below update:

    UPDATE dbo.InvoiceHeader SET dbo.InvoiceHeader.[Open] = 1
    FROM dbo.InvoiceHeader INNER JOIN dbo.InvoiceOpen
      ON dbo.InvoiceHeader.Invoice = dbo.InvoiceOpen.Invoice
    WHERE (((dbo.InvoiceOpen.[Type])='I'))

    I eliminated the InvoiceOpen Type rows so I didn't need the WHERE clause, and the update completed in 7.11 minutes, which maybe isn't so horrible considering it runs in the middle of the night.  But any suggestions for improvement would be most welcome.

    My observations: an index on the Open flag does make a difference, though I don't know why.  And changing just the open invoices to closed is much faster than closing all of them.

    I feel like I am getting there. 

  • The index might be useful if only an index scan is required (without a bookmark lookup).  It still has to scan the same number of rows, but with only 2-3 columns instead of the whole row, that's much less data to read through.

     

    That's why index scans are not always such a bad thing... especially in the case of covering indexes.
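    As a sketch of that idea (index names are assumptions; the filtered-index variant requires SQL Server 2008 or later):

    ```sql
    -- A narrow covering index: scanning it reads only the Open flag
    -- and the Invoice key instead of the whole 8-million-row table.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_Open
        ON dbo.InvoiceHeader ([Open])
        INCLUDE (Invoice);

    -- On SQL Server 2008+, a filtered index keeps only the handful
    -- of open rows, so low selectivity on [Open] stops mattering.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_OpenOnly
        ON dbo.InvoiceHeader (Invoice)
        WHERE [Open] = 1;
    ```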

  • I'm surprised this hasn't been asked already, but why are you setting all open invoices to closed and then resetting many of them back to open? Just set the ones to closed that need to be closed.

    Break the process up into multiple parts and then perform one update statement:

    Declare @InvoiceIDs Table (Invoice char(11) not null primary key)
    -- the post says Invoice is 11 characters; an int or other
    -- small unique key would be even better if you have one

    Insert Into @InvoiceIDs
    Select Invoice
    From dbo.InvoiceHeader As IH with(nolock)
    Where [Open] = 1
    And Not Exists (Select 1 From dbo.InvoiceOpen
        Where [Type] = 'I'
        And Invoice = IH.Invoice)

    Update IH
    Set IH.[Open] = 0
    From dbo.InvoiceHeader As IH
    Inner Join @InvoiceIDs tIH On tIH.Invoice = IH.Invoice

    Suggested indexes:

    InvoiceHeader.[Open]

    InvoiceHeader.Invoice

    InvoiceHeader.[Open], InvoiceHeader.Invoice

    InvoiceOpen.[Type], InvoiceOpen.Invoice
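    Those suggestions could be created along these lines (index names are assumptions; note the single-column indexes on [Open] and Invoice are redundant if you keep the composite ones, since each is the leading column of a composite):

    ```sql
    -- Composite index supporting the Open-flag filter plus the join key.
    CREATE NONCLUSTERED INDEX IX_InvoiceHeader_Open_Invoice
        ON dbo.InvoiceHeader ([Open], Invoice);

    -- Composite index supporting the Type = 'I' filter plus the join key.
    CREATE NONCLUSTERED INDEX IX_InvoiceOpen_Type_Invoice
        ON dbo.InvoiceOpen ([Type], Invoice);
    ```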

     


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

  • If you can't solve the problem with indexes, or the two-step process Robert proposed, you might consider using a loop that updates a limited number of rows in each cycle.  Very large updates put a lot of strain on the server: a long-running transaction holds a lot of locks and a lot of uncommitted data in the transaction log.  The definition of "very large" is fuzzy and depends on your hardware, data, and other factors, but I would say a duration of an hour or more probably qualifies.

    Note that breaking a single transaction into a series of smaller transactions can produce inconsistent results if the data is very dynamic, but you say this is static reporting data so it should be safe. 
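    A minimal sketch of that batched approach, assuming the table and column names from this thread (the batch size of 10,000 is an arbitrary starting point to tune):

    ```sql
    -- Close open invoices in batches so no single transaction holds
    -- locks on millions of rows or bloats the transaction log.
    DECLARE @BatchSize int;
    SET @BatchSize = 10000;

    WHILE 1 = 1
    BEGIN
        UPDATE TOP (@BatchSize) IH
        SET IH.[Open] = 0
        FROM dbo.InvoiceHeader AS IH
        WHERE IH.[Open] = 1
          AND NOT EXISTS (SELECT 1 FROM dbo.InvoiceOpen AS IO
                          WHERE IO.[Type] = 'I'
                            AND IO.Invoice = IH.Invoice);

        -- a short batch means no qualifying rows remain
        IF @@ROWCOUNT < @BatchSize BREAK;
    END
    ```

    Each pass commits on its own, so locks are released and the log can truncate (or be backed up) between batches.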
