Query tuning

Question

Post reply

Query tuning

California

SSCommitted

Points: 1888
More actions
November 30, 2007 at 12:46 pm

#180872

I have this fat query runnig against our production database and taking huge resources to execute:

At the query execution, i noticed from task manager - isql taking 1GB of memory and the CPU utilization is 99% for this session (from where the query is ran).

I was wondering, if someone please take a look at this query and suggest, what can be done to make it run better. The SQL version is 2000 and server is configured with 10 GB of memory. The hardware platform is quiet robust and have multiple file groups defined on the database.

Query is as follows:

SELECT dimRptBasis.SEQ1, dimDataSrc.SEQ4, dimDataSrc.SEQ3, dimDataSrc.SEQ2, dimDataSrc.SEQ1, dimEmployee.SEQ2, dimEmployee.SEQ1, dimAccount_Expense.SEQ5, dimAccount_Expense.SEQ4, dimAccount_Expense.SEQ3, dimAccount_Expense.SEQ2, dimAccount_Expense.SEQ1, dimBusOrg.SEQ11, dimBusOrg.SEQ10, dimBusOrg.SEQ9, dimBusOrg.SEQ8, dimBusOrg.SEQ7, dimBusOrg.SEQ6, dimBusOrg.SEQ5, dimBusOrg.SEQ4, dimBusOrg.SEQ3, dimBusOrg.SEQ2, dimBusOrg.SEQ1, dimBusPartner.SEQ3, dimBusPartner.SEQ2, dimBusPartner.SEQ1,

dimCostCenter_Expense.SEQ5, dimCostCenter_Expense.SEQ4, dimCostCenter_Expense.SEQ3, dimCostCenter_Expense.SEQ2, dimCostCenter_Expense.SEQ1, dimTime.TIMEKEYNAME1, dimTime.TIMEKEYNAME2, dimTime.TIMEKEYNAME3, dimCategory.SEQ1,

dimProduct.SEQ4, dimProduct.SEQ3, dimProduct.SEQ2, dimProduct.SEQ1, dimProject.SEQ2, dimProject.SEQ1, tblFACTExpense.SIGNEDDATA FROM "dbo"."tblFactExpense", "dimRptBasis", "dimDataSrc", "dimEmployee",

"dimAccount_Expense", "dimBusOrg", "dimBusPartner", "dimCostCenter_Expense", "dimTime", "dimCategory", "dimProduct", "dimProject" WHERE ("dbo"."tblFACTExpense"."RptBasis"="dimRptBasis"."ID1")

AND ("dbo"."tblFACTExpense"."DataSrc"="dimDataSrc"."ID1") AND

("dbo"."tblFACTExpense"."Employee"="dimEmployee"."ID1") AND

("dbo"."tblFACTExpense"."Account_Expense"="dimAccount_Expense"."ID1") AND

("dbo"."tblFACTExpense"."BusOrg"="dimBusOrg"."ID1") AND

("dbo"."tblFACTExpense"."BusPartner"="dimBusPartner"."ID1") AND

("dbo"."tblFACTExpense"."CostCenter_Expense"="dimCostCenter_Expense"."ID1")

AND ("dbo"."tblFACTExpense"."TIMEID"="dimTime"."TIMEID") AND

("dbo"."tblFACTExpense"."Category"="dimCategory"."ID1") AND

("dbo"."tblFACTExpense"."Product"="dimProduct"."ID1") AND ("dbo"."tblFACTExpense"."Project"="dimProject"."ID1")

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

noeld SSC Guru Points: 96590 More actions · Answer 1

noeld

SSC Guru

Points: 96590

November 30, 2007 at 1:06 pm

#756122

First: Make sure you take a look at the execution plan and you are seen no table scans...

Second: Check that you are actually retrieving what you need and no more.

... that will get you started ....

* Noel

Antares686 SSC Guru Points: 125444 More actions · Answer 2

After looking I cannot see anything glaring as I don't know enough however the first thing I would do is write this way and see if improves performance. But the other suggestions is def next step.



SELECT

dimRptBasis.SEQ1, dimDataSrc.SEQ4, dimDataSrc.SEQ3,

dimDataSrc.SEQ2, dimDataSrc.SEQ1, dimEmployee.SEQ2,

dimEmployee.SEQ1, dimAccount_Expense.SEQ5,

dimAccount_Expense.SEQ4, dimAccount_Expense.SEQ3,

dimAccount_Expense.SEQ2, dimAccount_Expense.SEQ1,

dimBusOrg.SEQ11, dimBusOrg.SEQ10, dimBusOrg.SEQ9, dimBusOrg.SEQ8,

dimBusOrg.SEQ7, dimBusOrg.SEQ6, dimBusOrg.SEQ5, dimBusOrg.SEQ4,

dimBusOrg.SEQ3, dimBusOrg.SEQ2, dimBusOrg.SEQ1, dimBusPartner.SEQ3,

dimBusPartner.SEQ2, dimBusPartner.SEQ1,

dimCostCenter_Expense.SEQ5, dimCostCenter_Expense.SEQ4,

dimCostCenter_Expense.SEQ3, dimCostCenter_Expense.SEQ2,

dimCostCenter_Expense.SEQ1, dimTime.TIMEKEYNAME1,

dimTime.TIMEKEYNAME2, dimTime.TIMEKEYNAME3, dimCategory.SEQ1,

dimProduct.SEQ4, dimProduct.SEQ3, dimProduct.SEQ2,

dimProduct.SEQ1, dimProject.SEQ2, dimProject.SEQ1,

tblFACTExpense.SIGNEDDATA

FROM

	dbo.tblFactExpense

INNER JOIN

	dbo.dimRptBasis

ON

	dbo.tblFACTExpense.RptBasis = dimRptBasis.ID1

INNER JOIN

	dbo.dimDataSrc

ON

	dbo.tblFACTExpense.DataSrc = dimDataSrc.ID1

INNER JOIN

	dbo.dimEmployee

ON

	dbo.tblFACTExpense.Employee = dimEmployee.ID1

INNER JOIN

	dbo.dimAccount_Expense

ON

	dbo.tblFACTExpense.Account_Expense = dimAccount_Expense.ID1

INNER JOIN

	dbo.dimBusOrg

ON

	dbo.tblFACTExpense.BusOrg = dimBusOrg.ID1

INNER JOIN

	dbo.dimBusPartner

ON

	dbo.tblFACTExpense.BusPartner = dimBusPartner.ID1

INNER JOIN

	dbo.dimCostCenter_Expense

ON

	dbo.tblFACTExpense.CostCenter_Expense = dimCostCenter_Expense.ID1

INNER JOIN

	dbo.dimTime

ON

	dbo.tblFACTExpense.TIMEID = dimTime.TIMEID

INNER JOIN

	dbo.dimCategory

ON

	dbo.tblFACTExpense.Category = dimCategory.ID1

INNER JOIN

	dbo.dimProduct

ON

	dbo.tblFACTExpense.Product = dimProduct.ID1

INNER JOIN

	dbo.dimProject

ON

dbo.tblFACTExpense.Project = dimProject.ID1

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

Even if it's "auto-generated" code, someone someday is going to have to troubleshoot it... take some time out to format it! And, the use of table aliases would clear a great deal of the clutter as would removing the unnecessary double quotes and parenthesis from this, well, mess! Following a standard as to when to "break" a line would be a huge help, as well.

Since most of the table names begin with "dim", I can only assume that this is for some sort of OLAP processing so I don't know if everything I say in the following is 100% true.

1. I checked all of the joins... they are all viable and no cross-joins seem to be present. However, you have to make sure that ID1 is unique in the "dim" tables are you're gonna spawn a lot of rows that you hadn't inteneded to spawn almost as bad as if mini-cross-joins were built in.

2. Take Noeld's advice... at least run an estimated execution plan on it. Look for table scans and try to figure out some indexes that would help.

3. Try using the Index Tuning Wizard on this bad boy.

4. Make sure that all the tables have a Primary Key... I'm thinking that will probably be on the "ID1" column of most of your tables... If so, you may want to make it a CLUSTERED primary key.

5. Since I think the "dim" tables are for OLAP, when you create your indexes and primary key (especially if clustered), you might wanna try using a Fill Factor of 100... no sense searching blank space if you don't have to.

6. There's a pretty good chance of this building a rather large working table in TempDB... make sure that TempDB is the proper size to handle the working table and all the other casual traffic TempDB is going to see.

7. Unless you are using it for some type of filtering, I see no "dimReportBasis" columns in the SELECT list... are you 100% sure you need to join to it?

Please don't post any more unformatted code of this length... next time, I'm not even going to look at something like this. If you want help, help us... make the code easy to read. If you really don't care what it looks like, then we won't either... we'll just go to the next post 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

California SSCommitted Points: 1888 More actions · Answer 4

California

SSCommitted

Points: 1888

November 30, 2007 at 6:18 pm

#756193

Would like to thank everyone for there suggestions. I started looking into this and keep you posted, if i find anything intresting.

With regards to the post, please accept my sincere apology for the unformated code. i have taken a note here and promise this wont happen again.

Thanks again,

Cali

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

Thanks, Cali. I appreciate the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

California SSCommitted Points: 1888 More actions · Answer 6

I have some updates to share with you folks. Here are couple of things i have done to make the query run faster - but tough luck. Here are the approachs i took:

First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...

Second approach: I saved the query to a sql table and used index tuning advisor to recommend as to what needed to make this query run faster. unfortunately index advisor doesnt recommend anything exciting.

Third approach: I did look at the query execution plan and i dont see full table scan. but what i see, lot's of hash - match inner joins with a Cost of 10%. additionally i see index seek scans as well. but dont know for sure, what hash match are related too?

If you guys could suggest something, that will be wonderful.

Thanks,

Cali

Ian Yates SSCoach Points: 19738 More actions · Answer 7

Did I read correctly when you said isql was taking 1GB of RAM?? That's the client tool you use to run the query, not SQL Server itself.... How much RAM was the sqlservr.exe process consuming?

What indices do you currently have on the tables. Any field that you use for joining should have some sort of an index. If this is your main use for the tables then you could make the indices clustered. If not, then try covering indices...

I haven't looked at the query in depth (others have already complained about the formatting... Crystal Reports makes ugly looking code too - I hate debugging it!) but you could try an indexed view if you're willing to suffer a slight drop in insert/update/delete performance. I could only see inner joins so this would lend itself to an indexed view (or at least trying one). How many rows (roughly) are we talking about in the various tables anyway?

California SSCommitted Points: 1888 More actions · Answer 8

Hello Ian,

Yes the sql query tool takes over 1 GB of ram (precisely it took 1.8 GB to complete) in addition to sqlservr.exe which takes 2.1 GB. The rows returned are close to 42 million.

The dim and tblfactexpense tables has clustered indexes and the application is microsoft olap.

appreciated any input to makes this thing runs fast.

thanks,

cali

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Did you try any of the things I posted?

Also, why are you returning 42 million rows instead of a summary of the rows?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Antares686 SSC Guru Points: 125444 More actions · Answer 10

California (12/2/2007)

First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...

Sorry I wasn't looking for a significant increase just to make sure I managed to correctly line up with inner joins for readability. But under the hood they should have had the same execution plan.

If there is that much difference then review the execution plan of each NOTE: you can get the estimated execution plan which will save you waiting for an entire run. If the proposed (see updated below) is different then tell me how please. It could also be yours used the cache more becuase mine was just run once those execution plans in memory can be funny that way. But the real question is how many rows do you need returned? If you are returning 10 Million ros depending on the data the client GUI may be the bottleneck itself becuase of the memory needed to display.

Here is the final corrected version I would use personally as a starting point



SELECT

	dimRptBasis.SEQ1

	, dimDataSrc.SEQ4

	, dimDataSrc.SEQ3

	, dimDataSrc.SEQ2

	, dimDataSrc.SEQ1

	, dimEmployee.SEQ2

	, dimEmployee.SEQ1

	, dimAccount_Expense.SEQ5

	, dimAccount_Expense.SEQ4

	, dimAccount_Expense.SEQ3

	, dimAccount_Expense.SEQ2

	, dimAccount_Expense.SEQ1

	, dimBusOrg.SEQ11

	, dimBusOrg.SEQ10

	, dimBusOrg.SEQ9

	, dimBusOrg.SEQ8

	, dimBusOrg.SEQ7

	, dimBusOrg.SEQ6

	, dimBusOrg.SEQ5

	, dimBusOrg.SEQ4

	, dimBusOrg.SEQ3

	, dimBusOrg.SEQ2

	, dimBusOrg.SEQ1

	, dimBusPartner.SEQ3

	, dimBusPartner.SEQ2

	, dimBusPartner.SEQ1

	, dimCostCenter_Expense.SEQ5

	, dimCostCenter_Expense.SEQ4

	, dimCostCenter_Expense.SEQ3

	, dimCostCenter_Expense.SEQ2

	, dimCostCenter_Expense.SEQ1

	, dimTime.TIMEKEYNAME1

	, dimTime.TIMEKEYNAME2

	, dimTime.TIMEKEYNAME3

	, dimCategory.SEQ1

	, dimProduct.SEQ4

	, dimProduct.SEQ3

	, dimProduct.SEQ2

	, dimProduct.SEQ1

	, dimProject.SEQ2

	, dimProject.SEQ1

	, tblFACTExpense.SIGNEDDATA

FROM

	dbo.tblFactExpense tblFactExpense

INNER JOIN

	dbo.dimRptBasis dimRptBasis

ON

	tblFACTExpense.RptBasis = dimRptBasis.ID1

INNER JOIN

	dbo.dimDataSrc dimDataSrc

ON

	tblFACTExpense.DataSrc = dimDataSrc.ID1

INNER JOIN

	dbo.dimEmployee dimEmployee

ON

	tblFACTExpense.Employee = dimEmployee.ID1

INNER JOIN

	dbo.dimAccount_Expense dimAccount_Expense

ON

	tblFACTExpense.Account_Expense = dimAccount_Expense.ID1

INNER JOIN

	dbo.dimBusOrg dimBusOrg

ON

	tblFACTExpense.BusOrg = dimBusOrg.ID1

INNER JOIN

	dbo.dimBusPartner dimBusPartner

ON

	tblFACTExpense.BusPartner = dimBusPartner.ID1

INNER JOIN

	dbo.dimCostCenter_Expense dimCostCenter_Expense

ON

	tblFACTExpense.CostCenter_Expense = dimCostCenter_Expense.ID1

INNER JOIN

	dbo.dimTime dimTime

ON

	tblFACTExpense.TIMEID = dimTime.TIMEID

INNER JOIN

	dbo.dimCategory dimCategory

ON

	tblFACTExpense.Category = dimCategory.ID1

INNER JOIN

	dbo.dimProduct dimProduct

ON

	tblFACTExpense.Product = dimProduct.ID1

INNER JOIN

	dbo.dimProject dimProject

ON

	tblFACTExpense.Project = dimProject.ID1

PW-201837 SSC-Insane Points: 20805 More actions · Answer 11

PW-201837

SSC-Insane

Points: 20805

December 3, 2007 at 8:30 am

#756508

True or False:

This query was captured as the SQL sent from MS Analysis Services ? As such you have no control over how the SQL is constructed and can't convert old style join syntax to INNER JOIN.

If True:

Where does isql and its CPU/memory utilization factor into your problem ?

Antares686 SSC Guru Points: 125444 More actions · Answer 12

PW (12/3/2007)

True or False:

This query was captured as the SQL sent from MS Analysis Services ? As such you have no control over how the SQL is constructed and can't convert old style join syntax to INNER JOIN.

Never saw anything about Analysis Services in the post just the query he posed which I altered to INNER JOIN version of the same thing.

If True:

Where does isql and its CPU/memory utilization factor into your problem ?

Any method of display records will require memory to manage the dataset, the client cursor (depending on server side or client side will have a different impact), display the values and will require using the OS page file even in cases where it may not use all of the system memory. This along with other processes running will impact overall speed and performance of the data being displayed. Especially if the system has to juggle data in an out of the page file and the system is swamped. I have been on machines with 1 GB of physical ram on the client and when looking at the server it showed the connection sleeping but the client machine was plodding along several mintues after to just display the data.

Have to understand the need for a lot of records in one go before I can determine what courses of action may help. First place is clean up the query and determine what is happening in the execution plan. But this is a straight forward query of joins such that either the joins are slow due to inadequte indexes for the join to manage, the server is mismanaging the query in the execution plan (known issues with parallel processing in SQL 2000 where MAX DOP 0 could have huge performance gains), there is a network bottleneck (data onto, across or off the wire can be the issue), or the client workstation is being swamped by trying to visually manage too much data. For the later I would suggest output to a file them check the file might show the results they are after if the are just trying to testing for an extract, if they need to display anything visually then do they really need all records, or if not then complete out the query and see what you finally get. But overall this one really boils down to indexes, and infrastructure as best I can tell.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 13

California (12/2/2007)

I have some updates to share with you folks. Here are couple of things i have done to make the query run faster - but tough luck. Here are the approachs i took:

First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...

Second approach: I saved the query to a sql table and used index tuning advisor to recommend as to what needed to make this query run faster. unfortunately index advisor doesnt recommend anything exciting.

Third approach: I did look at the query execution plan and i dont see full table scan. but what i see, lot's of hash - match inner joins with a Cost of 10%. additionally i see index seek scans as well. but dont know for sure, what hash match are related too?

If you guys could suggest something, that will be wonderful.

Thanks,

Cali

Hash match usually means that the indexing isn't all that wonderful, so SQL server is building a hash table to match things up. In other words - it's guessing the smaller of two tables and build a unique value list from it to use that to match things up.

Since you're in 2005 - have you tried building covering indexes? One for each table, adding the join criteria being used to the key portion and including the non-join fields (this makes an actual difference in 2005).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

PW-201837 SSC-Insane Points: 20805 More actions · Answer 14

See yesterday's reply from the OP:

It is "microsoft olap".

I recognize the query (the object name double quoting & old style joins) from having to deal with similar problems in queries originating from cube processing.

I'm questioning isql and memory usage because I think it's a red herring that has us focusing on the wrong issue:

I'm assuming:

Slow query in cube building process was causing concern

The SQL text of the query was captured in profiler & copied to isql for disgnostic purposes

Isql was swamped with large resultset and then clouded the whole issue by causing CPU and memory issues that are due to use of isql.

Need the OP to clarify what's going on and how isql fts into this.