Performance tuning

  • Other than moving this to faster IO, partitioning it, or moving the indexes to a different spindle, can anyone see a way to get a performance increase out of the following T-SQL?  There are clustered primary key indexes on the WebLogID column and on urlStemID, as well as a non-clustered index on the logWritten column.  N.B. performance is almost exactly the same (to the second) when using an inner join rather than an IN.  Currently the DB size is 3.5GB, with about 13 million records in the Weblog table, on SQL Server Express (yes, I know I'm going to have to partition soon or upgrade, since I will soon hit the 4GB limit, but this is non-production at the moment).  According to the execution plan, the bulk of the cost is in the clustered index scan of the Weblog table, and of that 85% is the IO.  I suspect I know the answer, but I was just wondering whether anyone has other thoughts on getting this below the 22s it takes to run for long date ranges, i.e. >6 months:

     

    SELECT
        YEAR(logWritten) AS [Year],
        MONTH(logWritten) AS [Month],
        DAY(logWritten) AS [Day of Month],
        COUNT(WebLogID) AS [Visits]
    FROM Weblog
    WHERE urlStemID IN
          (SELECT urlStemID FROM URLStem WHERE StemText LIKE @strSearch)
      AND logWritten BETWEEN @statDate AND @endDate
    GROUP BY YEAR(logWritten), MONTH(logWritten), DAY(logWritten)
    ORDER BY YEAR(logWritten), MONTH(logWritten), DAY(logWritten)
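
    To put hard numbers on that IO cost, the query can be wrapped with the standard statistics switches (plain T-SQL, nothing specific to this schema):

    SET STATISTICS IO ON;    -- per-table logical/physical read counts
    SET STATISTICS TIME ON;  -- CPU and elapsed time
    -- ... run the SELECT above ...
    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;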

     

  • The clustered index must be on the logWritten column.

    And use a JOIN anyway.
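
    A minimal sketch of that index change, assuming the existing clustered primary key is a constraint named PK_Weblog (the real names aren't given in this thread):

    -- PK_Weblog and IX_Weblog_logWritten are assumed names, for illustration only.
    -- Recreate the PK as non-clustered so the clustered index can move to logWritten.
    ALTER TABLE dbo.Weblog DROP CONSTRAINT PK_Weblog;
    -- logWritten is not unique, so SQL Server silently adds a 4-byte uniquifier.
    CREATE CLUSTERED INDEX IX_Weblog_logWritten ON dbo.Weblog (logWritten);
    ALTER TABLE dbo.Weblog ADD CONSTRAINT PK_Weblog PRIMARY KEY NONCLUSTERED (WebLogID);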

    _____________
    Code for TallyGenerator

  • SELECT
        YEAR(logWritten) AS [Year],
        MONTH(logWritten) AS [Month],
        DAY(logWritten) AS [Day of Month],
        COUNT(WebLogID) AS [Visits]
    FROM Weblog wl
    JOIN URLStem us
        ON wl.urlStemID = us.urlStemID
    WHERE StemText LIKE @strSearch
      AND logWritten BETWEEN @statDate AND @endDate
    GROUP BY YEAR(logWritten), MONTH(logWritten), DAY(logWritten)
    ORDER BY YEAR(logWritten), MONTH(logWritten), DAY(logWritten)

     

    That should help a bit, but here are other things you could do.

    Does logWritten have a time portion?

    Is there a wildcard at the start of @strSearch, and is it needed?
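
    On the time-portion question: if logWritten carries times, BETWEEN @statDate AND @endDate stops at midnight on the end date and drops the whole final day. A half-open range avoids that (assuming @endDate is passed with no time part):

    -- With a datetime column, BETWEEN excludes everything after @endDate 00:00:00.
    -- A half-open range keeps the entire final day:
    WHERE logWritten >= @statDate
      AND logWritten <  DATEADD(d, 1, @endDate)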

    www.sql-library.com

  • Nope, that is exactly the same performance (as stated in my original post, I had already tried this with joins), although I did run your version just in case!  Yes, logWritten is a timestamp of type datetime.  Unfortunately, @strSearch is a variable string which changes depending upon what the user is looking for stats on.  The clustered scan on the URLStem table looking for the string has a 0% cost, though; it is the clustered scan of the Weblog table (cost: 71%) and the hash match (right semi join from my code, or inner join from yours; cost: 26%) where the delays are.  Thanks for the reply.

  • Indexes are being rebuilt, with the clustered index on the time column - I'll have performance info in a bit.

  • Still being rebuilt?

    _____________
    Code for TallyGenerator

  • You'll never do better than an INDEX SCAN with any of the queries written so far... not sure there's a better one, either.  I might be able to tweak a couple of things, but I need to know what kind of values can be in @strSearch and StemText...

    Also, please tell me that everything you have labeled as a date is a datetime datatype and everything you have labeled as an ID is an INT or BIGINT.
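
    For what it's worth, that can be confirmed straight from the metadata (a standard catalog view; table name as used in this thread):

    -- Verify the datatypes behind the date and ID columns:
    SELECT COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'Weblog';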

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • INDEX SCAN will be used every time you query for a range of values from a column that doesn't have a clustered index on it.

    BETWEEN means CLUSTERED INDEX.

    End of story.

    _____________
    Code for TallyGenerator

  • Nah... you can get INDEX SEEKS out of non-clustered indexes... they just won't be CLUSTERED INDEX SEEKS.  And (as you know, ol' friend), I don't think they'll come close to an INDEX SEEK even with the correct index, given the code so far.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • If you just need to show results on a day-by-day basis and do not need the breakdown into Year, Month, and Day columns, then this might shave a bit off, though I'm not sure how much with 13 million rows; I only had 10,000 available to test with.  You'll also note I went back to an inner join, but I moved the predicate on URLStem's StemText into the join, out of the WHERE.  Sometimes this can affect a query's performance on multi-join queries, so it might have some impact here.

    SELECT
        DATEADD(d, DATEDIFF(d, 0, WB.logWritten), 0) AS [Date],
        COUNT(WebLogID) AS [Visits]
    FROM dbo.Weblog WB
    INNER JOIN dbo.URLStem URL
        ON  WB.urlStemID = URL.urlStemID
        AND URL.StemText LIKE @strSearch
    WHERE WB.logWritten BETWEEN @statDate AND @endDate
    GROUP BY DATEADD(d, DATEDIFF(d, 0, WB.logWritten), 0)
    ORDER BY DATEADD(d, DATEDIFF(d, 0, WB.logWritten), 0)
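
    For anyone unfamiliar with the DATEADD/DATEDIFF pairing above: it floors a datetime to midnight, which is what lets the query group on whole days. The date literal below is just an example value:

    -- DATEDIFF counts whole days since day 0 (1900-01-01);
    -- DATEADD adds them back, stripping the time portion.
    SELECT DATEADD(d, DATEDIFF(d, 0, '2007-06-15 14:33:07'), 0);
    -- returns 2007-06-15 00:00:00.000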

  • 1) IN and JOIN run in the same time because the optimizer is most likely rewriting the IN as a JOIN under the covers (show the estimated query plan for both to verify this).

    2) You are actually doing THREE things that make a clustered index on logWritten better for this particular query: BETWEEN, GROUP BY and ORDER BY.  If this is the main action performed against this table, then even paying the overhead of having a uniquifier added onto the 8 bytes of the date column will be worth it (thus having a 12-byte clustering key, which is carried over to all non-clustered indexes as well).  Test and see, since it is a non-production environment.  BTW, be prepared for a HUGE transaction log when you shift the clustered index.  I would manually expand the tlog to 1.5 times the size of the table, then shrink it back down when indexing is complete (sketched after this list).

    3) Did you remember to set the initial size of the database and tlog files?  Did you set the growth rates appropriately?  I would check for OS-level file fragmentation if you didn't.  Also make sure your antivirus is set to ignore .mdf and .ldf files.

    4) You must accept that at some point you are just asking for more data than your spindle/cpu can chew through in a few seconds.  🙂
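
    A sketch of the tlog resize from point 2, with hypothetical database/file names and sizes only indicative of the 3.5GB table mentioned above:

    -- WeblogDB / WeblogDB_log are placeholder names; substitute your own.
    ALTER DATABASE WeblogDB
        MODIFY FILE (NAME = WeblogDB_log, SIZE = 5GB);  -- ~1.5x the table size
    -- ... move the clustered index here ...
    DBCC SHRINKFILE (WeblogDB_log, 500);  -- shrink back to 500MB afterwards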

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service


  • I will get around to checking a few of these ideas later in the week - mainly because the requested clustered index on the date field put the DB over the 4GB limit for SQL Express, and hence it failed - I'll have to move it to a server with a proper license!  Yes, the date ranges have to be fully adjustable and broken down (when searching over a couple of days the performance is OK).  File growth etc. was all sorted from the outset; I looked at OS-level defrag - hardly anything - and the virus checker is ignoring it.  The string search is in the form '%/podcast/%.mp3', or many other combinations depending upon the report.  This side of things has next to 0 cost according to the execution plan (the URL strings are fully normalised, so nothing like the 13m records of the stems).  I think the clustered index on the date range is the answer, but it will have to wait a few days.  Thanks all for the advice.

  • I always use 'with (nolock)' on my queries. It seems to speed things up a bit...
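
    For reference, on the query under discussion the hint would look something like this (a sketch only; see the caveat in the next reply):

    -- NOLOCK reads uncommitted rows, so counts can be slightly off.
    SELECT COUNT(WebLogID) AS [Visits]
    FROM dbo.Weblog WITH (NOLOCK)
    WHERE logWritten BETWEEN @statDate AND @endDate;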



    Cheers,

    Ben Sullins
    bensullins.com
    Beer is my primary key...

  • But you run the risk of getting invalid data due to dirty reads.
