View with many tables and poor performance

Question

Post reply

View with many tables and poor performance

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Maxim Picard SSCrazy Points: 2052 More actions · Answer 1

The stats seem up to date and in line with the data distribution you said earlier (10 run per race).

That nested loop appears to be the culprick but it's hard to test without all the data.

Could you post the query plan for this (added a hint for HASH join)



SELECT

	*

FROM

	dbo.MstRace

	INNER JOIN

	dbo.MstRun ON

		dbo.MstRun.RaceID = dbo.MstRace.ID

	LEFT OUTER JOIN

	dbo.MstRace AS MstRaceP1 ON

		MstRaceP1.ID = dbo.MstRun.PrvRace1

	LEFT OUTER HASH JOIN

	dbo.MstRun AS MstRunP1 ON

		MstRunP1.ID = dbo.MstRun.PrvRun1

WHERE

	dbo.MstRace.ID < 10000

dji Ten Centuries Points: 1229 More actions · Answer 2

dji

Ten Centuries

Points: 1229

March 24, 2011 at 2:53 pm

#1303249

Thanks again for your input.

The new execution plan is attached.

Maxim Picard SSCrazy Points: 2052 More actions · Answer 3

Maxim Picard

SSCrazy

Points: 2052

March 24, 2011 at 3:04 pm

#1303252

It's worst :unsure:

I will be offline for a while, I check back later today hopefully with a different angle to look at this problem.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 4

Sorry, saw this thread earlier but my life is a bit zany at the moment.

Remove the 100 percent part. There's no purpose to it and the old trick of sorting in a view never really worked well anyway.

That is one seriously WIDE result set. Most of this is numeric, I hope?

Here's where we'll need to concentrate (formatted for my convenience):



FROM

	ri_master.dbo.trainer AS trainer_1

	CROSS JOIN

	dbo.MstRaceType2

	JOIN

		dbo.MstCourse

		JOIN

			dbo.MstRaceType1

			JOIN

				dbo.MstRace AS MstRace_1

			ON	dbo.MstRaceType1.ID = MstRace_1.RaceType1

			JOIN

				dbo.MstRider

			ON	MstRace_1.RiderID = dbo.MstRider.ID

			JOIN

				dbo.MstDistance

			ON	MstRace_1.DistanceID = dbo.MstDistance.ID

			JOIN

				dbo.MstAge

			ON	MstRace_1.AgeID = dbo.MstAge.ID

		ON	dbo.MstCourse.ID = MstRace_1.CourseID

	ON	dbo.MstRaceType2.ID = MstRace_1.RaceType2

	RIGHT JOIN

		ri_master.dbo.trainer

		JOIN

			dbo.MstRun AS MstRun_1

		ON	trainer_1.tid = MstRun_1.TrainerID

		RIGHT JOIN

			ri_master.dbo.jockey

			JOIN

				dbo.MstAge AS MstAge_1

				JOIN

					dbo.MstRaceType1 AS MstRaceType1_1

					JOIN

						dbo.MstRace

						JOIN

							dbo.MstRun

						ON	dbo.MstRace.ID = dbo.MstRun.RaceID

						JOIN

							ri_master.dbo.sire

						ON	dbo.MstRun.SireID = ri_master.dbo.sire.shid

						JOIN

							ri_master.dbo.dam

						ON	dbo.MstRun.DamID = ri_master.dbo.dam.dhid

						JOIN

							dbo.MstRider AS MstRider_1

						ON	dbo.MstRace.RiderID = MstRider_1.ID

						JOIN

							dbo.MstCourse AS MstCourse_1

						ON	dbo.MstRace.CourseID = MstCourse_1.ID

					ON	MstRaceType1_1.ID = dbo.MstRace.RaceType1

					JOIN

						dbo.MstGoing

					ON	dbo.MstRace.GoingID = dbo.MstGoing.ID

					JOIN

						dbo.MstRaceType2 AS MstRaceType2_1

					ON	dbo.MstRace.RaceType2 = MstRaceType2_1.ID

					JOIN

						dbo.MstDistance AS MstDistance_1

					ON	dbo.MstRace.DistanceID = MstDistance_1.ID

				ON	MstAge_1.ID = dbo.MstRace.AgeID

				JOIN

					ri_master.dbo.horse

				ON	dbo.MstRun.HorseID = ri_master.dbo.horse.hid

			ON	ri_master.dbo.jockey.jid = dbo.MstRun.JockeyID

			JOIN

				ri_master.dbo.owner

			ON	dbo.MstRun.OwnerID = ri_master.dbo.owner.oid

		ON	MstRun_1.ID = dbo.MstRun.PrvRun1

	ON MstRace_1.ID = dbo.MstRun.PrvRace1

Was there a particular reasoning between the layered joins? For the Right JOIN subjoin, sure, but this is more embedded then I believe is necessary. Doesn't necessarily hurt anything, but it's hard to read.

Taking this and comparing it (with a bit of further reformatting for my preferences 🙂 ) you've got some estimation malfunctions going on here. A good example in the query plan is the hook off the index seek on MSTRun on RaceID <= 10000. For some reason, the optimizer estimated *10* rows. It came back with over 100k. The keylookup afterwards is estimated 1 row, and fires off the same 110k rows.

Add a few of these together and your optimizer is pretty fried. We're going to need to take this apart in inches.

Alright, trainer_1 out of the mix, it doesn't matter, it's a straight cross join, and is 6000 rows. We'll try to clean that up later, for some reason it thinks hash matching a cross join is a good thing but there's a lot in this plan that's gone a little sideways, and I'm not sure I found the right one.

Here's the core, pre outer join query:



select 1

FROM

	ri_master.dbo.trainer AS trainer_1

	CROSS JOIN

------

	dbo.MstRaceType2

	JOIN

		dbo.MstRace AS MstRace_1

	ON	dbo.MstRaceType2.ID = MstRace_1.RaceType2

	JOIN

		dbo.MstCourse

	ON	MstRace_1.CourseID = dbo.MstCourse.ID

	JOIN

		dbo.MstRaceType1

	ON	MstRace_1.RaceType1 = dbo.MstRaceType1.ID

	JOIN

		dbo.MstRider

	ON	MstRace_1.RiderID = dbo.MstRider.ID

	JOIN

		dbo.MstDistance

	ON	MstRace_1.DistanceID = dbo.MstDistance.ID

	JOIN

		dbo.MstAge

	ON	MstRace_1.AgeID = dbo.MstAge.ID

----- Pre-Right Join Connections.

Let's start with the core, what's that .sqlplan look like?

Please note every row in here is repeated 6600 times because of that cross join from ri_master.dbo.trainer. Is this intentional? There are NO limitations placed against that trainer list. There are right join associations later, but it's just repeated data here... and I'm not sure that's your intent. You do it again for every trainer after the right join (it says join but there's no restrictive on clause).

That's 6600 * 6600 rows that don't seem to have a logical purpose. It looks like you actually want to bring that cross join in after the right join, against MSTRun_1, which would make sense there.

At around here:

RIGHT JOIN



		ri_master.dbo.trainer

		JOIN

			dbo.MstRun AS MstRun_1

		ON	trainer_1.tid = MstRun_1.TrainerID

I think that's where you actually want to bring in that first reference to trainer_1, after the right join.

If you can, let me see what the sql plan for that first component looks like (you posted your ddl and indexes earlier so I'll be able to compare to that if necessary), and take a solid look at your trainer logic. If we can clean those two items up I believe we'll clean up most of the problem.

EDIT: I should mention that the SELECT 1 is going to kind of lie to us, because it's going to just use indexing because there's no cause to go to the clustered for covering fields. That's okay. We need to see the core logic in action first before we go anywhere else, like worrying about what columns it wants later.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

dji Ten Centuries Points: 1229 More actions · Answer 5

Craig, many thanks for your input. I must apologise, as I have since tidied up the source of the original view to at least get it working to an extent, the cross join has been removed. The new source was included as an attachment as I was trying to make the post look neater as I realised it did look terrible to read initially. My original view source was generated by the view designer which I have since modified manually as it was doing unexpected things to my view.

Leaving out all the individual columns, the joins look like this at the moment, this is the version Maxim has kindly been looking at;

FROM

dbo.MstRace INNER JOIN

dbo.MstRun ON dbo.MstRace.ID = dbo.MstRun.RaceID LEFT OUTER JOIN

dbo.MstRace AS MstRaceP1 ON dbo.MstRun.PrvRace1 = MstRaceP1.ID LEFT OUTER JOIN

dbo.MstRun AS MstRunP1 ON dbo.MstRun.PrvRun1 = MstRunP1.ID INNER JOIN

dbo.MstAge ON dbo.MstRace.AgeID = dbo.MstAge.ID INNER JOIN

dbo.MstCourse ON dbo.MstRace.CourseID = dbo.MstCourse.ID INNER JOIN

dbo.MstDistance ON dbo.MstRace.DistanceID = dbo.MstDistance.ID INNER JOIN

dbo.MstGoing ON dbo.MstRace.GoingID = dbo.MstGoing.ID INNER JOIN

dbo.MstRider ON dbo.MstRace.RiderID = dbo.MstRider.ID INNER JOIN

dbo.horse ON dbo.MstRun.HorseID = dbo.horse.hid INNER JOIN

dbo.MstRaceType1 ON dbo.MstRace.RaceType1 = dbo.MstRaceType1.ID INNER JOIN

dbo.MstRaceType2 ON dbo.MstRace.RaceType2 = dbo.MstRaceType2.ID INNER JOIN

dbo.dam ON dbo.MstRun.DamID = dbo.dam.dhid INNER JOIN

dbo.jockey ON dbo.MstRun.JockeyID = dbo.jockey.jid INNER JOIN

dbo.owner ON dbo.MstRun.OwnerID = dbo.owner.oid INNER JOIN

dbo.sire ON dbo.MstRun.SireID = dbo.sire.shid INNER JOIN

dbo.trainer ON dbo.MstRun.TrainerID = dbo.trainer.tid LEFT OUTER JOIN

dbo.MstDistance AS MstDistanceP1 ON MstRaceP1.DistanceID = MstDistanceP1.ID LEFT OUTER JOIN

dbo.MstRaceType1 AS MstRaceType1P1 ON MstRaceP1.RaceType1 = MstRaceType1P1.ID LEFT OUTER JOIN

dbo.MstRaceType2 AS MstRaceType2P1 ON MstRaceP1.RaceType2 = MstRaceType2P1.ID LEFT OUTER JOIN

dbo.trainer AS trainerP1 ON MstRunP1.TrainerID = trainerP1.tid LEFT OUTER JOIN

dbo.MstCourse AS MstCourseP1 ON MstRaceP1.CourseID = MstCourseP1.ID LEFT OUTER JOIN

dbo.MstRider AS MstRiderP1 ON MstRiderP1.ID = MstRaceP1.RiderID

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

Ah hah! I must have missed that when I re-scanned the thread. That makes more sense as to his earlier observations. I do remember thinking to myself "Are we looking at the same query?!" 😛

I'll be back in a bit on this. My system is currently crying while I add a column to some rediculously large table so I have a bit of time. 🙂

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 7

A couple of first glance notes with the right query.

First, your SQLPlan, which you have to go to the XML to see this, has this problem:

You never finished optimizing. It got too complex.

Next, in your index definitions above, it doesn't describe an Index I see being used on MstRun. IX_RaceIDLenghtsBehindWinner.

I'm assuming you ran this with an extra component for raceid <= 10000 to try to get an actual execution plan. It actually goofed up the query plan a little bit, but I realize the necessity.

Am I correct in assuming you're expecting 1.4 million rows out of this query at the end because MSTRun is actually the core table to this, and that's the rowcount? I don't THINK any of these joined items will duplicate rows, but can you confirm?

Nothing in here looks horribly poor. The scans are all on small tables (two or three internal pages at most, no issue), and you're going to almost always scan because you're pulling *everything*, so index seeking is not really up for discussion here.

Your estimated and actual number of rows get close to each other near the end (89k est, 110k actual), and there might be a slightly better shape to the query if it estimated your <=10000 correctly, but that's because of the parameterization of it and it was probably compiled with a very small number, like 1 or 2.

I see why Maxim was going after join mechanics to try to speed this up, but I'm relatively sure this won't help.

You commented earlier this is primarily a home business. Can I assume you're running this on your desktop PC using standard internal drives to the PC?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

dji Ten Centuries Points: 1229 More actions · Answer 8

Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?

IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:

The <= 10000 was indeed to get the query to execute, it never finished when left to run over the whole file previously.

Ultimately I was looking to run this view / query overnight each night into a table, and then create indexes over quite a few columns to enable some of my appliucations to do some heavy querying of the resultant table. In a word, yes I wanted the whole 1,400,000+ rows.

My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.

I could easily upgrade to 4gb ram, would this give a performance boost of note?

If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?

I hope I haven't wasted anyones time with this, all the help is most appreciated.

jake

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 9

The most important point first:

dji (3/24/2011)

I hope I haven't wasted anyones time with this, all the help is most appreciated.

You haven't Jake. This is an excellent example to learn from. Seemingly complex but actually simple, yet huge, so it shows the problems with large data volume mechanics. That and we're selective volunteers. If we wasted our time, it was our choice. 😀

Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?

In a way, it IS SQL. The SQL engine offers storage, sure. It offers backups of that. It offers some data consistency checks as constraints. Nothing any decent filesystem with a little scripting couldn't do. The biggest part of the engine itself is the optimizer, which builds these execution plans, and all the hooks that allow the optimizer to go faster (partitioning, indexing, join methodology, etc.).

The execution plan is the first point of reference when performance goes out of whack.

IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:

Every index can be relevant, if only for the reason that it's NOT using it.

My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.

I could easily upgrade to 4gb ram, would this give a performance boost of note?

That depends, and I need to go find the correct perfmon counters. The next thing to check is twofold. First is your I/O from drive. The second is your page faults to virtual memory (aka: swapfile or pagefile). The upgrade will only help the second. That you're reading from and writing to the same physical array isn't helping. I believe you're actually hitting hardware limitations, not query limitations. When you want everything, you get everything... and there's not a lot of optimization that can happen when you say "Gimme all of it". It's doing the best it reasonably can from what I'm seeing.

If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?

You're reading my mind. If we can't deal with the length, let's try going after the width. I'd start with building out the 1.4 mill rows from the MstRun table first, perhaps moving over all associated data. Next, move over a few more tables worth of data using updates. Rinse, repeat.

It will take longer then an all at once approach if the hardware could keep up, but if it never completes, anything is faster then infinity.

Also, make sure your target database is grown out to a reasonable size before you start this. If you're constantly filegrowthing in 1 megabyte increments, that's certainly not helping.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

dji Ten Centuries Points: 1229 More actions · Answer 10

Thanks for all the help on this problem to everyone who has posted. It has been a massive learning curve, and a very useful one.

One idea I have thought about for a while to speed things up is a solid state drive for my SQL data directory. The last reviews I read had some problems with the technology, perhaps I need to look at this again. I could spend money on a new machine entirely as it is critical for me to keep upping demands.

I should have added the large drive containing my SQL db is not used for any other i/o of note. The large chunk of space left over from the 2tb riad is used for backup of racing videos overnight which only takes 10 minutes max. So the drive is basically used only for SQL, attached direcetly to a built in riad controller on the mb.

jake

Maxim Picard SSCrazy Points: 2052 More actions · Answer 11

Hi again Jake

I think it would be a great idea to divide the process into two parts. First populate a table with all the information from the runs and after that add the information regarding the previous run.

I'd be happy to look at this again after you have the first part of the query materialized if you'd like.

It's great to see what problems others are facing, especially when they approach it like you do.

Maxim

dji Ten Centuries Points: 1229 More actions · Answer 12

For anyone remotely interested!!

I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.

Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

dji (6/6/2011)

For anyone remotely interested!!

I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.

Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.

Indeed it is true that sometimes better hardware IS the simplest and most cost-effective solution! 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service