View with many tables and poor performance

  • The stats seem up to date and in line with the data distribution you said earlier (10 run per race).

    That nested loop appears to be the culprick but it's hard to test without all the data.

    Could you post the query plan for this (added a hint for HASH join)

    SELECT

    *

    FROM

    dbo.MstRace

    INNER JOIN

    dbo.MstRun ON

    dbo.MstRun.RaceID = dbo.MstRace.ID

    LEFT OUTER JOIN

    dbo.MstRace AS MstRaceP1 ON

    MstRaceP1.ID = dbo.MstRun.PrvRace1

    LEFT OUTER HASH JOIN

    dbo.MstRun AS MstRunP1 ON

    MstRunP1.ID = dbo.MstRun.PrvRun1

    WHERE

    dbo.MstRace.ID < 10000

  • Thanks again for your input.

    The new execution plan is attached.

  • It's worst :unsure:

    I will be offline for a while, I check back later today hopefully with a different angle to look at this problem.

  • Sorry, saw this thread earlier but my life is a bit zany at the moment.

    Remove the 100 percent part. There's no purpose to it and the old trick of sorting in a view never really worked well anyway.

    That is one seriously WIDE result set. Most of this is numeric, I hope?

    Here's where we'll need to concentrate (formatted for my convenience):

    FROM

    ri_master.dbo.trainer AS trainer_1

    CROSS JOIN

    dbo.MstRaceType2

    JOIN

    dbo.MstCourse

    JOIN

    dbo.MstRaceType1

    JOIN

    dbo.MstRace AS MstRace_1

    ON dbo.MstRaceType1.ID = MstRace_1.RaceType1

    JOIN

    dbo.MstRider

    ON MstRace_1.RiderID = dbo.MstRider.ID

    JOIN

    dbo.MstDistance

    ON MstRace_1.DistanceID = dbo.MstDistance.ID

    JOIN

    dbo.MstAge

    ON MstRace_1.AgeID = dbo.MstAge.ID

    ON dbo.MstCourse.ID = MstRace_1.CourseID

    ON dbo.MstRaceType2.ID = MstRace_1.RaceType2

    RIGHT JOIN

    ri_master.dbo.trainer

    JOIN

    dbo.MstRun AS MstRun_1

    ON trainer_1.tid = MstRun_1.TrainerID

    RIGHT JOIN

    ri_master.dbo.jockey

    JOIN

    dbo.MstAge AS MstAge_1

    JOIN

    dbo.MstRaceType1 AS MstRaceType1_1

    JOIN

    dbo.MstRace

    JOIN

    dbo.MstRun

    ON dbo.MstRace.ID = dbo.MstRun.RaceID

    JOIN

    ri_master.dbo.sire

    ON dbo.MstRun.SireID = ri_master.dbo.sire.shid

    JOIN

    ri_master.dbo.dam

    ON dbo.MstRun.DamID = ri_master.dbo.dam.dhid

    JOIN

    dbo.MstRider AS MstRider_1

    ON dbo.MstRace.RiderID = MstRider_1.ID

    JOIN

    dbo.MstCourse AS MstCourse_1

    ON dbo.MstRace.CourseID = MstCourse_1.ID

    ON MstRaceType1_1.ID = dbo.MstRace.RaceType1

    JOIN

    dbo.MstGoing

    ON dbo.MstRace.GoingID = dbo.MstGoing.ID

    JOIN

    dbo.MstRaceType2 AS MstRaceType2_1

    ON dbo.MstRace.RaceType2 = MstRaceType2_1.ID

    JOIN

    dbo.MstDistance AS MstDistance_1

    ON dbo.MstRace.DistanceID = MstDistance_1.ID

    ON MstAge_1.ID = dbo.MstRace.AgeID

    JOIN

    ri_master.dbo.horse

    ON dbo.MstRun.HorseID = ri_master.dbo.horse.hid

    ON ri_master.dbo.jockey.jid = dbo.MstRun.JockeyID

    JOIN

    ri_master.dbo.owner

    ON dbo.MstRun.OwnerID = ri_master.dbo.owner.oid

    ON MstRun_1.ID = dbo.MstRun.PrvRun1

    ON MstRace_1.ID = dbo.MstRun.PrvRace1

    Was there a particular reasoning between the layered joins? For the Right JOIN subjoin, sure, but this is more embedded then I believe is necessary. Doesn't necessarily hurt anything, but it's hard to read.

    Taking this and comparing it (with a bit of further reformatting for my preferences 🙂 ) you've got some estimation malfunctions going on here. A good example in the query plan is the hook off the index seek on MSTRun on RaceID <= 10000. For some reason, the optimizer estimated *10* rows. It came back with over 100k. The keylookup afterwards is estimated 1 row, and fires off the same 110k rows.

    Add a few of these together and your optimizer is pretty fried. We're going to need to take this apart in inches.

    Alright, trainer_1 out of the mix, it doesn't matter, it's a straight cross join, and is 6000 rows. We'll try to clean that up later, for some reason it thinks hash matching a cross join is a good thing but there's a lot in this plan that's gone a little sideways, and I'm not sure I found the right one.

    Here's the core, pre outer join query:

    select 1

    FROM

    ri_master.dbo.trainer AS trainer_1

    CROSS JOIN

    ------

    dbo.MstRaceType2

    JOIN

    dbo.MstRace AS MstRace_1

    ON dbo.MstRaceType2.ID = MstRace_1.RaceType2

    JOIN

    dbo.MstCourse

    ON MstRace_1.CourseID = dbo.MstCourse.ID

    JOIN

    dbo.MstRaceType1

    ON MstRace_1.RaceType1 = dbo.MstRaceType1.ID

    JOIN

    dbo.MstRider

    ON MstRace_1.RiderID = dbo.MstRider.ID

    JOIN

    dbo.MstDistance

    ON MstRace_1.DistanceID = dbo.MstDistance.ID

    JOIN

    dbo.MstAge

    ON MstRace_1.AgeID = dbo.MstAge.ID

    ----- Pre-Right Join Connections.

    Let's start with the core, what's that .sqlplan look like?

    Please note every row in here is repeated 6600 times because of that cross join from ri_master.dbo.trainer. Is this intentional? There are NO limitations placed against that trainer list. There are right join associations later, but it's just repeated data here... and I'm not sure that's your intent. You do it again for every trainer after the right join (it says join but there's no restrictive on clause).

    That's 6600 * 6600 rows that don't seem to have a logical purpose. It looks like you actually want to bring that cross join in after the right join, against MSTRun_1, which would make sense there.

    At around here:

    RIGHT JOIN

    ri_master.dbo.trainer

    JOIN

    dbo.MstRun AS MstRun_1

    ON trainer_1.tid = MstRun_1.TrainerID

    I think that's where you actually want to bring in that first reference to trainer_1, after the right join.

    If you can, let me see what the sql plan for that first component looks like (you posted your ddl and indexes earlier so I'll be able to compare to that if necessary), and take a solid look at your trainer logic. If we can clean those two items up I believe we'll clean up most of the problem.

    EDIT: I should mention that the SELECT 1 is going to kind of lie to us, because it's going to just use indexing because there's no cause to go to the clustered for covering fields. That's okay. We need to see the core logic in action first before we go anywhere else, like worrying about what columns it wants later.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig, many thanks for your input. I must apologise, as I have since tidied up the source of the original view to at least get it working to an extent, the cross join has been removed. The new source was included as an attachment as I was trying to make the post look neater as I realised it did look terrible to read initially. My original view source was generated by the view designer which I have since modified manually as it was doing unexpected things to my view.

    Leaving out all the individual columns, the joins look like this at the moment, this is the version Maxim has kindly been looking at;

    FROM

    dbo.MstRace INNER JOIN

    dbo.MstRun ON dbo.MstRace.ID = dbo.MstRun.RaceID LEFT OUTER JOIN

    dbo.MstRace AS MstRaceP1 ON dbo.MstRun.PrvRace1 = MstRaceP1.ID LEFT OUTER JOIN

    dbo.MstRun AS MstRunP1 ON dbo.MstRun.PrvRun1 = MstRunP1.ID INNER JOIN

    dbo.MstAge ON dbo.MstRace.AgeID = dbo.MstAge.ID INNER JOIN

    dbo.MstCourse ON dbo.MstRace.CourseID = dbo.MstCourse.ID INNER JOIN

    dbo.MstDistance ON dbo.MstRace.DistanceID = dbo.MstDistance.ID INNER JOIN

    dbo.MstGoing ON dbo.MstRace.GoingID = dbo.MstGoing.ID INNER JOIN

    dbo.MstRider ON dbo.MstRace.RiderID = dbo.MstRider.ID INNER JOIN

    dbo.horse ON dbo.MstRun.HorseID = dbo.horse.hid INNER JOIN

    dbo.MstRaceType1 ON dbo.MstRace.RaceType1 = dbo.MstRaceType1.ID INNER JOIN

    dbo.MstRaceType2 ON dbo.MstRace.RaceType2 = dbo.MstRaceType2.ID INNER JOIN

    dbo.dam ON dbo.MstRun.DamID = dbo.dam.dhid INNER JOIN

    dbo.jockey ON dbo.MstRun.JockeyID = dbo.jockey.jid INNER JOIN

    dbo.owner ON dbo.MstRun.OwnerID = dbo.owner.oid INNER JOIN

    dbo.sire ON dbo.MstRun.SireID = dbo.sire.shid INNER JOIN

    dbo.trainer ON dbo.MstRun.TrainerID = dbo.trainer.tid LEFT OUTER JOIN

    dbo.MstDistance AS MstDistanceP1 ON MstRaceP1.DistanceID = MstDistanceP1.ID LEFT OUTER JOIN

    dbo.MstRaceType1 AS MstRaceType1P1 ON MstRaceP1.RaceType1 = MstRaceType1P1.ID LEFT OUTER JOIN

    dbo.MstRaceType2 AS MstRaceType2P1 ON MstRaceP1.RaceType2 = MstRaceType2P1.ID LEFT OUTER JOIN

    dbo.trainer AS trainerP1 ON MstRunP1.TrainerID = trainerP1.tid LEFT OUTER JOIN

    dbo.MstCourse AS MstCourseP1 ON MstRaceP1.CourseID = MstCourseP1.ID LEFT OUTER JOIN

    dbo.MstRider AS MstRiderP1 ON MstRiderP1.ID = MstRaceP1.RiderID

  • Ah hah! I must have missed that when I re-scanned the thread. That makes more sense as to his earlier observations. I do remember thinking to myself "Are we looking at the same query?!" 😛

    I'll be back in a bit on this. My system is currently crying while I add a column to some rediculously large table so I have a bit of time. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • A couple of first glance notes with the right query.

    First, your SQLPlan, which you have to go to the XML to see this, has this problem:

    <StmtSimple StatementCompId="1" StatementEstRows="89912" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="348.373" StatementText="SELECT * FROM [vw_raceruntrainer002] WHERE [raceid]<=@1" StatementType="SELECT">

    You never finished optimizing. It got too complex.

    Next, in your index definitions above, it doesn't describe an Index I see being used on MstRun. IX_RaceIDLenghtsBehindWinner.

    I'm assuming you ran this with an extra component for raceid <= 10000 to try to get an actual execution plan. It actually goofed up the query plan a little bit, but I realize the necessity.

    Am I correct in assuming you're expecting 1.4 million rows out of this query at the end because MSTRun is actually the core table to this, and that's the rowcount? I don't THINK any of these joined items will duplicate rows, but can you confirm?

    Nothing in here looks horribly poor. The scans are all on small tables (two or three internal pages at most, no issue), and you're going to almost always scan because you're pulling *everything*, so index seeking is not really up for discussion here.

    Your estimated and actual number of rows get close to each other near the end (89k est, 110k actual), and there might be a slightly better shape to the query if it estimated your <=10000 correctly, but that's because of the parameterization of it and it was probably compiled with a very small number, like 1 or 2.

    I see why Maxim was going after join mechanics to try to speed this up, but I'm relatively sure this won't help.

    You commented earlier this is primarily a home business. Can I assume you're running this on your desktop PC using standard internal drives to the PC?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?

    IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:

    The <= 10000 was indeed to get the query to execute, it never finished when left to run over the whole file previously.

    Ultimately I was looking to run this view / query overnight each night into a table, and then create indexes over quite a few columns to enable some of my appliucations to do some heavy querying of the resultant table. In a word, yes I wanted the whole 1,400,000+ rows.

    My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.

    I could easily upgrade to 4gb ram, would this give a performance boost of note?

    If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?

    I hope I haven't wasted anyones time with this, all the help is most appreciated.

    jake

  • The most important point first:

    dji (3/24/2011)


    I hope I haven't wasted anyones time with this, all the help is most appreciated.

    You haven't Jake. This is an excellent example to learn from. Seemingly complex but actually simple, yet huge, so it shows the problems with large data volume mechanics. That and we're selective volunteers. If we wasted our time, it was our choice. 😀

    Craig, Thanks for the reply again, I am going to go through the execution plan and try and hook up your points with what is included in the plan. Before today, I had never heard of such a thing, it seems to offer quite an insight into the mechanics of sql?

    In a way, it IS SQL. The SQL engine offers storage, sure. It offers backups of that. It offers some data consistency checks as constraints. Nothing any decent filesystem with a little scripting couldn't do. The biggest part of the engine itself is the optimizer, which builds these execution plans, and all the hooks that allow the optimizer to go faster (partitioning, indexing, join methodology, etc.).

    The execution plan is the first point of reference when performance goes out of whack.

    IX_RaceIDLenghtsBehindWinner is an index used for another query / application, I didn't include in the definitions above as I was trying to keep things clear with regard to this view. There are in fact another five indexes on MstRun not detailed in the indexes.rtf, they are over Rundate, HorseIDCourseid etc. I thoguht they were not relevant to this problem:blush:

    Every index can be relevant, if only for the reason that it's NOT using it.

    My machine is fairly standard dual core 2.8mHz processor, on checking I have 2gb ram. Disks two raided 1tb sata HDD, SQL runs on a 100gb partition of the raid (that's raid that spans two disks, not mirroring). Win XP sp3.

    I could easily upgrade to 4gb ram, would this give a performance boost of note?

    That depends, and I need to go find the correct perfmon counters. The next thing to check is twofold. First is your I/O from drive. The second is your page faults to virtual memory (aka: swapfile or pagefile). The upgrade will only help the second. That you're reading from and writing to the same physical array isn't helping. I believe you're actually hitting hardware limitations, not query limitations. When you want everything, you get everything... and there's not a lot of optimization that can happen when you say "Gimme all of it". It's doing the best it reasonably can from what I'm seeing.

    If as you are thinking there is no real improvement to be made from the SQL, perhaps I will do the query without MstRaceP1 and MstRunP1 into a table, then run some code to go through the table and pickup rows associated with PrvRace1 and PrvRun1 and populate the fields this way?

    You're reading my mind. If we can't deal with the length, let's try going after the width. I'd start with building out the 1.4 mill rows from the MstRun table first, perhaps moving over all associated data. Next, move over a few more tables worth of data using updates. Rinse, repeat.

    It will take longer then an all at once approach if the hardware could keep up, but if it never completes, anything is faster then infinity.

    Also, make sure your target database is grown out to a reasonable size before you start this. If you're constantly filegrowthing in 1 megabyte increments, that's certainly not helping.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks for all the help on this problem to everyone who has posted. It has been a massive learning curve, and a very useful one.

    One idea I have thought about for a while to speed things up is a solid state drive for my SQL data directory. The last reviews I read had some problems with the technology, perhaps I need to look at this again. I could spend money on a new machine entirely as it is critical for me to keep upping demands.

    I should have added the large drive containing my SQL db is not used for any other i/o of note. The large chunk of space left over from the 2tb riad is used for backup of racing videos overnight which only takes 10 minutes max. So the drive is basically used only for SQL, attached direcetly to a built in riad controller on the mb.

    jake

  • Hi again Jake

    I think it would be a great idea to divide the process into two parts. First populate a table with all the information from the runs and after that add the information regarding the previous run.

    I'd be happy to look at this again after you have the first part of the query materialized if you'd like.

    It's great to see what problems others are facing, especially when they approach it like you do.

    Maxim

  • For anyone remotely interested!!

    I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.

    Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.

  • dji (6/6/2011)


    For anyone remotely interested!!

    I bought a new laptop recently, intel i7 quadcore, 4gb ram. Loaded SQL 2008 R2 attached my db featured in this thread and ran this query.

    Return 1,500,000 records in 10 mins 56 seconds. A remarkable improvement, presumably the combination of bigger processing power and sql 2008 r2 have made this quantum leap in performance possible.

    Indeed it is true that sometimes better hardware IS the simplest and most cost-effective solution! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply