Reducing the hash match inner join cost

Question

Post reply

Reducing the hash match inner join cost

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 100262 More actions · Answer 1

Jeff Moden (7/21/2016)

ScottPletcher (7/21/2016)

Jeff Moden (7/20/2016)

ScottPletcher (7/20/2016)

If the "isactive" flag is significant, i.e. a significant number of rows are actually inactive, then cluster first on it. If not, get rid of it. At the risk of sounding like Celko, they often really are a hold-over from the '60s and '70s when files were always fully read anyway, and thus such flags were "free" to use.

Further advice would require seeing table definitions. For best performance overall, concentrate first and by far foremost on getting the best clustered index on every table.

Gosh... If I understand what you're said, I'd never cluster on a flag column first. I won't even build an NCI with a leading flag column. I learned my lesson on that several years ago when I paralyzed Expedia.com for a good 2 minutes before I realized my mistake and flipped the first two column of the index around. The page splits where rampant and the whole world was timing out because of it.

Actually, I think you did misunderstand somewhat what I said. I never mentioned an NCI. If, and only if (as I stated), a large number of rows are inactive, I would cluster first on the inactive flag, as I stated, to immediately seek to only the rows SQL has to process. Yes, rows would move, but only once, and presumably roughly in ascending order of cluster key column #2 and beyond.

Really my preference is to have a completely separate table for inactive rows, but many people love the idea of a "soft delete" flag in the main table; which really does seem to me to be a holdover from pre-relational days.

Thanks for confirming that I didn't misunderstand.

Clustering (or creating any index as the first column) on a flag (or other extremely low cardinality column) is a really bad idea for anything where the flag can change even once. Like I said, I paralyzed Expedia.com for about two minutes by making such an index.

As a general rule. But, to me, there are exceptions. Such as when: (1) the only filtering is by active flag, and (2) there are a very high number of inactive rows, and (3) all (or nearly all) columns are being SELECTed -- as in this case (with my condition of #2 being required to cause the index change). Then the flag itself is as selective as it gets, and thus I would cluster on it, to seek to only the active rows. The active-row INSERT only happens once, as does the update to inactive, but row gets read 100Ks+ times. I guess we'll disagree on this one.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Sergiy SSC Guru Points: 110192 More actions · Answer 2

rathimittha.mb (7/21/2016)

Thank you Grant. I guess I'l need to figure out a way to get delta(changed) data rather than the entire set

Thanks Again !!

I'd suggest triggers on the tables FOR INSERT,UPDATE,DELETE which would record ID's (only unique ID's, nothing else!) into kind of {TEableName}_ActionLog tables, if those ID's are not there yet (use WHERE NOT EXISTS for that check).

Then, when you copy the data, select only those records which are mentioned in ActionLog tables and take the appropriate action (depending on the presence of records with those ID's in Source only, Target only or both tables do either INSERT or UPDATE or DELETE).

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

ScottPletcher (7/21/2016)

But, to me, there are exceptions. Such as when: (1) the only filtering is by active flag, and (2) there are a very high number of inactive rows, and (3) all (or nearly all) columns are being SELECTed -- as in this case (with my condition of #2 being required to cause the index change). Then the flag itself is as selective as it gets, and thus I would cluster on it, to seek to only the active rows. The active-row INSERT only happens once, as does the update to inactive, but row gets read 100Ks+ times. I guess we'll disagree on this one.

Heh... As always, I'll agree that "it depends". 🙂 The bottom line would be that a test needs to be done to ensure rapid fire inserts aren't going to be crushed.

On the other hand, it would be much more fun to have a pork chop dinner with the designer of such a table. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions