Intersect, Except, Union, All and Any

  • Sorry, it was a bit drastic to say it always gives better performance, I should have said normally gives same performance or better. In the example case of the article I got a 75 % reduction in query time. In my experience the EXISTS scales better than NOT IN when there are many rows in the NOT IN / EXISTS table. For production environments EXISTS is a safer solution than NOT IN. In my opinion. 🙂

    Johan

  • Dear David,

    Thanks for detailed explanation – I use your article as reference for all these topics. Unfortunately unlike your conclusion for EXPECT case I found situation where NOT IN gives dramatically worst performance:

    I try to compare local and remote table. For both cases EXCEPT and LEFT JOIN it gives roughly the same performance. See here:

    [EXCEPT]

    SELECT ac.Article_Code

    FROM relAC ac

    EXCEPT

    SELECT Code

    FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products

    [font="Courier New"]

    Local table: Scanning a clustered index, entirely or only a range.

    Remote Table: Send a SQL query to another than the current SQL Server.

    Remote Table: Compute new values from existing values in a row.

    Both: Match rows from two suitably sorted input tables exploiting their sort order.[/font]

    Cost = 48,5442

    SELECT ac.Article_Code

    FROM relAC AS ac

    LEFT JOIN [SRV-DCI-PRAHA].[DCI].[dciowner].Products AS pr

    ON ac.Article_Code = pr.Code

    WHERE pr.Code IS NULL

    [font="Courier New"]

    Local table: Scanning a clustered index, entirely or only a range.

    Remote Table: Send a SQL query to another than the current SQL Server.

    Remote Table: Compute new values from existing values in a row.

    Both: Match rows from two suitably sorted input tables exploiting their sort order

    Both: Restricting the set of rows based on a predicate.[/font]

    Cost = 48,6391

    [NOT IN]

    Unfortunately using of NOT IN generates local temporary table for better performance for rewinds and generate Nested Loops for each row of local table for comparison of each row in temporary table. These loops dramatically impair performance of actual query. See Execution Plan here:

    SELECT ac.Article_Code

    FROM relAC ac

    WHERE ac.Article_Code NOT IN (

    SELECT Code

    FROM [SRV-DCI-PRAHA].[DCI].[dciowner].Products

    [font="Courier New"]

    Local table: Scanning a clustered index, entirely or only a range.

    Remote Table: Send a SQL query to another than the current SQL Server.

    Remote Table: Compute new values from existing values in a row.[/font]

    [font="Courier New"]Remote Table: Stores the data from the input into a temporary table in order to optimize rewinds.

    Both: For each row in the top (outer) input, scan the bottom (inner) input, and output matching rows.[/font]

    Cost = 77826,3!!

    Petr

    __________________________________
    Standing on the shoulders of giants ...

  • Thanks for the feedback. I haven't tried it with a linked server query.

    Where possible I base my articles and experiments on the sample databases Adventureworks, Northwind or Pubs simply because they provide a consistent base for people to peer review what I am doing.

    One of the most important scientific practises is to put up, not only the results but also the experiment itself up for review. It is quite easy to devise a flawed experiment and put too much store in a set of results produced by that experiment.

    What I attempt to do with my articles is put up a good solid starting point for a discussion. I've learnt some valuable lessons from the feedback given which means that (hopefully) I'm getting as much out of the articles as I hope I am giving.

  • How do we calculate the overall query cost? i.e .05... will it be shown in the Exec plan itself

  • Recently I ran into the difference between UNION and UNION ALL. Pity you don't mention that.

    I encountered that a query result of (say) 300 records UNION'd with a query result of zero records could end up in an overall result set of less than 300 records!

    After some googling and stuff I found out that the function UNION in fact also performs a GROUP BY on all fields. In other words: There will no longer be duplicate records in the end result set. If the duplicates are the result of the UNION or not doesn't matter.

    A query result of (say) 300 records UNION ALL'd with a query result of zero records gets you a result set of exactly (say) 300 records!

  • Note: I am an old curmudgeon who speaks bluntly. Nothing that follows is a personal attack. I just see this as a teachable moment.

    So:

    Let me get this straight:

    You stumbled over some new features.

    You glanced at books online

    You put together some trivial tests against a trivial database

    You decide you are now enough of an expert to write an article about it

    You write an article that spends more time comparing the performance of trivial tests to six decimal places than into what they actually do.

    If you had spent even a few minutes on Google or Wikipedia you would have learned a lot and been in a position to write a useful article.

    Instead you went on the assumption that if you could find good uses for the features then there were no good uses for them. I will let you supply the adjective for that attitude.

    To everyone who thought it was a great article:

    It was a good topic to bring up, but did you read the other comments before you added yours?

  • Let me get this straight:

    You stumbled over some new features.

    CORRECT. In the Microsoft course material

    You glanced at books online

    Did a bit more than glance and also read what little was available in 2009. Not sure when the Wikipedia article originated as it doesn't have an original date, only a modified date in 2011.

    I read BOL and the Microsoft course notes plus the standard Googling. What was available looked very like the "Other editions SQL2005" version on BOL.

    You put together some trivial tests against a trivial database

    CORRECT. I did say it was Adventureworks. The whole point of using Adventureworks is that is a publicly available database. Any tests or experiments run on Adventureworks can be easily repeated and therefore verified by developers and DBAs of all levels of ability.

    You decide you are now enough of an expert to write an article about it

    No, I was asked by the people being put through the course what these operators were and as I hadn't seen them before I did some digging and basic experimenting.

    As with any experiment

    1. Document the experiment so it is repeatable and up for critique.

    2. Document the results so they are verifiable

    I don't have the attitude of expecting people to bask in my magnificence. I'm not that arrogant.

    What I want to achieve in any article

    1. Pass on what I have learnt to those coming up the ranks being very clear about my methods and experiments

    2. Provoke response from the many ahead of me in the ranks so that I learn more.

    Actually the forum responses did teach me something

    select fieldlist from OriginalSource query Where qualifying criteria is met

    intersect

    select fieldlist from Targettable

    Very useful in data warehouse reconcilliation.

    I was arrogant enough to think that I could add some clarity to BOL.

    Instead you went on the assumption that if you could find good uses for the features then there were no good uses for them.

    I can't recall making that assumption, it certainly doesn't say anything about assumptions in the article?

    Again the article and results are up for critique and as per the responses in the forum the true use for these operators was revealled.

    I will let you supply the adjective for that attitude.

    I don't have that attitude, never have had, never will have. You are very much holding the wrong end of the stick and no, I am not making the assumption that it is a stick in your hand.

    Merry Christmas

  • My use of Except, Intersect is to prepare a slowly changing dimension table as written below. This will help me to get the changed address as well as new records. How can i re-write this in sql without EXCEPT operand

    SELECT CustomerID, Address1, District, Phone1, Phone2,....some 50 col FROM CustAddress_OLTP

    EXCEPT

    SELECT CustomerID, Address1, District, Phone1, Phone2,....some 50 col FROM CustAddress_SCD

  • I've actually used INTERSECT and EXCEPT quite a bit, not as a replacement for inner joins as some of these comments imply, but to simplify the comparison of two complex resultsets.

    I've also used INTERSECT as a performance tweak:

    http://www.real-sql-guy.com/2011/11/stop-performance-problem-with-intersect.html

  • Thanks for taking the time to create this post. I have often asked myself if there was any advantage in using some of these commands.

  • I agree. I have found 'excep't very useful when many columns are involved in a table and I want to exclude one entire table from the other. I think you will find in that case except will outperform a join or a not in statement. At least that is what I found on sql2008 r1

  • I really liked this article - very thorough. EXCEPT seems to be an analog to the Oracle MINUS, so sometimes I like it because I feel like I really know what's happening there (Oracle SQL being my first SQL language). But, my alternative is to use NOT EXISTS, so I wonder if you would consider expanding this to include the EXISTS/NOT EXISTS alternatives? I'm curious to see how they stack up within your example stream.

    In general, I really liked the approach and illustrations (both pictoral and logical).

    Thank you --

    Donna

Viewing 12 posts - 76 through 86 (of 86 total)

You must be logged in to reply to this topic. Login to reply