Need help with a Not Exists sub query please

  • List the empid, lastname, and firstname of all employees who did not take an order in the

    month of February, 2007. Write a correlated subquery that uses NOT EXISTS.

    SELECT DISTINCT O.empid, HR.firstname, HR.lastname

    FROM Sales.Orders AS O

    JOIN HR.Employees AS HR

    ON O.empid = HR.empid

    WHERE NOT EXISTS

    (SELECT empid

    FROM Sales.Orders

    WHERE YEAR (orderdate) = '2007' AND

    MONTH (orderdate) = '02')

    What am I doing wrong?

  • Year() and Month() function returns int data.

    .....

    WHERE NOT EXISTS

    (SELECT empid

    FROM Sales.Orders

    WHERE YEAR (orderdate) = 2007 AND <==

    MONTH (orderdate) = 02) <==

  • Thank you,

    SELECT DISTINCT O.empid, HR.firstname, HR.lastname

    FROM Sales.Orders AS O

    JOIN HR.Employees AS HR

    ON O.empid = HR.empid

    WHERE NOT EXISTS

    (SELECT DISTINCT empid

    FROM Sales.Orders

    WHERE YEAR (orderdate) = 2007 AND

    MONTH (orderdate) = 02)

    I still get no results

  • Correct me if I am wrong, but this reads like a classroom problem. I'm happy to help you with it, but only if you promise to read through my entire answer and learn something, Junior! 🙂

    You are misusing the WHERE NOT EXISTS clause. For one thing, it is not in a correlated sub-query. A correlated sub-query is a SELECT statement nested inside another T-SQL statement containing a reference to a column (or columns) in the outer query. The correlated (or inner) portion of the query is executed once for each record in the outer portion of the query.

    The reason I say this must be a classroom problem is that correlated sub-queries are notoriously inefficient. They are very CPU-Intensive. Often, cursors will perform better than correlated sub-queries.

    Let's take a look at how I would solve this problem at work, and then we'll answer the class question.

    At work I would solve this problem with an Outer Join, like so:

    WITH [SalesOrders]

    AS

    ( SELECT EmpID

    FROM Sales.Orders

    WHERE OrderDate >= '2007-02-01'

    AND OrderDate < '2007-03-01'

    GROUP BY EmpID

    )

    SELECT Employees.EmpID

    ,Employees.FirstName

    ,Employees.LastName

    FROM HR.Employees AS Employees

    LEFT JOIN SalesOrders

    ON Employees.EmpID = SalesOrders.EmpID

    WHERE SalesOrders.EmpID IS NULL;

    This illustrates a classic problem in T-SQL programming - find the missing record. We want all of the employees who didn't sell something in the month of February, 2007. That is just the sort of question management would want to answer. So, you've got a list of sales orders for the month of February with the employee ID's and another list of all employees.

    This is just the sort of thing that outer joins were designed for. Outer joins come in 3 flavors: RIGHT, LEFT & FULL. The LEFT JOIN that I have used here returns all of the records from the left set (Employees) and all matching records in the right set (Sales Orders). We want the records where there is not a match - that is the Sales Order record is missing (IS NULL).

    Now, you will have noticed that I've made a few other changes to your query. Let me explain those. I took the sub-query out of the middle of the statement and put it at the top as a Common Table Expression (CTE). That is the WITH [SalesOrders] AS clause. I like to use CTE's because they are easier to read, but it works just like a sub-query.

    You'll also note that I put a GROUP BY clause in the sub-query. We don't need a list of all of the sales that were made in February 2007. We just need a list of employees who made a sale in February. GROUP BY eliminates the duplicates. I also just return the fields I need (in this case Employee ID.) This makes for a more efficient query. SELECT DISTINCT is equivalent to saying GROUP BY every field in the SELECT list. DISTINCT is also a notoriously CPU-Intensive T-SQL construct. (At least it can be. I think it is best avoided, if possible.)

    You will also note that I changed the way you are checking the order dates. First of all, your use of the YEAR and MONTH functions is wrong in that both of these functions return integers and you are comparing them to strings. This requires a completely unnecessary implicit type conversion.

    More importantly, using functions in the WHERE clause like this renders it non-SARGable. That's an invented T-SQL word. (SARG stands for Search ARGument.) If there is an index on the Order Date field (and there probably would be) it can't be used in your original query, because you aren't searching on Order Date. You are searching on the results of a function. So, the query optimizer has no choice but to do a scan of the entire table, running these two functions on the Order Date column and then checking one by one to see if they match the values you have indicated.

    My WHERE clause, on the other hand, compares the search criteria to the Order Date column itself, directly. Now the index can be used. Either a more efficient Index Seek will be used, or the Index Scan will be done against a limited range of the index. Either way, it is more efficient.

    It is very important to remember when you are writing T-SQL code: write the code as if it will run against a table with 10,000,000 records, because one day it will be running against a table that large. You are probably developing against a table with a few hundred records. What works well there will not work well in production.

    OK, now lets solve the class problem. This is the part you are looking for, but I hope you learned something from the previous discussion as well! 🙂

    The problem is the same: find all of the employees who didn't sell something during February 2007. This time, however, we have to use a correlated sub-query (yuck). A correlated sub-query is a SELECT statement nested inside another T-SQL statement containing a reference to a column (or columns) in the outer query. As I stated previously, the correlated subquery will run once for each record selected by the outer query - making it hideously inefficient.

    The original problem also directed us to use a WHERE NOT EXISTS clause. So, to restate the problem in these terms, we want to find all employees in the Employee table WHERE an employee ID does NOT EXIST in the Sales Order table for February 2007. So, now let's write this awful query as directed:

    SELECT HR.EmpID

    ,HR.FirstName

    ,HR.LastName

    FROM HR.Employees AS HR

    WHERE NOT EXISTS ( SELECT O.EmpID

    FROM Sales.Orders AS O

    WHERE O.EmpID = HR.EmpID

    AND YEAR(O.OrderDate) = 2007

    AND MONTH(O.OrderDate) = 2

    );

  • Thanks for your help. I figured it out

  • Thanks for a valuable lesson between JOINs and the uncorrelated subqueries. Its is very kind of you to take the time to thoroughly explain how it works and differs. I wish teachers would actually take the time to TEACH instead of just dump information on us newbies.

    Aloha!

    -V

  • Late I know, but I didn't see this when it was posted

    David Moutray (12/5/2011)


    The reason I say this must be a classroom problem is that correlated sub-queries are notoriously inefficient. They are very CPU-Intensive. Often, cursors will perform better than correlated sub-queries.

    That unfortunately is a myth, nothing more. The only time correlated subqueries are a performance problem is when the have a TOP 1 ... ORDER BY construct or when the correlation condition is an inequality. Otherwise they are very efficient, SQL can quite easily convert them into a join, semi-join or anti-semi-join and produce an optimal execution plan.

    At work I would solve this problem with an Outer Join, like so:

    The use of an outer join to find the missing rows is less efficient than using NOT EXISTS. Not much less efficient, but it is slightly less efficient (higher CPU, higher duration than a comparable NOT EXISTS)

    http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Expressing this sort of requirement using a logical (anti-) semi-join using EXISTS, INTERSECT, or EXCEPT seems more natural to me than introducing NULLs with an outer join, and then filtering them out. A common misconception is that logical semi-joins are always implemented using nested loops, this is not so:

  • Thanks for the useful information.

    I would have written it as below as I am just getting familiar with the "With" syntax"

    SELECT

    Employees.EmpID

    ,Employees.FirstName

    ,Employees.LastName

    FROM HR.Employees AS Employees

    LEFT OUTER JOIN

    (SELECT

    EmpID

    FROM Sales.Orders

    WHERE

    OrderDate >= convert(DateTime,'20070201',112)

    AND OrderDate < convert(DateTime,'20070301',112)

    GROUP BY EmpID) as SalesOrders

    ON

    Employees.EmpID = SalesOrders.EmpID

    WHERE SalesOrders.EmpID IS NULL;

    Ignoring any typos, how does this compare to the suggested best solution?

  • http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply