Different ways of using EXISTS

  • Hey - had a bit of a discussion with one of the other DB guys at my company, regarding what the best way of using EXISTS is. He uses WHERE EXISTS(SELECT TOP 1 SomeColumn FROM TABLE ...). I, on the other hand, had been using WHERE EXISTS(SELECT TOP 1 1 FROM TABLE ...).

    I'd done a bit of searching, but couldn't really find any definitive reasoning behind what the best approach was, so I did a bit of testing on one of our databases, and got some confusing results:

    SELECT TOP 100000 * FROM Table1 (NOLOCK)

    WHERE EXISTS(SELECT TOP 1 1 FROM Table2 WHERE Table1Column = Table2Column)

    Table 'Table2'. Scan count 1, logical reads 443, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table1'. Scan count 1, logical reads 18551, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SELECT TOP 100000 * FROM Table1 (NOLOCK)

    WHERE EXISTS(SELECT TOP 1 * FROM Table2 WHERE Table1Column = Table2Column)

    Table 'Table2'. Scan count 1, logical reads 443, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table1'. Scan count 1, logical reads 18551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SELECT TOP 100000 * FROM Table1 (NOLOCK)

    WHERE EXISTS(SELECT * FROM Table2 WHERE Table1Column = Table2Column)

    Table 'Table2'. Scan count 1, logical reads 443, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Table1'. Scan count 1, logical reads 18551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    All results have the same scan count and the same logical reads, however, the first case has 2 read-ahead reads on both tables, while the last case has 4 read-ahead reads on one table and none on the other, and the middle case has no read-ahead reads at all.

    I looked up what read-ahead reads were, in BOL, and found that they were a performance optimization that SQL Server uses, so it seems to me like having those are good. Also, under the assumption that Table1 is significantly higher in logical reads than Table2, it seems like the SELECT TOP 1 1 is the best approach, but if the situation were reversed, then SELECT * would be better?

  • Hi kramaswamy

    There should be no difference between "SELECT *" and "SELECT TOP(1) 1" (you should use the braces syntax for SSE2k5 and later versions) if you use them for EXISTS statement. EXISTS just validates the availability of the data, but don't access them.

    Anyway, I still prefer the "SELECT TOP(1) 1" since I hate "SELECT *" 😀

    Greets

    Flo

  • Well, I agree there should be no difference 😛 but there is a difference, at least in the IO statistics, which is what confuses me. What's the reason for the difference, and is it even a relevant difference?

  • Are you doing a fair test ?

    Are you doing 'dbcc dropcleanbuffers' between your tests ?



    Clear Sky SQL
    My Blog[/url]

  • Hm - nope, never even heard of that command. What's it do? I'll try it out and see if the results change any.

  • It's the logical IOs that you need to compare to see which does the most work. Read-ahead reads means that some of the required pages were not in memory (maybe due to memory pressure) and had to be fetched from disk.

    All three are doing the same logical IOs, hence the same amount of pages are being read.

    Personally, my preference for EXISTS is EXISTS (SELECT 1 FROM SomeTable ..... )

    The 1 to indicate that no columns are returned, top 1 not specified because it's irrelevant.

    I have seen a case where including a top 1 in the exists was slower (more CPU time). Can't recall situation offhand.

    Have you checked the execution plan of these queries?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • It purges any in-memory pages SQLServer has , dont attempt to compare relative performance of queries with doing it.

    http://msdn.microsoft.com/en-us/library/ms187762.aspx



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (8/26/2009)


    dont attempt to compare relative performance of queries with doing it.

    Well....... depends.

    Personally, I never use it when comparing queries.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yeah, the execution plans were all identical. So, you're saying that the read-ahead reads are something that could change even if I were to re-run the exact same query? 'Cause actually, that's what I've been noticing when doing some more tests on a more controlled environment. I ran a query once, and there were 0 read-ahead reads, then I re-ran the same query, and suddenly almost all the reads were read-ahead. I guess this behavior is the explanation for why, when you run a query, and then run it again, the second time around it runs almost instantaneously? 'Cause all the information is stored in a cache of sorts?

  • My point being , to the 'untrained' eye what is the worst performing of these two queries 😀

    edit : Judging only by the output of 'set statistics io on'

    Query 1 = 'top 1 1'

    Query 2 = 'top 1 *'

    dbcc dropcleanbuffers

    select count(*) from sys.tables where exists(Select top 1 1 from sys.tables st2 where st2.object_id =sys.tables.object_id)

    go

    select count(*) from sys.tables where exists(Select top 1 * from sys.tables st2 where st2.object_id =sys.tables.object_id)

    go

    dbcc dropcleanbuffers

    go

    select count(*) from sys.tables where exists(Select top 1 * from sys.tables st2 where st2.object_id =sys.tables.object_id)

    go

    select count(*) from sys.tables where exists(Select top 1 1 from sys.tables st2 where st2.object_id =sys.tables.object_id)

    go



    Clear Sky SQL
    My Blog[/url]

  • kramaswamy (8/26/2009)


    Yeah, the execution plans were all identical. So, you're saying that the read-ahead reads are something that could change even if I were to re-run the exact same query?

    Correct. It depends on what's in the data cache at a particular time.

    I guess this behavior is the explanation for why, when you run a query, and then run it again, the second time around it runs almost instantaneously? 'Cause all the information is stored in a cache of sorts?

    Yup. The first time the query has to be compiled and the data pages fetched into memory. Second time the exec plan can be retrieved from cache and the data pages are in memory, so don't have to be fetched from the disk again.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi kramaswamy,

    I have worked in some performance tuning tasks and we always prefer to use

    EXISTS(select 1 from table Where.............)

    Also one more point if you are fetching data from multiple tables then if you always have data in child table and in query you want the result only when the data exists in child table then use join instead of EXISTS()

    since it will search for each row.

    If you may or may not have data in child table like when we use LEFT join then its better to use exists.

    We use these since we search records in tables using the parameter passed so

    @Parameter is null or exists(select 1 from child where col = @parameter)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply