Why does query run so long?

Question

Post reply

Why does query run so long?

Ed Zann

SSCrazy Eights

Points: 8694
More actions
February 5, 2004 at 10:51 am

#107043

This query is part of a DTS package that never seems to complete (after 3 days we canceled):

SELECT distinct Prcs_dte

FROM tbl_Tracking

WHERE prcs_dte not in (Select distinct prcs_dte FROM Acct_Stats)

tbl_Tracking has 8 million rows, Accts_stats has 3 million rows. Both tables have Prcs_dte indexes.

If I put the outer query results in a temp table and the inner query results in another temp table and then select the records not appearing in both, the query executes in 10 sec.

I'm thinking that maybe the above query is generating an intermediate table that is a cross join of the two tables before coming up with a result. That would be 24 trillion rows. That might be why it never completes.

Does anyone have any ideas? I guess I know how it can be fixed... I just don't understand why such an apparently simple query is causing such a problem.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Russel Loski SSCertifiable Points: 7054 More actions · Answer 1

Russel Loski

SSCertifiable

Points: 7054

February 5, 2004 at 11:47 am

#492938

Try outer join syntax?

SELECT distinct tr.Prcs_dte

FROM tbl_Tracking tr left outer join Acct_Stats asts

on tr. prcs_dte = asts. prcs_dte

where asts. prcs_dte is null

This may work faster.

Russel Loski, MCSD

Russel Loski, MCSE Business Intelligence, Data Platform

rcnelsonmba SSC Eights! Points: 813 More actions · Answer 2

I'm a firm believer in the use of temp tables, especially when using stored procedures in datawarehouse/decision support environments, where you always have tables with rows in the millions, often hundreds of millions, or even billions. Why try and right one query that runs for days when 2 querys that build temp talbes and 1 query joining them runs in practically no time at all.

TriggerMe SSChampion Points: 11752 More actions · Answer 3

TriggerMe

SSChampion

Points: 11752

February 5, 2004 at 4:30 pm

#492969

DISTINCT does not take advantage of Indexes!

Try using GROUP BY Instead with an INDEX on Prcs_dte

Kindest Regards,

wz700 SSCertifiable Points: 7133 More actions · Answer 4

wz700

SSCertifiable

Points: 7133

February 5, 2004 at 10:32 pm

#492995

Every row in tbl_Tracking will scan through Acct_Stats. thus

8 million * 3 million at least.

Not to mention distint's impact.

Corellated join is an option.

Greg Walker Old Hand Points: 369 More actions · Answer 5

Ran into a very similar problem here. Since you have the correct indexes, try using:

SELECT distinct Prcs_dte

FROM tbl_Tracking

WHERE NOT EXISTS (Select prcs_dte FROM Acct_Stats where prcs_dte = tbl_Tracking.Prcs_dtr)

I made a similar change to some of our queries here (very small tables), and execution time came down by factors of 100.

I think it's because the NOT IN has to look through the whole list, while the not exists query just has to do a quick index seek.

Greg Walker

DBA, ExpenseWatch.com

Greg Walker
DBA, ExpenseWatch.com

Johan Bijnens SSC Guru Points: 134873 More actions · Answer 6

use " NOT EXISTS " ... that's the way to do it !

speed it up by adding an index on table/view Acct_Stats column prcs_dte.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

CAGreensfelder SSC Eights! Points: 919 More actions · Answer 7

Check out this previous thread on NOT IN vs NOT EXISTS vs LEFT OUTER JOIN. (read through to the end.)

http://qa.sqlservercentral.com/forum/topic.asp?TOPIC_ID=18745&FORUM_ID=8&CAT_ID=1&Forum_Title=T%2DSQL&Topic_Title=Select+Statement+Using+Not+In+and+Null+Values

We had similar problems with NOT IN queries on a table with a half million records taking FOREVER to finish and have started switching them to LEFT OUTER JOINs (even if you have to draw pictures then squint and look at them sideways to understand LOJs the first couple of times!!). They are much harder to understand and much harder to read & figure out what they do 6 months later, but the speed difference makes it worth the effort for larger tables.

Someone in this thread said he tested LEFT OUTER JOIN vs NOT EXISTS querries and found the LEFT OUTER JOIN performed slightly faster.

Ed Zann SSCrazy Eights Points: 8694 More actions · Answer 8

I appreciate all of your input on this.

I have tried the different approaches suggested. I have found that the temp table approach is the fastest in my situation. Here is the code I settled on:

SELECT DISTINCT Prcs_dte

INTO #A_Dates

FROM tbl_Tracking

SELECT DISTINCT Prcs_dte

INTO #B_Dates

FROM Acct_Stats

SELECT A.Prcs_dte

FROM #A_Dates A

LEFT OUTER JOIN #B_Dates B

ON A.Prcs_dte = B.Prcs_dte

WHERE B.Prcs_dte IS NULL

I get a result in about 10 secs.

Thanks

Johan Bijnens SSC Guru Points: 134873 More actions · Answer 9

replace the distinct by a group by and you'll gain more !

SELECT Prcs_dte

INTO #A_Dates

FROM tbl_Tracking group by Prcs_dte

SELECT Prcs_dte

INTO #B_Dates

FROM Acct_Stats group by Prcs_dte

Still I'd suggest you'd profile it and compare execution-plans for all approches.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

kokyan SSCommitted Points: 1584 More actions · Answer 10

Dear Experts,

As the forum become "hot" here, I want to come out with some kind of similar problem. This query run quite long and sometimes caused my DTS failed.

SELECT * FROM TBL1 WHERE FLD1 = (SELECT MIN(FLD1) FROM TBL1)

Is there any way to make this query run with best performance?

Thanks in advance.

Regards,
kokyan

Russel Loski SSCertifiable Points: 7054 More actions · Answer 11

Russel Loski

SSCertifiable

Points: 7054

February 8, 2004 at 8:21 pm

#493210

What are you trying to do with this query?

Russ

Russel Loski, MCSE Business Intelligence, Data Platform