Novice needing better performance from query

  • This seemingly simple query is running way too long.  Any suggestions on how I can improve it?

    select shipment.shipment Shipment, shipment.store_no DC,

    shipped.container Container, max(carton.scandate) Last_ScanD,

    count(distinct(carton_no)) Total_Cartons,

    a.Unscanned_Cart

    from

    (select shipment, count(carton_no) Unscanned_Cart

    from carton where

    scandate = '12/30/1899'

    group by shipment)

    a, shipment, shipped, tacalendar, tamaster, carton

    where shipment.shipment = tamaster.name

    and tamaster.id_ta = tacalendar.id_ta

    and shipment.shipment = carton.shipment

    and shipment.shipment = shipped.shipment

    and shipment.shipment = a.shipment

    and tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, shipped.container, a.unscanned_cart

    Any help would be appreciated!

  • I would suggest you rewrite your SQL into ANSI accepted standards.  Using commas instead of explicitly using JOIN and since you are using commas you don't use the ON clause.  I realize the end result of data is identical but how SQL Server has to parse the T-SQL is not.

    I would suggest adding the JOIN's and ON's and possibly a with(nolock) if appropriate.  While it should help with performance it will definitely improve readability of code and it assists the server to know exactly how to use indexes that exist.  Try this code (assuming I have the joins correct) and see if it helps.  Note I did not put in the with(nolock) clause on the tables as I'm not sure if the use of the data allows for that:

    select  shipment.shipment Shipment,

     shipment.store_no DC,

     shipped.container Container,

     max(carton.scandate) Last_ScanD,

     count(distinct(carton_no)) Total_Cartons,

     a.Unscanned_Cart

    from (select shipment, count(carton_no) Unscanned_Cart

    from carton where scandate = '12/30/1899' group by shipment) a

    join shipment a.shipment = shipment.shipment

    join shipped on shipment.shipment = shipped.shipment

    join tamaster on shipment.shipment = tamaster.name

    join tacalendar on tamaster.id_ta = tacalendar.id_ta

    join carton on shipment.shipment = carton.shipment

    where tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, shipped.container, a.unscanned_cart

     

    If the phone doesn't ring...It's me.

  • Thanks for the suggestion Charles.  I'm running your query now and it is at around 22 minutes and counting.  The problem started when I added the count(distinct(carton_no), but I don't know any other method to get the results I need.  There are thousands of shipped records for each shipment and without the distinct it was multipling the count results.  Any other ideas?

  • Just a little pointer. When you write a count(distinct(whatever)), the distinct operation will be done by running another query. You can look at that query using the plan from query analyser. You can then adjust your indexes accordingly.

    Also could you post some sample data along with the required results so that we have a chance to give an alternative solution?

  • I would suggest you rewrite your SQL into ANSI accepted standards.  Using commas instead of explicitly using JOIN and since you are using commas you don't use the ON clause.  I realize the end result of data is identical but how SQL Server has to parse the T-SQL is not.

    Interesting thesis! JOINing in the WHERE clause is in case of an INNER JOIN still perfectly valid ANSI syntax.

    When you compare both queries and their execution plans you'll notice that not only the resultset, but also both plans are identical. The query optimiser is smart enough to figure this out by himself.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Having used both syntaxes I would agree without looking that the execution plan and query result would be the same. Given the amount of execution time I am guessing that we have not seen the final execution plan from the original query, but using Display Estimated Execution Plan in Query Analyzer will give some clues about index usage and what the real plan will be without having to wait on the query to run.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • I'm still struggling with this same issue, so I'm going try try and provide a bit more information if anyone can help me out.  Below is my query minus the container data.  It takes 5 minutes to run and returns a total of 597 records.  A sample result set follows...

    select shipment.store_no DC, shipment.shipment Shipment,

    max(carton.scandate) Last_ScanD,

    count (carton_no) Total_Cartons, a.Unscanned_Cart

    from

    (select shipment, count(carton_no) Unscanned_Cart

    from carton where

    scandate = '12/30/1899'

    group by shipment)

    a, shipment, tacalendar, tamaster, carton

    where shipment.shipment = tamaster.name

    and tamaster.id_ta = tacalendar.id_ta

    and shipment.shipment = carton.shipment

    and shipment.shipment = a.shipment

    and tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, a.unscanned_cart

    order by shipment.store_no, shipemnt.shipment

    Results:

    11     27482 1899-12-30 00:00:00.000 176 176

    11     27488 2005-12-20 00:00:00.000 352 136

    11     27489 2005-12-16 00:00:00.000 209 202

    11     27490 2005-12-21 00:00:00.000 165 1

    Following is the query with the container data added.  It takes 7 minutes to run and returns 959 rows, but the problem is the Total Cartons are no longer correct...the count as been multipled by the number of times that shipment number appears in the shipped table.  A result set follows...

    select shipment.store_no DC, shipment.shipment Shipment,

    shipped.container Container, max(carton.scandate) Last_ScanD,

    count (carton_no) Total_Cartons, a.Unscanned_Cart

    from

    (select shipment, count(carton_no) Unscanned_Cart

    from carton where

    scandate = '12/30/1899'

    group by shipment)

    a, shipment, shipped, tacalendar, tamaster, carton

    where shipment.shipment = tamaster.name

    and tamaster.id_ta = tacalendar.id_ta

    and shipment.shipment = carton.shipment

    and shipment.shipment = shipped.shipment

    and shipment.shipment = a.shipment

    and tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, shipped.container, a.unscanned_cart

    order by shipment.store_no, shipment.shipment

    Results:

    11     27482 GATU 0094143    1899-12-30 00:00:00.000 2816 176

    11     27488 217 6564 0676    2005-12-20 00:00:00.000 3872 136

    11     27489 HLXU 3058570     2005-12-16 00:00:00.000 1672 202

    11     27490 HLXU 3111545     2005-12-21 00:00:00.000 495 1

    My solution was to change the count (carton_no) Total_Cartons to a count (distinct(carton_no).  The problem is now my run time is through the roof.  In fact, I have yet to let if finish running.  The last time I ran it I finally killed it after an hour.  That is simply an unacceptable amount of time.

    I hope this gives you a bit more info. and maybe a good solution for me...

    Thanks again!

  • Are all the tables indexed on the columns being referenced in the join? For example, is there an index on shipment.shipment?

    Have you looked at the query plan to see what is taking a long time? In QA, press CTRL-K and then run the query. You'll have a new tab in the results to examine the plan chosen and see which item takes the longest percentage wise. Look for straight arrows (scans) as well. These can be an indication of poor indexing.

  • There is a clustered index on the cartons table which seems to be sucking up the most processing time, i.e., 33% each time.

  • And is that showing a clustered index seek or a clustered index scan?

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Scan

  • Likely that is part of the problem.

    Is the clustered index on an identity field perhaps? If so, you could add in your where clause ...And carton.identity_field > 0. That will cause the optimizer the choose a clustered index seek...at least that is the theory here. Without seeing the tables and indexing scheme I am guessing a bit.

    Beyond that, any other query that shows an Index Scan, Clustered Index Scan or Table Scan is a potential bottleneck and should be looked at.

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • Have you thought of getting the cartons with an inline query instead?  I've seen this improve my queries from time to time.  Frank K might have some info on if this impacts other processes as he has much more experience than me.

    For example:

    select shipment.store_no DC, shipment.shipment Shipment,

    shipped.container Container, max(carton.scandate) Last_ScanD,

    count (carton_no) Total_Cartons,

    (select count(carton_no) from carton a where

    scandate = '12/30/1899' and a.shipment = shipment.shipment) Unscanned_Cart

    from

    --a,

    shipment, shipped, tacalendar, tamaster, carton

    where shipment.shipment = tamaster.name

    and tamaster.id_ta = tacalendar.id_ta

    and shipment.shipment = carton.shipment

    and shipment.shipment = shipped.shipment

    --and shipment.shipment = a.shipment

    and tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, shipped.container, a.unscanned_cart

    order by shipment.store_no, shipment.shipment

    If the phone doesn't ring...It's me.

  • I would try the following

    Instead of creating a dynamic table within the select statement, declare a variable table ...

    declare @a TABLE ( shipment datatype, carton_no_count int, Unscanned_Cart int, PRIMARY KEY shipment&nbsp

    This allows you to actually index the table and seperates the complexity for simplier debugging.

  • Thanks for the help all...I finally got a good result.  Charles, your suggestion helped.  I passed both count statements via the method you suggested and that allowed the query to run faster and give me the result I needed.  The final query looks like this:

    select shipment.store_no DC, shipment.shipment Shipment,

    shipped.container Container,

    max(carton.scandate) Last_ScanD,

    (select count(carton_no) from carton a where

    a.shipment = shipment.shipment) Total_Cartons,

    (select count(carton_no) from carton b where

    scandate = '12/30/1899' and b.shipment = shipment.shipment) Unscanned_Cart

    from

    shipment, shipped, tacalendar, tamaster, carton

    where shipment.shipment = tamaster.name

    and tamaster.id_ta = tacalendar.id_ta

    and shipment.shipment = carton.shipment

    and shipment.shipment = shipped.shipment

    and tamaster.calendarmodule = 'SHIPMENT'

    and tacalendar.name = 'RCV - Receive DC'

    and shipment.completed = 0

    and tacalendar.act_finish = '12/30/1899'

    group by shipment.shipment, shipment.store_no, shipped.container

    order by shipment.store_no, shipment.shipment

    Thanks again!

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply