Query with multiple And statements

  •  

    Hi,

    I've started working on this new project. I've a big query below that is joining around 20 table. Developer before me had written it and i've to correct it.How can i make possible improvement in this query? If i remove some tables or some AND conditions from query then i got thousands of records. I've already created good indexes on all the tables in query. There are no table scans in the execution plan. There are NO book marks. There are some Index scans in the plan but can't take care of that because there are more than 20 tables i'm trying to join here.I will really appreciate your feedback.

    Thanks

    SELECT DISTINCT

     Orders.Order_ID, Orders.Order_DateOrdered, Orders.Order_Taxes, Orders.Order_Commission, 

     Orders.Order_Comments, Orders.Order_OrderNumber, Orders.Order_CustomerNumber,

     Orders.Order_Supplier, Orders.Order_CCUsed, Orders.Order_OrderedBy , Orders.Order_Shipping,

    Orders.Order_AddressInfo,Orders.Order_FullAddressInfo,GiftPurchase.GiftPurchase_Quantity, GiftPurchase.GiftPurchase_Price,

     GiftPurchase.GiftPurchase_Shipping, GiftPurchase.GiftPurchase_Taxes, UserProduct.UserProduct_Name,

    UserProduct.UserProduct_Description,UserProduct.UserProduct_ModelType, UserProduct.UserProduct_DeliveryOption,

     UserProduct.UserProduct_ID,

    Shipping.Shipping_ShippedBy, Shipping.Shipping_DateShipped, Shipping.Shipping_ShipmentArrivalDate,

    Shipping.Shipping_Tracking, Shipping_TransactionFee, Shipping.Shipping_ShipmentMethod, Shipping.Shipping_Cost,

     Request.Request_Event, ShippingCollection_TS, ShippingCollection_ID,ShippingCollection_MerchantOrderNumber,

    ShippingCollection_OrderNumber, SupplierOrder_OrderNumber , SupplierOrder_ID, Store_ID,Person_ID, Person_Forename,

    Person_Surname, Person_Email

    FROM

    Orders, GiftPurchase, UserProduct, SupplierOrderOrders, SupplierOrder, Shipping, Request, ShippingCollection, Store,

    Supplier,Reqfill, Transactions, Person, Event , Purchaser , GiftSupplierFilled, AllocatedGift,

    AllocateGiftToSupplier

     

    WHERE GiftPurchase.GiftPurchase_Request = UserProduct.UserProduct_Request AND

    Request.Request_ID = UserProduct.UserProduct_Request AND Orders.Order_ID= GiftPurchase.GiftPurchase_Order

    AND SupplierOrderOrders.SupplierOrderOrders_Orders = Orders.Order_ID

    AND SupplierOrderOrders.SupplierOrderOrders_SupplierOrder = SupplierOrder.SupplierOrder_ID

    AND SupplierOrder.SupplierOrder_Supplier = Supplier_ID

    AND GiftPurchase.GiftPurchase_Shipping = Shipping.Shipping_ID  AND

    ShippingCollection.ShippingCollection_Shipping = Shipping.Shipping_ID

    AND Supplier_ID = 703

    AND Supplier_Store = Store_ID

    AND Request_ID = Reqfill_Request

    AND Request_Event = Event_ID

    AND Purchaser_Person = Person_ID

    AND Purchaser_Transaction =  Transaction_ID

    AND Reqfill_Transaction = Transaction_ID

     

    AND  Orders.Order_ID = GiftSupplierFilled.GiftSupplierFilled_Order

    AND UserProduct.UserProduct_ID=GiftSupplierFilled.GiftSupplierFilled_UserProduct

    AND UserProduct.UserProduct_ID = AllocatedGift.AllocatedGift_UserProduct AND

    Transactions.Transaction_ID = AllocatedGift.AllocatedGift_Transaction

    AND  AllocatedGift.AllocatedGift_ID = AllocateGiftToSupplier.AllocateGiftToSupplier_AllocatedGift

    AND  GiftSupplierFilled.GiftSupplierFilled_AllocateGiftToSupplier = AllocateGiftToSupplier.AllocateGiftToSupplier_ID

    AND Person_Surname like '%Takaba%' 

    ORDER BY ShippingCollection_ID DESC, Request_Event, SupplierOrder_ID, Orders.Order_ID, UserProduct.UserProduct_ID;

  • You can make a tremendous improvement for this query! As it is right now, you are selecting the cartesian product (all possible combinations) for the 18 tables you have. That would easily bring the query optimizer to it's knees. Multiply the number of records in every 18 tables you have and post the number here. I'll bet the product is some billions and more. I'll write down some things for you to do now. Remember to use fully qualified named (Order.OrderID, not just OrderID) in the future, for your own sake. It will be much easier to debug sometime in the future. If succeeding to implement the things written below, I would estimate that this query would perform at least 100 times better.

    1. Rewrite for using INNER JOINs

        SELECT     Orders.Order_ID

                   ...

        FROM       Orders

        INNER JOIN GiftPurchase GP ON GiftPurchase.GiftPurchase_Order = Orders.Order_ID

        INNER JOIN ...

    I tried to do that for you, but since there are many unqualified names, it wasn't possible for me, since I can't see the table layouts.

    2. Place everything in a VIEW, except

        AND Supplier_ID = 703

        AND Person_Surname like '%Takaba%'

    Then call the view from your stored procedure as

        SELECT * FROM MyView WHERE MyView.Supplier_ID = 703 AND MyView.Person_Surname LIKE '%Takaba%'

    This way, the query optimizer in SQL Server doesn't have to recompile the joins every time!

    3. If possible, make the view indexed.

     

    That's my opninion.


    N 56°04'39.16"
    E 12°55'05.25"

  • I agree that you should use the new JOIN syntax to make the code easier to read, but it won't necessarily make any difference to the query plan - it certainly isn't needed to prevent a physical Cartesian product! The optimiser has been using WHERE conditions as join predicates since it was first developed. It might help, by giving the optimiser a good start so that it will have time to generate a decent plan before it gets fed up and starts the query (yes, that's how it works!).

    But you have circular joins, so putting them in the 'wrong' join could prevent the optimiser from generating a good plan. They are presumably needed to limit records correctly, but they make the optimisers job much harder. they are marked ---**** in the accompanying code.

    I think you can also get rid of the Events and Stores tables, unless joins to them are needed to check referential integrity of the FK that points to it (I hope it's not!). You also use SupplierOrderOrders to join Supplier to SupplierOrders, which is then joined to orders. Doesn't SupplierOrders have a foreign key to Supplier? And shouldn't SupplierOrderOrders be used to join SupplierOrders to Orders? Just a question, as I obviously don't know your schema.

    I've attached a proposed rewrite of your code - it's not checked or tested in any way, so it might not even compile, but I hope it gives an idea of how to proceed.

    Another thing is that half your tables are used only for joining, which might mean that they could be eliminated (like the Events and Stores tables can if my suggestion above is correct), if you can find another join path which defines the correct recordset. This might be the case if you are joining two table to the same PK on a third table - in which case the (indexed) foreign keys could possibly be joined directly to each other Or there might be denormalised foreign keys on your table - it looks as though there may be, since the joins around SupplierOrderOrders seem a bit odd, as I suggest.

    Finally, the DISTINCT could be expensive, so you might need to look at why it is needed - could be more joins I'm afraid! Can you post an execution plan?

    Also, consider whether the query could be broken down into smaller views by subject matter, then joined. It might make it easier to optimise, and once the units are optimised (and prefereable each one has a good WHERE clause attached), they can be reused with greater confidence in other boig queries. It might just make preformnce worse though...

    select

    distinct

    ord

    .Order_ID, ord.Order_DateOrdered, ord.Order_Taxes, ord.Order_Commission, ord.Order_Comments, ord.Order_OrderNumber, ord.Order_CustomerNumber, ord.Order_Supplier, ord.Order_CCUsed, ord.Order_OrderedBy , ord.Order_Shipping, ord.Order_AddressInfo, ord.Order_FullAddressInfo,

    gp

    .GiftPurchase_Quantity, gp.GiftPurchase_Price, gp.GiftPurchase_Shipping, gp.GiftPurchase_Taxes,

    up.UserProduct_Name, up.UserProduct_Description,up.UserProduct_ModelType, up.UserProduct_DeliveryOption, up.UserProduct_ID,
    sh.Shipping_ShippedBy, sh.Shipping_DateShipped, sh.Shipping_ShipmentArrivalDate, sh.Shipping_Tracking, sh.Shipping_TransactionFee, sh.Shipping_ShipmentMethod, sh.Shipping.Shipping_Cost,

    re

    .Request_Event,

    sc.ShippingCollection_TS, sc.ShippingCollection_ID, sc.ShippingCollection_MerchantOrderNumber, sc.ShippingCollection_OrderNumber,
    so.SupplierOrder_OrderNumber , so.SupplierOrder_ID,
    su.Supplier_Store Store_ID,
    pe.Person_ID, pe.Person_Forename, pe.Person_Surname, pe.Person_Email

    from

    dbo.Supplier su

    ---------------------join dbo.Store on su.Supplier_Store = st.Store_ID
    ------------------------ - Not needed. You are joining FK to PK to get PK.
    ------------------------ Assuming you don't need to check ref integrity, you can just return FK.
    /**/

    join dbo.SupplierOrder so

    /**/

    on so.SupplierOrder_Supplier = su.Supplier_ID

    /****/

    join dbo.SupplierOrderOrders soo

    /****/

    on soo.SupplierOrderOrders_SupplerOrder = so.SupplierOrder_ID

    /******/

    join dbo.Orders ord

    /******/

    on soo.SupplierOrderOrders_Orders = ord.Order_ID

    /********/

    join dbo.GiftPurchase gp

    /********/

    on ord.Order_ID= gp.GiftPurchase_Order

    /**********/

    join dbo.Request re

    /**********/

    on gp.Request_ID = re.Request_ID

    ---------- join dbo.[Event] ev --not needed, I don't think.
    ---------- on re.Request_Event = ev.Event_I
    /************/

    join dbo.UserProduct up

    /************/

    on re.Request_ID = up.UserProduct_Request

    /************/

    join dbo.Reqfill rf

    /************/

    on rf.Reqfill_Request = re.Request_ID

    /*************/

    join dbo.Transactions tr

    /*************/

    on tr.Transaction_ID = rf.Reqfill_Transaction

    /**************/

    join dbo.Purchaser pu

    /**************/

    on pu.Purchaser_Transaction = tr.Transaction_ID

    /****************/

    join dbo.Person pe

    /****************/

    on pu.Purchaser_Person = pe.Person_ID

    /**************/

    join dbo.AllocatedGift ag

    /**************/

    on ag.AllocatedGift_Transaction = tr.Transaction_ID

    /**************/

    and ag.AllocatedGift_UserProduct = up.UserProduct_ID ---****

    /****************/ join dbo.AllocateGiftToSupplier agts

    /****************/

    --if you had a supplier ID to join on, that would be helpful.

    /****************/ on ag.AllocatedGift_ID = agts.AllocateGiftToSupplier_AllocatedGift

    /**********/

    join dbo.GiftSupplierFilled gsf

    /**********/

    on gsf.GiftSupplierFilled_UserProduct = up.UserProduct_ID

    /**********/

    and gsf.GiftSupplierFilled_Order = ord.Order_ID --*****

    /**********/ and gsf.GiftSupplierFilled_AllocateGiftToSupplier = agts.AllocateGiftToSupplier_ID --*****
    /**********/ join dbo.Shipping sh

    /**********/

    on gp.GiftPurchase_Shipping = sh.Shipping_ID

    /************/

    join dbo.ShippingCollection sc

    --!!! This is joined only on Shipping ID
    --!!! so your rows will be multiplied by the average no of shipping collections per shipping record.
    /************/

    on sc.ShippingCollection_Shipping = sh.Shipping_ID

    where

    su.Supplier_ID = 703

    and

    pe.Person_Surname like 'Takaba%' --don't use a leading wildcard - it means you can't use an index.

    ORDER

    BY sc.ShippingCollection_ID DESC,

    re.Request_Event, so.SupplierOrder_ID, ord.Order_ID, up.UserProduct_ID;

     

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  • Hi Stax68 and Peter,

    Thanks a lot for both of your reply.

    Stax I worked on your suggestions.

    (i)I checked the schema but the whole database is a mess. SupplierOrders doesn't have a foreign key to Supplier and that's true for all these 18 tables.There are no foreign key constraint defined on any of these 18 tables used in these joins. So I guess we can not eliminate joins based up on the foreign key assumptions  you told me.

     

    (ii) Yes I was able to remove join re.Request_Event = ev.Event_I but it didn't give much drastic improvement.

    (iii) In the original query userproduct table was costing the maximum and that was 28%.It was doing cluster index scan because userproduct is a big table for around 600000 records.

    After  I changed the query to the way you mentioned below

     pe.Person_Surname like 'Takaba%'(removed the % sign before t)

    the cost for userproduct table was reduced to 5% from 28%.Makes me wonder how such a small change in person's table column criteria did so much drastic improvement in the userproduct table. If you have answer then please do explain to me.

    Also I would also like to get some benifit from peter's suggestion. If i put this code in to view will that be helpful? As peter is mentioning that if i put everything like..

    -------------------------------------------------------------------

    Place everything in a VIEW, except

        AND Supplier_ID = 703

        AND Person_Surname like '%Takaba%'

    Then call the view from your stored procedure as

        SELECT * FROM MyView WHERE MyView.Supplier_ID = 703 AND MyView.Person_Surname LIKE 'Takaba%'

    This way, the query optimizer in SQL Server doesn't have to recompile the joins every time!

    ----------------------------------------------------------

    Questions i've is that are the views that much helpful? If i've view then is it stored just similar to stored procedure and don't need to recompile them after once they are created? And in my case putting the code which i changed the way Stax68 mentioned in a view will be more faster?

    Please let me know.

    Thanks for your help on this.

     

  • The reason a leading wildcard slows things down is because you can't use an index. The index is arranged in (an) alphabetical order, so to find things quickly you need to know how the word starts. If you don't know that, SQL has to look in each data value to see if it matches your pattern.

    Re. the view. This will probably give some performance improvement, and it's certainly good practice, but it may not make a huge difference. Fully (schema + table) qualifying column and table names would meanyou can reproduce the advantages of the view simply by running exactly the same statement more than once. It's to do with execution plan reuse. While you're developing the code, this is fairly irrelevant since the plan would be discarded when the view changed, but once it's live, go ahead.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply