Optimize WHERE predicate

  • Hi All,

    Help required in re-writing the WHERE predicate below so that an index on the date column can be used. Currently a lot of functions are being applied in the condition to get the first day of the month and the last day of the month.

    How can I re-write the WHERE condition below without using these functions on the datetime column on which the index exists?

    SELECT

       ....

       ....

    WHERE t1.tran_dt BETWEEN DATEADD(DAY,-DAY(t2.updated_dt) + 1,t2.updated_dt) AND DATEADD(DAY,-1, DATEADD(MONTH,1,DATEADD(DAY,-DAY(t2.updated_dt)+1,t2.updated_dt)))

    sample data: This is not real data; it is just for the sake of understanding. However, the query has to process 4 million records on prod data.

    create table Testtbl
    (date_last_updated datetime
    )

    insert into Testtbl
    select getdate()
    union all
    select getdate()-10
    union all
    select getdate()-40

    select
    date_last_updated
    ,RIGHT( '00' + LTRIM( RTRIM( STR( MONTH( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))), 2) + '/'
                    + RIGHT('00' + LTRIM( RTRIM( STR( DAY( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))), 2) + '/'
                    + LTRIM( RTRIM( STR( YEAR( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))) AS "Date_Last_Updated in mm/dd/yyyy"
    ,DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) as w1_starting_dt_of_the_month
    ,DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ) as w2_ending_dt_of_the_month
    from Testtbl
    where date_last_updated between DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated)
    and
    DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated) ) )

    Thanks,

    Sam

  • vsamantha35 - Wednesday, August 23, 2017 1:20 PM


    Sam, can you post an actual execution plan from the query please? This will provide a wealth of information to folks who wish to participate. Thanks.


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

    Attaching the actual execution plan, exported from SQL Sentry Plan Explorer.

  • vsamantha35 - Wednesday, August 23, 2017 1:55 PM


    For those of us without SQL Sentry Plan Explorer, can you post the actual execution plan from SSMS? E.g. execute the query in SSMS with "Include Actual Execution Plan" turned on, then right-click on the plan, save it, and post it.

    "I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • What are the indexes on the table DDA_TRAN?  It looks like it might make more sense to have an index that starts with the 2 join columns, with maybe the date being the third column, so (INSTITUTION_NUMBER, DDA_ACCOUNT, TRAN_DATE). 

    Similarly on DDA_DESC, (INSTITUTION_NUMBER, DDA_ACCOUNT, DESCRIPTION_LINK).

    A couple of other things: the expression
    ,RIGHT ('00' + LTRIM(RTRIM(STR(MONTH("DDA_TRAN"."POSTED_DATE")))), 2) + '/'
        + RIGHT('00' + LTRIM(RTRIM(STR(DAY("DDA_TRAN"."POSTED_DATE")))), 2) + '/'
        + LTRIM(RTRIM(STR(YEAR("DDA_TRAN"."POSTED_DATE"))))

    can probably be replaced by the much simpler:
    ,CONVERT(varchar(20), "DDA_TRAN"."POSTED_DATE", 101)

    And much of the processing seems to be to support the very complex GROUP BY clause, can that be simplified?

    Pretty much this: DATEADD on the columns in your predicate is going to lead to table scans, which will absolutely lead to poor performance overall. You'd be better off with a calculated column if you need to manipulate those dates. You can index that for use with this query and avoid applying the function to the columns. As long as the function is there, performance is pretty likely to stink.
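    To make that point concrete, here is an editor's sketch (table, column, and variable names are made up, not from this thread): compute the month boundary once, outside the predicate, and compare the bare column against it.

    ```sql
    -- Non-sargable: the function wraps the indexed column, so it must be
    -- evaluated for every row and an index on tran_dt cannot be seeked:
    -- WHERE DATEADD(DAY, -DAY(tran_dt) + 1, tran_dt) = @month_start

    -- Sargable: the column stands alone on one side of each comparison,
    -- so an index on tran_dt can be used for a range seek.
    DECLARE @month_start datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0);

    SELECT ...
    FROM dbo.SomeTable
    WHERE tran_dt >= @month_start
      AND tran_dt <  DATEADD(MONTH, 1, @month_start);
    ```

    The half-open upper bound (`<` the first day of the next month) also avoids losing rows whose datetime value carries a time component on the last day of the month.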

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Your WHERE clause isn't doing anything at all. It's literally checking to see if the date_last_updated is in the month that it's in.
    Any idea what the calculated range is supposed to be?

    Never mind... I was looking at the last WHERE clause, not the 1st one...

  • Grant Fritchey - Wednesday, August 23, 2017 3:35 PM


    Hi Grant, can you please provide more suggestions from the plan? This is one of the queries generated by the 3rd-party tool Cognos, and we are looking for opportunities to optimize it.

  • Chris Harshman - Wednesday, August 23, 2017 3:20 PM


    Thanks Chris for the help.

    You could do with clustered indexes on those heap tables, something like

    DDA_DESC (INSTITUTION_NUMBER, DDA_ACCOUNT, DESCRIPTION_LINK)

    DDA_TRAN (INSTITUTION_NUMBER, DDA_ACCOUNT)

    Simplification of the query:

    SELECT
        "DDA_TRAN"."INSTITUTION_NUMBER" AS "Institution_Number"
        ,"DDA_TRAN"."DDA_ACCOUNT" AS "DDA_Account"
        ,"DDA_TRAN"."CHECK_NUMBER" AS "Check_Number"
        ,x.DebitOrCredit AS "Debit_or_Credit"
        ,"DDA_TRAN"."DESCRIPTION_CODE" AS "Description_Code"
        ,"DDA_TRAN"."DESCRIPTION_LINK" AS "Description_Link"
        ,"DDA_DESC"."DESCRIPTION1" AS "Description1"
        ,"DDA_DESC"."DESCRIPTION2" AS "Description2"
        ,"DDA_TRAN"."DOCUMENT_IMAGE_FLAG" AS "Document_Image_Flag"
        ,"DDA_TRAN"."EXTERNAL_TRAN_CODE" AS "External_Transaction_Code"
        ,"DDA_TRAN"."INTERNAL_TRAN_CODE" AS "Internal_Transaction_Code"
        ,CONVERT(VARCHAR(10),DDA_TRAN.POSTED_DATE,101)
        ,"DDA_TRAN"."REVERSAL_CODE" AS "Reversal_Code"
        ,SUM("DDA_TRAN"."TRAN_AMOUNT") AS "Transaction_Amount"
        ,SUM(CASE WHEN x.DebitOrCredit = 'Credit' THEN "DDA_TRAN"."TRAN_AMOUNT" ELSE 0 END) AS "Credit_Tran_Amount"
        ,SUM(CASE WHEN x.DebitOrCredit = 'Debit' THEN "DDA_TRAN"."TRAN_AMOUNT" ELSE 0 END) AS "Debit_Tran_Amount"
        ,CONVERT(VARCHAR(10),DDA_TRAN.TRAN_DATE,101) AS "Transaction_Date"
        ,"DDA_TRAN"."TRAN_USED_CODE" AS "Transaction_Used_Code"
        ,"DDA_DESC"."ACH_COMPANY_ID" AS "ACH_Company_Id"
        ,"DDA_DESC"."ACH_STANDARD_ENTRY_CLASS" AS "ACH_Standard_Entry_Class"
        ,"DDA_DESC"."ITEM_ADDRESS" AS "Item_Address"
        ,CONVERT(VARCHAR(10),EOMONTH(DATE_LAST_UPDATED),101) AS "Date_Last_Updated"
    FROM ("DDA_ACCT" "DDA_ACCT"
        LEFT OUTER JOIN "DDA_TRAN" "DDA_TRAN" -- left outer join is converted into an inner join by the WHERE clause
            ON "DDA_ACCT"."INSTITUTION_NUMBER" = "DDA_TRAN"."INSTITUTION_NUMBER"
            AND "DDA_ACCT"."DDA_ACCOUNT" = "DDA_TRAN"."DDA_ACCOUNT"
    )
    LEFT OUTER JOIN "DDA_DESC" "DDA_DESC"
        ON "DDA_TRAN"."INSTITUTION_NUMBER" = "DDA_DESC"."INSTITUTION_NUMBER"
        AND "DDA_TRAN"."DDA_ACCOUNT" = "DDA_DESC"."DDA_ACCOUNT"
        AND "DDA_TRAN"."DESCRIPTION_LINK" = "DDA_DESC"."DESCRIPTION_LINK"
    CROSS APPLY (
        SELECT DebitOrCredit = CASE
            WHEN "DDA_TRAN"."INTERNAL_TRAN_CODE" IN (6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 60, 62, 72, 76, 78, 80, 87, 25, 26, 27, 31, 32, 75)
            THEN 'Credit' ELSE 'Debit' END
    ) x
    WHERE "DDA_TRAN"."TRAN_DATE" BETWEEN
        DATEADD(DAY, -DAY("DDA_ACCT"."DATE_LAST_UPDATED") + 1, "DDA_ACCT"."DATE_LAST_UPDATED")
        AND EOMONTH(DDA_ACCT.DATE_LAST_UPDATED)
    GROUP BY "DDA_TRAN"."INSTITUTION_NUMBER"
        ,"DDA_TRAN"."DDA_ACCOUNT"
        ,"DDA_TRAN"."CHECK_NUMBER"
        ,x.DebitOrCredit
        ,"DDA_TRAN"."DESCRIPTION_CODE"
        ,"DDA_TRAN"."DESCRIPTION_LINK"
        ,"DDA_DESC"."DESCRIPTION1"
        ,"DDA_DESC"."DESCRIPTION2"
        ,"DDA_TRAN"."DOCUMENT_IMAGE_FLAG"
        ,"DDA_TRAN"."EXTERNAL_TRAN_CODE"
        ,"DDA_TRAN"."INTERNAL_TRAN_CODE"
        ,CONVERT(VARCHAR(10),DDA_TRAN.POSTED_DATE,101)
        ,"DDA_TRAN"."REVERSAL_CODE"
        ,CONVERT(VARCHAR(10),DDA_TRAN.TRAN_DATE,101)
        ,"DDA_TRAN"."TRAN_USED_CODE"
        ,"DDA_DESC"."ACH_COMPANY_ID"
        ,"DDA_DESC"."ACH_STANDARD_ENTRY_CLASS"
        ,"DDA_DESC"."ITEM_ADDRESS"
        ,CONVERT(VARCHAR(10),EOMONTH(DATE_LAST_UPDATED),101)
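    One caveat on the rewrite above (an editor's note, not from the original post): with a datetime column, EOMONTH returns the last day of the month at midnight, so the BETWEEN predicate would exclude rows whose TRAN_DATE carries a time component on that last day. If TRAN_DATE can hold times, a half-open range is safer:

    ```sql
    WHERE "DDA_TRAN"."TRAN_DATE" >= DATEADD(MONTH, DATEDIFF(MONTH, 0, "DDA_ACCT"."DATE_LAST_UPDATED"), 0)
      AND "DDA_TRAN"."TRAN_DATE" <  DATEADD(MONTH, DATEDIFF(MONTH, 0, "DDA_ACCT"."DATE_LAST_UPDATED") + 1, 0)
    ```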

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw


  • vsamantha35 - Wednesday, August 23, 2017 9:23 PM


    The issue is that calculations of any kind performed on columns mean that, in order to satisfy the query, the calculation must be performed on every single row in the table to determine whether that row matches the calculated value. So if you have 10 rows, you have to run the calculation 10 times. If you have 10 million rows, you have to run a whole bunch of calculations. That's going to be very slow, and there is absolutely no changing it unless you change how things are being done.

    First option: look at what's going on in Cognos. Why is it making that calculation? Could you do something else in Cognos so that the calculation doesn't occur? I'm not a Cognos expert, so you'll have to go elsewhere for definitive answers there.

    Second option: instead of having Cognos run calculations against the datetime column to get the two values of last_update_date and one month from last_update_date (is that right? whatever it is), create two additional columns on your table that perform these calculations on the data: computed columns. Because you're storing the calculations within the table and you can create indexes on computed columns, your query can take advantage of an index, which should make it run faster.
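    As a rough sketch of the computed-column idea (the column and index names below are hypothetical; DDA_ACCT is assumed from the query above): the month-boundary expressions are deterministic, so they can be persisted and indexed.

    ```sql
    -- Add persisted computed columns for the month start and the first
    -- day of the following month, derived from DATE_LAST_UPDATED.
    ALTER TABLE DDA_ACCT ADD
        month_start AS DATEADD(MONTH, DATEDIFF(MONTH, 0, DATE_LAST_UPDATED), 0) PERSISTED,
        month_end   AS DATEADD(MONTH, DATEDIFF(MONTH, 0, DATE_LAST_UPDATED) + 1, 0) PERSISTED;

    CREATE INDEX IX_DDA_ACCT_month ON DDA_ACCT (month_start, month_end);
    ```

    The WHERE clause can then reference month_start and month_end directly instead of re-deriving them per row.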


  • Grant Fritchey - Thursday, August 24, 2017 6:07 AM


    Another option could be to create some kind of calendar table that has 1 row for each day and some kind of YearMonth integer column.  You could then join to this calendar table and compare the 2 YearMonths to see if they match.
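    A minimal sketch of that calendar-table approach (all names here are hypothetical):

    ```sql
    -- One row per day, with an integer month key such as 201708.
    CREATE TABLE dbo.Calendar
    (
        cal_date   date NOT NULL PRIMARY KEY,
        year_month int  NOT NULL  -- e.g. 201708 for August 2017
    );

    -- Join each date to the calendar and compare the month keys:
    SELECT ...
    FROM DDA_TRAN t
    JOIN DDA_ACCT a
        ON a.INSTITUTION_NUMBER = t.INSTITUTION_NUMBER
       AND a.DDA_ACCOUNT = t.DDA_ACCOUNT
    JOIN dbo.Calendar ct ON ct.cal_date = CAST(t.TRAN_DATE AS date)
    JOIN dbo.Calendar ca ON ca.cal_date = CAST(a.DATE_LAST_UPDATED AS date)
    WHERE ct.year_month = ca.year_month;
    ```

    The calendar table would need to be populated for the full date range of the data; the datetime-to-date cast in the join is one of the conversions SQL Server can still use an index seek for.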

  • Chris Harshman - Thursday, August 24, 2017 7:39 AM


    Absolutely. Great idea. I'm aware of this approach, but I keep forgetting about it.


    Hi Chris,
    I tried to create those covering indexes but I didn't see much improvement in query performance; it was a 5-10 second difference. Sometimes the query without the indexes performed well, so I decided not to create those two indexes. Thanks for the query re-write. I'll try to see if it works well and returns the same result in terms of row count and data. Thank you for taking your valuable time out to help people like me who want to learn some tips on query tuning and the approach towards it. Thanks a lot.

  • Grant Fritchey - Thursday, August 24, 2017 6:07 AM


    We don't have any idea who owns the Cognos piece. As a vendor we do managed services, so they give us a bunch of queries to tune on a daily or weekly basis. We don't have proper documentation for these reports, nor do the full-time employees have full info on them. The project has been taken over from one client to another, and we work as the vendor team for the new client.
