Optimize WHERE predicate

  • Hi All,

    Help required in re-writing the WHERE predicate below so that an index on the date column can be used. Currently a lot of functions are being applied in the condition to get the first day of the month and the last day of the month.

    How can I re-write the WHERE condition below without using these functions on the datetime column on which the index exists?

    SELECT

       ....

       ....

    WHERE t1.tran_dt BETWEEN DATEADD(DAY,-DAY(t2.updated_dt) + 1,t2.updated_dt) AND DATEADD(DAY,-1, DATEADD(MONTH,1,DATEADD(DAY,-DAY(t2.updated_dt)+1,t2.updated_dt)))

    sample data: This is not real data; it is just for the sake of understanding. However, the query has to process 4 million records on prod data.

    create table Testtbl
    (date_last_updated datetime
    )

    insert into Testtbl
    select getdate()
    union all
    select getdate()-10
    union all
    select getdate()-40

    select
    date_last_updated
    ,RIGHT( '00' + LTRIM( RTRIM( STR( MONTH( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))), 2) + '/'
                    + RIGHT('00' + LTRIM( RTRIM( STR( DAY( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))), 2) + '/'
                    + LTRIM( RTRIM( STR( YEAR( DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ))))) AS "Date_Last_Updated in mm/dd/yyyy"
    ,DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) as w1_starting_dt_of_the_month
    ,DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated ) ) ) as w2_ending_dt_of_the_month
    from Testtbl
    where date_last_updated between DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated)
    and
    DATEADD( DAY, -1, DATEADD( MONTH, 1, DATEADD( DAY, -DAY( date_last_updated ) + 1, date_last_updated) ) )

    Thanks,

    Sam

  • vsamantha35 - Wednesday, August 23, 2017 1:20 PM


    Sam, can you post an actual execution plan from the query please? This will provide a wealth of information to folks who wish to participate. Thanks.


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

    Attaching the actual execution plan, exported from SQL Sentry Plan Explorer.

  • vsamantha35 - Wednesday, August 23, 2017 1:55 PM


    For those of us without SQL Sentry Plan Explorer, can you post the actual execution plan from SSMS? E.g. execute the query in SSMS with "Include Actual Execution Plan" turned on, then right-click on the plan, save it, and post it.

    "I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • What are the indexes on the table DDA_TRAN?  It looks like it might make more sense to have an index that starts with the 2 join columns, with maybe the date being the third column, so (INSTITUTION_NUMBER, DDA_ACCOUNT, TRAN_DATE). 

    Similarly on DDA_DESC, (INSTITUTION_NUMBER, DDA_ACCOUNT, DESCRIPTION_LINK).

    A couple of other things: the expression
    ,RIGHT ('00' + LTRIM(RTRIM(STR(MONTH("DDA_TRAN"."POSTED_DATE")))), 2) + '/'
        + RIGHT('00' + LTRIM(RTRIM(STR(DAY("DDA_TRAN"."POSTED_DATE")))), 2) + '/'
        + LTRIM(RTRIM(STR(YEAR("DDA_TRAN"."POSTED_DATE"))))

    can probably be replaced by the much simpler:
    ,CONVERT(varchar(20), "DDA_TRAN"."POSTED_DATE", 101)

    And much of the processing seems to be to support the very complex GROUP BY clause, can that be simplified?

    Pretty much this: DATEADD on the columns in your predicate is going to lead to table scans, which will absolutely lead to poor performance overall. You'd be better off with a calculated column if you need to manipulate those dates. You can index that for use with this query and avoid applying the function to the columns. As long as the function is there, performance is pretty likely to stink.
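    To make that point concrete, here is an editor's sketch (table, column, and variable names are made up, not from this thread): compute the month boundary once, outside the predicate, and compare the bare column against it.

    ```sql
    -- Non-sargable: the function wraps the indexed column, so it must be
    -- evaluated for every row and an index on tran_dt cannot be seeked:
    -- WHERE DATEADD(DAY, -DAY(tran_dt) + 1, tran_dt) = @month_start

    -- Sargable: the column stands alone on one side of each comparison,
    -- so an index on tran_dt can be used for a range seek.
    DECLARE @month_start datetime = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0);

    SELECT ...
    FROM dbo.SomeTable
    WHERE tran_dt >= @month_start
      AND tran_dt <  DATEADD(MONTH, 1, @month_start);
    ```

    The half-open upper bound (`<` the first day of the next month) also avoids losing rows whose datetime value carries a time component on the last day of the month.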

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Your WHERE clause isn't doing anything at all. It's literally checking to see if the date_last_updated is in the month that it's in.
    Any idea what the calculated range is supposed to be?

    Never mind... I was looking at the last WHERE clause, not the 1st one...

  • Grant Fritchey - Wednesday, August 23, 2017 3:35 PM


    Hi Grant, can you please provide more suggestions from the plan? This is one of the queries generated by the 3rd-party tool Cognos, and we are looking for opportunities to optimize it.

  • Chris Harshman - Wednesday, August 23, 2017 3:20 PM


    Thanks Chris for the help.

    You could do with clustered indexes on those heap tables, something like

    DDA_DESC (INSTITUTION_NUMBER, DDA_ACCOUNT, DESCRIPTION_LINK)

    DDA_TRAN (INSTITUTION_NUMBER, DDA_ACCOUNT)

    Simplification of the query:

    SELECT
        "DDA_TRAN"."INSTITUTION_NUMBER" AS "Institution_Number"
        ,"DDA_TRAN"."DDA_ACCOUNT" AS "DDA_Account"
        ,"DDA_TRAN"."CHECK_NUMBER" AS "Check_Number"
        ,x.DebitOrCredit AS "Debit_or_Credit"
        ,"DDA_TRAN"."DESCRIPTION_CODE" AS "Description_Code"
        ,"DDA_TRAN"."DESCRIPTION_LINK" AS "Description_Link"
        ,"DDA_DESC"."DESCRIPTION1" AS "Description1"
        ,"DDA_DESC"."DESCRIPTION2" AS "Description2"
        ,"DDA_TRAN"."DOCUMENT_IMAGE_FLAG" AS "Document_Image_Flag"
        ,"DDA_TRAN"."EXTERNAL_TRAN_CODE" AS "External_Transaction_Code"
        ,"DDA_TRAN"."INTERNAL_TRAN_CODE" AS "Internal_Transaction_Code"
        ,CONVERT(VARCHAR(10),DDA_TRAN.POSTED_DATE,101)
        ,"DDA_TRAN"."REVERSAL_CODE" AS "Reversal_Code"
        ,SUM("DDA_TRAN"."TRAN_AMOUNT") AS "Transaction_Amount"
        ,SUM(CASE WHEN x.DebitOrCredit = 'Credit' THEN "DDA_TRAN"."TRAN_AMOUNT" ELSE 0 END) AS "Credit_Tran_Amount"
        ,SUM(CASE WHEN x.DebitOrCredit = 'Debit' THEN "DDA_TRAN"."TRAN_AMOUNT" ELSE 0 END) AS "Debit_Tran_Amount"
        ,CONVERT(VARCHAR(10),DDA_TRAN.TRAN_DATE,101) AS "Transaction_Date"
        ,"DDA_TRAN"."TRAN_USED_CODE" AS "Transaction_Used_Code"
        ,"DDA_DESC"."ACH_COMPANY_ID" AS "ACH_Company_Id"
        ,"DDA_DESC"."ACH_STANDARD_ENTRY_CLASS" AS "ACH_Standard_Entry_Class"
        ,"DDA_DESC"."ITEM_ADDRESS" AS "Item_Address"
        ,CONVERT(VARCHAR(10),EOMONTH(DATE_LAST_UPDATED),101) AS "Date_Last_Updated"
    FROM ("DDA_ACCT" "DDA_ACCT"
        LEFT OUTER JOIN "DDA_TRAN" "DDA_TRAN" -- left outer join is converted into an inner join by the WHERE clause
            ON "DDA_ACCT"."INSTITUTION_NUMBER" = "DDA_TRAN"."INSTITUTION_NUMBER"
            AND "DDA_ACCT"."DDA_ACCOUNT" = "DDA_TRAN"."DDA_ACCOUNT"
    )
    LEFT OUTER JOIN "DDA_DESC" "DDA_DESC"
        ON "DDA_TRAN"."INSTITUTION_NUMBER" = "DDA_DESC"."INSTITUTION_NUMBER"
        AND "DDA_TRAN"."DDA_ACCOUNT" = "DDA_DESC"."DDA_ACCOUNT"
        AND "DDA_TRAN"."DESCRIPTION_LINK" = "DDA_DESC"."DESCRIPTION_LINK"
    CROSS APPLY (
        SELECT DebitOrCredit = CASE
            WHEN "DDA_TRAN"."INTERNAL_TRAN_CODE" IN (6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 60, 62, 72, 76, 78, 80, 87, 25, 26, 27, 31, 32, 75)
            THEN 'Credit' ELSE 'Debit' END
    ) x
    WHERE "DDA_TRAN"."TRAN_DATE" BETWEEN
        DATEADD(DAY, -DAY("DDA_ACCT"."DATE_LAST_UPDATED") + 1, "DDA_ACCT"."DATE_LAST_UPDATED")
        AND EOMONTH(DDA_ACCT.DATE_LAST_UPDATED)
    GROUP BY "DDA_TRAN"."INSTITUTION_NUMBER"
        ,"DDA_TRAN"."DDA_ACCOUNT"
        ,"DDA_TRAN"."CHECK_NUMBER"
        ,x.DebitOrCredit
        ,"DDA_TRAN"."DESCRIPTION_CODE"
        ,"DDA_TRAN"."DESCRIPTION_LINK"
        ,"DDA_DESC"."DESCRIPTION1"
        ,"DDA_DESC"."DESCRIPTION2"
        ,"DDA_TRAN"."DOCUMENT_IMAGE_FLAG"
        ,"DDA_TRAN"."EXTERNAL_TRAN_CODE"
        ,"DDA_TRAN"."INTERNAL_TRAN_CODE"
        ,CONVERT(VARCHAR(10),DDA_TRAN.POSTED_DATE,101)
        ,"DDA_TRAN"."REVERSAL_CODE"
        ,CONVERT(VARCHAR(10),DDA_TRAN.TRAN_DATE,101)
        ,"DDA_TRAN"."TRAN_USED_CODE"
        ,"DDA_DESC"."ACH_COMPANY_ID"
        ,"DDA_DESC"."ACH_STANDARD_ENTRY_CLASS"
        ,"DDA_DESC"."ITEM_ADDRESS"
        ,CONVERT(VARCHAR(10),EOMONTH(DATE_LAST_UPDATED),101)
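    One caveat on the rewrite above (an editor's note, not from the original post): with a datetime column, EOMONTH returns the last day of the month at midnight, so the BETWEEN predicate would exclude rows whose TRAN_DATE carries a time component on that last day. If TRAN_DATE can hold times, a half-open range is safer:

    ```sql
    WHERE "DDA_TRAN"."TRAN_DATE" >= DATEADD(MONTH, DATEDIFF(MONTH, 0, "DDA_ACCT"."DATE_LAST_UPDATED"), 0)
      AND "DDA_TRAN"."TRAN_DATE" <  DATEADD(MONTH, DATEDIFF(MONTH, 0, "DDA_ACCT"."DATE_LAST_UPDATED") + 1, 0)
    ```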

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw


  • vsamantha35 - Wednesday, August 23, 2017 9:23 PM


    The issue is that calculations of any kind performed on columns mean that, in order to satisfy the query, the calculation must be performed on every single row in the table to determine whether that row matches the calculated value. So if you have 10 rows, you have to run the calculation 10 times. If you have 10 million rows, you have to run a whole bunch of calculations. That's going to be very slow, and there is absolutely no changing it unless you change how things are being done.

    First option: look at what's going on in Cognos. Why is it making that calculation? Could you do something else in Cognos so that the calculation doesn't occur? I'm not a Cognos expert, so you'll have to go elsewhere for definitive answers there.

    Second option: instead of having Cognos run calculations against the datetime column to get the two values of last_update_date and one month from last_update_date (is that right? whatever it is), create two additional columns on your table that perform these calculations on the data: computed columns. Because you're storing the calculations within the table and you can create indexes on computed columns, your query can take advantage of an index, which should make it run faster.
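    As a rough sketch of the computed-column idea (the column and index names below are hypothetical; DDA_ACCT is assumed from the query above): the month-boundary expressions are deterministic, so they can be persisted and indexed.

    ```sql
    -- Add persisted computed columns for the month start and the first
    -- day of the following month, derived from DATE_LAST_UPDATED.
    ALTER TABLE DDA_ACCT ADD
        month_start AS DATEADD(MONTH, DATEDIFF(MONTH, 0, DATE_LAST_UPDATED), 0) PERSISTED,
        month_end   AS DATEADD(MONTH, DATEDIFF(MONTH, 0, DATE_LAST_UPDATED) + 1, 0) PERSISTED;

    CREATE INDEX IX_DDA_ACCT_month ON DDA_ACCT (month_start, month_end);
    ```

    The WHERE clause can then reference month_start and month_end directly instead of re-deriving them per row.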


  • Grant Fritchey - Thursday, August 24, 2017 6:07 AM


    Another option could be to create some kind of calendar table that has 1 row for each day and some kind of YearMonth integer column.  You could then join to this calendar table and compare the 2 YearMonths to see if they match.
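    A minimal sketch of that calendar-table approach (all names here are hypothetical):

    ```sql
    -- One row per day, with an integer month key such as 201708.
    CREATE TABLE dbo.Calendar
    (
        cal_date   date NOT NULL PRIMARY KEY,
        year_month int  NOT NULL  -- e.g. 201708 for August 2017
    );

    -- Join each date to the calendar and compare the month keys:
    SELECT ...
    FROM DDA_TRAN t
    JOIN DDA_ACCT a
        ON a.INSTITUTION_NUMBER = t.INSTITUTION_NUMBER
       AND a.DDA_ACCOUNT = t.DDA_ACCOUNT
    JOIN dbo.Calendar ct ON ct.cal_date = CAST(t.TRAN_DATE AS date)
    JOIN dbo.Calendar ca ON ca.cal_date = CAST(a.DATE_LAST_UPDATED AS date)
    WHERE ct.year_month = ca.year_month;
    ```

    The calendar table would need to be populated for the full date range of the data; the datetime-to-date cast in the join is one of the conversions SQL Server can still use an index seek for.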

  • Chris Harshman - Thursday, August 24, 2017 7:39 AM


    Absolutely. Great idea. I'm aware of this approach, but I keep forgetting about it.


    Hi Chris,
    I tried to create those covering indexes but I didn't see much improvement in query performance; it was a 5-10 second difference. Sometimes the query without the indexes performed well, so I decided not to create those two indexes. Thanks for the query re-write. I'll try to see if it works well and returns the same result in terms of row count and data. Thank you for taking your valuable time out to help people like me who want to learn some tips on query tuning and the approach towards it. Thanks a lot.

  • Grant Fritchey - Thursday, August 24, 2017 6:07 AM


    We don't have any idea who owns the Cognos piece. As a vendor we do managed services, so they give us a bunch of queries to tune on a daily or weekly basis. We don't have proper documentation for these reports, nor do the full-time employees have full info on them. The project has been taken over from one client to another, and we work as the vendor team for the new client.
