January 29, 2021 at 4:00 pm
Hi all,
I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it.
Do you have any idea?
Thanks.
DECLARE @time datetime,
        @rowcount int = 2400
select @time = dateadd(-HH, 24, GETUTCDATE())
WHILE (@rowcount = 2400)
BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
     FROM dbo.Table1
     WHERE MyDate < @time)
    DELETE FROM CTE_1
    SELECT @rowcount = @@rowcount
    WAITFOR DELAY '00:00:01';
END
January 29, 2021 at 4:49 pm
I would run the query with SET STATISTICS IO ON and capture the actual execution plan. Chances are the execution plan will give you a missing-index recommendation to create an index on the MyDate column. I would also look at the Messages tab for how many logical reads (pages) you are getting from Table1. The key to speed is to reduce logical reads.
Another tip I learned this week: grab the sp_BlitzFirst script from http://www.firstresponderkit.org (or brentozar.com) and run:
EXEC sp_BlitzFirst @Seconds = 60, @ExpertMode = 1;
The above command captures performance data, waits 60 seconds, and captures performance data again. That let me run another query during that window; in my case it showed an ASYNC_NETWORK_IO wait, which was network related, so my 5-second query spent 4 seconds feeding the result sets from the SQL Server VM through the cocktail straw of my workstation.
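For example, measuring the logical reads behind the delete's predicate might look like this (a hypothetical session, reusing the table and column names from the original post):

```sql
-- Turn on per-statement I/O and timing stats for this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the DELETE's WHERE clause; run with "Include Actual
-- Execution Plan" enabled to see any missing-index recommendation.
SELECT COUNT(*)
FROM dbo.Table1
WHERE MyDate < DATEADD(HOUR, -24, GETUTCDATE());

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab will then report "logical reads" per table, which is the number to drive down.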
January 29, 2021 at 5:06 pm
The problem is the loop and the fact that this runs 2400 times per day. It appears the goal is to delete any rows older than 24 hours (your DATEADD is incorrect here - the datepart and number arguments are swapped - which I am assuming is just a bad copy/paste).
I would not set this up to delete based on a specific hour - I would round to the previous day instead.
Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @time datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0);

While @rowsAffected > 0
Begin
    Delete Top (@batchSize)
      From dbo.Table1
     Where MyDate < @time;

    Set @rowsAffected = @@rowcount;
    Waitfor Delay '00:00:01';
End;
This will affect any rows prior to yesterday at midnight (ex: '2021-01-28 00:00:00.000') UTC.
To optimize this, you need to have an index on MyDate - but it may not be used, depending on how much data exists in the table vs how many rows meet the requirement. Another option would be a modification by adding a loop around the above based on the day and deleting in batches for each day until you get to the last day to be deleted.
Something like this...definitely test and validate prior to using:
Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @latestDate datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0)
      , @oldestDate datetime = coalesce((Select min(MyDate) From dbo.Table1), '2020-01-01');

Declare @deleteDate datetime = dateadd(day, datediff(day, 0, @oldestDate) + 1, 0);

While @deleteDate <= @latestDate   -- <= so the final day's rows are also removed
Begin
    Set @rowsAffected = 1;         -- reset so the inner loop runs for each day

    While @rowsAffected > 0
    Begin
        Delete Top (@batchSize)
          From dbo.Table1
         Where MyDate < @deleteDate;

        Set @rowsAffected = @@rowcount;
        Waitfor Delay '00:00:01';
    End;

    Set @deleteDate = dateadd(day, 1, @deleteDate);
End;
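A supporting index on MyDate, as mentioned above, might be sketched like this (the index name is illustrative; test whether the optimizer actually uses it for your data distribution before relying on it):

```sql
-- Hypothetical supporting index for the MyDate range predicate.
CREATE NONCLUSTERED INDEX IX_Table1_MyDate
    ON dbo.Table1 (MyDate);
```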
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
January 30, 2021 at 3:40 am
Be sure to cluster the table on ( MyDate, $IDENTITY ). The $IDENTITY just ensures that the key is unique. That will make the deletes very efficient.
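A sketch of that change, assuming an identity column named Id and that the existing clustered index has already been dropped (verify primary-key and foreign-key constraints before attempting this on a real table):

```sql
-- Illustrative only: cluster on (MyDate, Id) so deletes by MyDate
-- remove a contiguous range at the "front" of the table.
CREATE UNIQUE CLUSTERED INDEX CIX_Table1_MyDate_Id
    ON dbo.Table1 (MyDate, Id);
```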
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
January 30, 2021 at 4:25 pm
Ah... be careful with the Clustered Index recommendation. While such a recommendation can help a lot of things, especially this particular problem, changing the Clustered Index without understanding the impact on other queries could lead to devastation in many other areas.
And, no... I'm not against this change. In fact, it's usually a great recommendation. You just can't assume its usefulness or its impact based on a single query/delete. "Must look eye".
--Jeff Moden
January 30, 2021 at 4:34 pm
Hi all, I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it. Do you have any idea? Thanks.
DECLARE @time datetime, @rowcount int = 2400
select @time = dateadd(-HH, 24, GETUTCDATE())
WHILE (@rowcount = 2400)
BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
     FROM dbo.Table1
     WHERE MyDate < @time)
    DELETE FROM CTE_1
    SELECT @rowcount = @@rowcount
    WAITFOR DELAY '00:00:01';
END
I don't want to assume because, a lot of times, code doesn't actually match the requirement to begin with. What is the actual requirement here and what is causing this code to execute? I ask the latter because code that runs 2400 times each day averages out to 1 run every 36 seconds, which is a bit of an oddity to begin with.
Also, how many rows are generally accumulated in this table per hour? It's important to know that because it may actually be a bit silly to do a loop to delete by such small batch sizes.
To summarize my questions...
--Jeff Moden
February 8, 2021 at 4:17 pm
Regarding the clustered index recommendation:
You do not necessarily have to cluster your table by MyDate. When MyDate is ascending too (e.g. a creation timestamp), you could simply select the MAX(id) WHERE MyDate < @time and then DELETE WHERE id <= @max_id.
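A sketch of that approach, assuming an ascending identity column named id that is the (or leads the) clustered key:

```sql
-- Hypothetical: translate the date cutoff into an id cutoff, then
-- delete by the clustered key instead of scanning on MyDate.
DECLARE @time datetime = DATEADD(HOUR, -24, GETUTCDATE()),
        @max_id int;

SELECT @max_id = MAX(id)
FROM dbo.Table1
WHERE MyDate < @time;

IF @max_id IS NOT NULL
    DELETE FROM dbo.Table1
    WHERE id <= @max_id;   -- a range seek on the clustered key
```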
Or you could first select the clustered index columns into a #temp table WHERE MyDate < @time, and then run a DELETE FROM t FROM table_1 AS t INNER JOIN (SELECT TOP 2400 id FROM #temp ORDER BY id) AS tmp ON tmp.id = t.id, followed by a DELETE TOP (2400) FROM #temp. Ideally #temp has a clustered index over id in this case.
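That variant might be sketched as follows (column names are illustrative; the staging-table delete joins on the same TOP set so both deletes remove matching rows, rather than relying on an unordered DELETE TOP):

```sql
-- Hypothetical staging approach: snapshot qualifying keys once,
-- then delete from the base table in ordered batches.
DECLARE @time datetime = DATEADD(HOUR, -24, GETUTCDATE());

SELECT id
INTO #temp
FROM dbo.Table1
WHERE MyDate < @time;

CREATE CLUSTERED INDEX CIX_temp_id ON #temp (id);

WHILE EXISTS (SELECT 1 FROM #temp)
BEGIN
    DELETE t
    FROM dbo.Table1 AS t
    INNER JOIN (SELECT TOP (2400) id FROM #temp ORDER BY id) AS tmp
        ON tmp.id = t.id;

    -- Remove the same batch of keys from the staging table.
    DELETE tt
    FROM #temp AS tt
    INNER JOIN (SELECT TOP (2400) id FROM #temp ORDER BY id) AS tmp
        ON tmp.id = tt.id;

    WAITFOR DELAY '00:00:01';
END;

DROP TABLE #temp;
```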
But as always: the best solution depends on your data - what you want to delete, how many records are deleted, what your indexes look like, how many rows the table has in total...
Edit: of course you could also implement table partitioning (on the MyDate column, on a daily basis). In this case you would need to run your job only once per day and remove the previous day's data by simply running a
TRUNCATE TABLE xxx WITH (PARTITIONS ($partition.pf_my_time(DATEADD(DAY, -1, GETUTCDATE()))));
which would take only a few milliseconds to execute, regardless of how much data / how many rows you are deleting.
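The partitioning setup itself could look something like this (the function/scheme names match the $partition call above; the boundary values and filegroup are assumptions, and a daily maintenance job would need to SPLIT in new boundaries and MERGE old ones):

```sql
-- Illustrative daily partition function and scheme.
CREATE PARTITION FUNCTION pf_my_time (datetime)
    AS RANGE RIGHT FOR VALUES ('2021-01-28T00:00:00',
                               '2021-01-29T00:00:00',
                               '2021-01-30T00:00:00');

CREATE PARTITION SCHEME ps_my_time
    AS PARTITION pf_my_time ALL TO ([PRIMARY]);

-- dbo.Table1 would then be created (or rebuilt) ON ps_my_time (MyDate).
```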
God is real, unless declared integer.
February 8, 2021 at 4:21 pm
Apparently, the OP has left the building.
--Jeff Moden
February 8, 2021 at 5:20 pm
They have a query that runs 2400+ times a day using that column to DELETE from the table -- I stick with my recommendation: cluster the table first by that column, especially since it's a naturally ascending value as well.
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
February 9, 2021 at 8:41 am
@scott: Whether they should cluster for it or not depends on how else this table is used. Yes, 2,400 deletes per day may be a lot, but if they are running a million SELECTs and/or UPDATEs per day that use the current clustered index and return many columns, it would be stupid to cluster by MyDate just to speed up the DELETEs.
God is real, unless declared integer.
February 9, 2021 at 3:46 pm
Well, obviously, I don't consider it "stupid". Remember, the DELETEs run almost twice a minute. If the table has to be scanned in order to do the deletes, that would cause massive locking of the table, causing all kinds of deadlocks. Or, to prevent the deadlocks, forcing (almost) all reads to be done WITH (NOLOCK), which has potential issues as well. The only way to be really sure that you don't get a table scan for the deletes is to cluster the table to match the deletes.
Reads, however, can be done with nonclustered indexes, covering ones if necessary.
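For instance, a hypothetical covering nonclustered index for a common read pattern (SomeKey and Payload are placeholder column names, not from the original post):

```sql
-- Illustrative covering index: reads seek here and never touch
-- the clustered index, so they don't fight the deletes.
CREATE NONCLUSTERED INDEX IX_Table1_SomeKey
    ON dbo.Table1 (SomeKey)
    INCLUDE (MyDate, Payload);
```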
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
February 9, 2021 at 4:12 pm
Like I said above, changing the clustered index to support these DELETEs would certainly help these deletes A LOT. But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.
Unfortunately, it looks like the OP has abandoned this thread and so we simply don't know what "the rest of the story" actually is.
--Jeff Moden
February 9, 2021 at 4:19 pm
But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.
If you build nonclustered indexes as necessary, how so? What typical processing couldn't you handle that way?
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!