January 29, 2021 at 4:00 pm
Hi all,
I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it.
Do you have any idea?
Thanks.
DECLARE @time datetime,
        @rowcount int = 2400
select @time = dateadd(-HH, 24, GETUTCDATE())
WHILE (@rowcount = 2400)
BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
     FROM dbo.Table1
     WHERE MyDate < @time)
    DELETE FROM CTE_1
    SELECT @rowcount = @@rowcount
    WAITFOR DELAY '00:00:01';
END
January 29, 2021 at 4:49 pm
I would run the query with SET STATISTICS IO ON and capture the actual execution plan. Chances are the execution plan will give you a missing-index recommendation to create an index on the MyDate column. I would also look at the Messages tab for how many logical reads (pages) you are getting from Table1. The key to speed is to reduce logical reads.
Another tip I learned this week: grab the sp_BlitzFirst script from http://www.firstresponderkit.org (or brentozar.com) and run:
EXEC sp_BlitzFirst @Seconds = 60, @ExpertMode = 1;
The above command captures performance data, waits 60 seconds, and captures performance data again. That let me run another query during that window; in my case it showed an ASYNC_NETWORK_IO wait, which was network related, so my 5-second query spent 4 seconds feeding the result sets from the SQL Server VM through the cocktail straw of my workstation.
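For example, measuring the logical reads behind the delete's predicate might look like this (a hypothetical session, reusing the table and column names from the original post):

```sql
-- Turn on per-statement I/O and timing stats for this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the DELETE's WHERE clause; run with "Include Actual
-- Execution Plan" enabled to see any missing-index recommendation.
SELECT COUNT(*)
FROM dbo.Table1
WHERE MyDate < DATEADD(HOUR, -24, GETUTCDATE());

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab will then report "logical reads" per table, which is the number to drive down.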
January 29, 2021 at 5:06 pm
The problem is the loop and the fact that this runs 2400 times per day. It appears the goal is to delete any rows older than 24 hours (your DATEADD is incorrect here - the datepart and number arguments are swapped - which I am assuming is just a bad copy/paste).
I would not set this up to delete based on a specific hour - I would round to the previous day instead.
Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @time datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0);

While @rowsAffected > 0
Begin
    Delete Top (@batchSize)
      From dbo.Table1
     Where MyDate < @time;

    Set @rowsAffected = @@rowcount;
    Waitfor Delay '00:00:01';
End;
This will affect any rows prior to yesterday at midnight (ex: '2021-01-28 00:00:00.000') UTC.
To optimize this, you need to have an index on MyDate - but it may not be used, depending on how much data exists in the table vs how many rows meet the requirement. Another option would be a modification by adding a loop around the above based on the day and deleting in batches for each day until you get to the last day to be deleted.
Something like this...definitely test and validate prior to using:
Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @latestDate datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0)
      , @oldestDate datetime = coalesce((Select min(MyDate) From dbo.Table1), '2020-01-01');

Declare @deleteDate datetime = dateadd(day, datediff(day, 0, @oldestDate) + 1, 0);

While @deleteDate <= @latestDate   -- <= so the final day's rows are also removed
Begin
    Set @rowsAffected = 1;         -- reset so the inner loop runs for each day

    While @rowsAffected > 0
    Begin
        Delete Top (@batchSize)
          From dbo.Table1
         Where MyDate < @deleteDate;

        Set @rowsAffected = @@rowcount;
        Waitfor Delay '00:00:01';
    End;

    Set @deleteDate = dateadd(day, 1, @deleteDate);
End;
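A supporting index on MyDate, as mentioned above, might be sketched like this (the index name is illustrative; test whether the optimizer actually uses it for your data distribution before relying on it):

```sql
-- Hypothetical supporting index for the MyDate range predicate.
CREATE NONCLUSTERED INDEX IX_Table1_MyDate
    ON dbo.Table1 (MyDate);
```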
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
January 30, 2021 at 3:40 am
Be sure to cluster the table on ( MyDate, $IDENTITY ). The $IDENTITY just ensures that the key is unique. That will make the deletes very efficient.
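A sketch of that change, assuming an identity column named Id and that the existing clustered index has already been dropped (verify primary-key and foreign-key constraints before attempting this on a real table):

```sql
-- Illustrative only: cluster on (MyDate, Id) so deletes by MyDate
-- remove a contiguous range at the "front" of the table.
CREATE UNIQUE CLUSTERED INDEX CIX_Table1_MyDate_Id
    ON dbo.Table1 (MyDate, Id);
```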
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
January 30, 2021 at 4:25 pm
Ah... be careful with the Clustered Index recommendation. While such a recommendation can help a lot of things, especially this particular problem, changing the Clustered Index without understanding the impact on other queries could lead to devastation in many other areas.
And, no... I'm not against this change. In fact, it's usually a great recommendation. You just can't assume its usefulness or its impact based on a single query/delete. "Must look eye".
--Jeff Moden
January 30, 2021 at 4:34 pm
Hi all, I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it. Do you have any idea? Thanks.
DECLARE @time datetime, @rowcount int = 2400
select @time = dateadd(-HH, 24, GETUTCDATE())
WHILE (@rowcount = 2400)
BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
     FROM dbo.Table1
     WHERE MyDate < @time)
    DELETE FROM CTE_1
    SELECT @rowcount = @@rowcount
    WAITFOR DELAY '00:00:01';
END
I don't want to assume because, a lot of times, code doesn't actually match the requirement to begin with. What is the actual requirement here and what is causing this code to execute? I ask the latter because code that runs 2400 times each day averages out to 1 run every 36 seconds, which is a bit of an oddity to begin with.
Also, how many rows are generally accumulated in this table per hour? It's important to know that because it may actually be a bit silly to do a loop to delete by such small batch sizes.
To summarize my questions...
--Jeff Moden
February 8, 2021 at 4:17 pm
Regarding the clustered index recommendation:
You do not necessarily have to cluster your table by MyDate. When MyDate is ascending too (e.g. a creation timestamp), you could simply select the MAX(id) WHERE MyDate < @time and then DELETE WHERE id <= @max_id.
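A sketch of that approach, assuming an ascending identity column named id that is the (or leads the) clustered key:

```sql
-- Hypothetical: translate the date cutoff into an id cutoff, then
-- delete by the clustered key instead of scanning on MyDate.
DECLARE @time datetime = DATEADD(HOUR, -24, GETUTCDATE()),
        @max_id int;

SELECT @max_id = MAX(id)
FROM dbo.Table1
WHERE MyDate < @time;

IF @max_id IS NOT NULL
    DELETE FROM dbo.Table1
    WHERE id <= @max_id;   -- a range seek on the clustered key
```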
Or you could first select the clustered index columns into a #temp table WHERE MyDate < @time, and then run a DELETE FROM t FROM table_1 AS t INNER JOIN (SELECT TOP 2400 id FROM #temp ORDER BY id) AS tmp ON tmp.id = t.id, followed by a DELETE TOP (2400) FROM #temp. Ideally #temp has a clustered index over id in this case.
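That variant might be sketched as follows (column names are illustrative; the staging-table delete joins on the same TOP set so both deletes remove matching rows, rather than relying on an unordered DELETE TOP):

```sql
-- Hypothetical staging approach: snapshot qualifying keys once,
-- then delete from the base table in ordered batches.
DECLARE @time datetime = DATEADD(HOUR, -24, GETUTCDATE());

SELECT id
INTO #temp
FROM dbo.Table1
WHERE MyDate < @time;

CREATE CLUSTERED INDEX CIX_temp_id ON #temp (id);

WHILE EXISTS (SELECT 1 FROM #temp)
BEGIN
    DELETE t
    FROM dbo.Table1 AS t
    INNER JOIN (SELECT TOP (2400) id FROM #temp ORDER BY id) AS tmp
        ON tmp.id = t.id;

    -- Remove the same batch of keys from the staging table.
    DELETE tt
    FROM #temp AS tt
    INNER JOIN (SELECT TOP (2400) id FROM #temp ORDER BY id) AS tmp
        ON tmp.id = tt.id;

    WAITFOR DELAY '00:00:01';
END;

DROP TABLE #temp;
```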
But as always: the best solution depends on your data - what you want to delete, how many records are deleted, what your indexes look like, how many rows the table has in total...
Edit: of course you could also implement table partitioning (on the MyDate column, on a daily basis). In this case you would need to run your job only once per day and remove the previous day's data by simply running a
TRUNCATE TABLE xxx WITH (PARTITIONS ($partition.pf_my_time(DATEADD(DAY, -1, GETUTCDATE()))));
which would take only a few milliseconds to execute, regardless of how much data / how many rows you are deleting.
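The partitioning setup itself could look something like this (the function/scheme names match the $partition call above; the boundary values and filegroup are assumptions, and a daily maintenance job would need to SPLIT in new boundaries and MERGE old ones):

```sql
-- Illustrative daily partition function and scheme.
CREATE PARTITION FUNCTION pf_my_time (datetime)
    AS RANGE RIGHT FOR VALUES ('2021-01-28T00:00:00',
                               '2021-01-29T00:00:00',
                               '2021-01-30T00:00:00');

CREATE PARTITION SCHEME ps_my_time
    AS PARTITION pf_my_time ALL TO ([PRIMARY]);

-- dbo.Table1 would then be created (or rebuilt) ON ps_my_time (MyDate).
```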
God is real, unless declared integer.
February 8, 2021 at 4:21 pm
Apparently, the OP has left the building.
--Jeff Moden
February 8, 2021 at 5:20 pm
They have a query that runs 2400+ times a day using that column to DELETE from the table -- I stick with my recommendation: cluster the table first by that column, especially since it's a naturally ascending value as well.
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
February 9, 2021 at 8:41 am
@scott: Whether they should cluster for it or not depends on how else this table is used. Yes, 2,400 deletes per day may be a lot, but if they are running a million SELECTs and/or UPDATEs per day that use the current clustered index and return many columns, it would be stupid to cluster by MyDate just to speed up the DELETEs.
God is real, unless declared integer.
February 9, 2021 at 3:46 pm
Well, obviously, I don't consider it "stupid". Remember, the DELETEs run almost twice a minute. If the table has to be scanned in order to do the deletes, that would cause massive locking of the table, causing all kinds of deadlocks. Or, to prevent the deadlocks, forcing (almost) all reads to be done WITH (NOLOCK), which has potential issues as well. The only way to be really sure that you don't get a table scan for the deletes is to cluster the table to match the deletes.
Reads, however, can be done with nonclustered indexes, covering ones if necessary.
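For instance, a hypothetical covering nonclustered index for a common read pattern (SomeKey and Payload are placeholder column names, not from the original post):

```sql
-- Illustrative covering index: reads seek here and never touch
-- the clustered index, so they don't fight the deletes.
CREATE NONCLUSTERED INDEX IX_Table1_SomeKey
    ON dbo.Table1 (SomeKey)
    INCLUDE (MyDate, Payload);
```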
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
February 9, 2021 at 4:12 pm
Like I said above, changing the clustered index to support these DELETEs would certainly help these deletes A LOT. But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.
Unfortunately, it looks like the OP has abandoned this thread and so we simply don't know what "the rest of the story" actually is.
--Jeff Moden
February 9, 2021 at 4:19 pm
But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.
If you build nonclustered indexes as necessary, how so? What typical processing couldn't you handle that way?
SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!