We have a function that calculates student truancy records by counting their absence in a moving 30 days window. It uses a recursive CTE to count any 30 days passed.
Then we use other table to join this table valued function, and it was extremely slow as records increased.
I tuned it to select the TVF into temp table, then other table can join the temp table, it increased some performance.
But still this is the most cost query in my program, any other things I can do to tune the performance?
CREATE FUNCTION [dbo].[MaxTruantCount_Any30Day] ()
RETURNS TABLE
AS
RETURN
WITH FirstLastAbsentDates AS
(
SELECT a.StudentId
,MIN(a.Att_Date) AS FirstDay
,MAX(a.Att_Date) AS LastDay
FROM TruantDay a
JOIN SchoolYear b
ON a.SchoolYearId = b.Id
AND GETDATE() BETWEEN b.StartDate AND b.EndDate
WHERE a.IsAbsent = 1
AND a.IsCleared = 0
GROUP BY a.StudentId
HAVING MAX(a.Att_Date) > MIN(a.Att_Date)
)
, DateRangeEndDays (StudentId, EndDay) AS
(
SELECT a.StudentId
,a.LastDay
FROM FirstLastAbsentDates a
UNION ALL
SELECT a.StudentId
,DATEADD(d, -1, c.EndDay) AS EndDay
FROM FirstLastAbsentDates a
JOIN DateRangeEndDays c
ON c.StudentId = a.StudentId
WHERE DATEADD(d, -1, c.EndDay) > a.FirstDay
)
, AbsenceCounts AS
(
SELECT a.StudentId
, a.EndDay
, COUNT(*) AS AbsenceCount
FROM DateRangeEndDays a
JOIN TruantDay b
ON b.StudentId = a.StudentId
WHERE b.IsAbsent = 1
AND b.IsCleared = 0
AND b.Att_Date BETWEEN DATEADD(d, -30, a.EndDay)
AND a.EndDay
GROUP BY a.StudentId, a.EndDay
)
, MaxAbsenceCounts AS
(
SELECT a.StudentId
, MAX(a.AbsenceCount) AS MaxAbsenceCount
FROM AbsenceCounts a
GROUP BY a.StudentID
)
, MaxAbsenceDate AS
(
SELECT a.StudentId
, MIN(b.EndDay) AS MaxAbsenceDate
, a.MaxAbsenceCount
FROM MaxAbsenceCounts a
JOIN AbsenceCounts b
ON a.StudentId = b.StudentId
AND a.MaxAbsenceCount = b.AbsenceCount
GROUP BY a.StudentId, a.MaxAbsenceCount
)
SELECT a.StudentId AS StudentId
,a.MaxAbsenceCount AS MaxAbsenceCount
,a.MaxAbsenceDate AS MaxAbsenceDate
FROM MaxAbsenceDate a
--order by a.StudentId
;
March 17, 2021 at 1:32 am
That function can be rewritten to perform a lot better - but to do that we need sample data and tables to test, with expected results based on the sample data.
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
March 17, 2021 at 3:55 am
I attache the schema of sample data scrambled for testing
March 17, 2021 at 11:13 am
I would replace the recursive CTE 'DateRangeEndDays' with a look up against a calendar table
DateRangeEndDays (StudentId, EndDay) AS
SELECT a.StudentId,c.CalendarDate
FROM FirstLastAbsentDates a
INNER JOIN MyCalendar c ON c.CalendarDate BETWEEN a.FirstDay AND a.LastDay
)
See here for details about a calendar table
https://qa.sqlservercentral.com/steps/bones-of-sql-the-calendar-table
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 17, 2021 at 2:21 pm
It seems the query is performing a partial cross join. The cross join could be replaced by a tally function and window (rows between 30 preceding and current row) function. As far as I can tell this query produces the same output
with
FirstLastAbsentDates(StudentId, EndDate, FirstDay, LastDay) AS (
select a.StudentId, b.EndDate
,min(a.Att_Date)
,max(a.Att_Date)
from dbo.testTruantDay a
join dbo.testSchoolYear b ON a.SchoolYearId = b.Id
and getdate() between b.StartDate AND b.EndDate
where a.IsAbsent = 1
and a.IsCleared = 0
group by a.StudentId, b.EndDate
having max(a.Att_Date) > min(a.Att_Date))
select fl.StudentId, oa.cum_cnt as MaxAbsenceCount,
isnull(oa.Att_Date, dateadd(day, 1, fl.FirstDay)) as MaxAbsenceDate
from FirstLastAbsentDates fl
outer apply (select top(1) td.Att_Date,
sum(case when IsAbsent is null then 0 else 1 end)
over (order by dt rows between 30 preceding and current row) cum_cnt
from dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)
left join dbo.testTruantDay td on fl.StudentId=td.StudentId
and v.dt=td.Att_Date
and td.IsAbsent=1
and td.IsCleared=0
order by cum_cnt desc) oa
order by fl.StudentId;
StudentId MaxAbsenceCount MaxAbsenceDate
7000499 1 2021-01-27 00:00:00.000
7000524 7 2021-01-29 00:00:00.000
7000540 2 2020-12-18 00:00:00.000
7000559 21 2020-12-16 00:00:00.000
7000907 22 2020-11-06 00:00:00.000
7002389 11 2020-11-06 00:00:00.000
7003845 9 2020-12-18 00:00:00.000
7004443 3 2020-11-16 00:00:00.000
...
(246 rows returned)
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 4:04 pm
I attache the schema of sample data scrambled for testing
What is the expected output from this data?
Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.
How to post questions to get better answers faster
Managing Transaction Logs
March 17, 2021 at 4:19 pm
Thanks, what is the benefit to use a calendar table, all we need is to know the first day and end day, If we create a calendar table, that has each day in the table, every year we need then to recreate the calendar table.
March 17, 2021 at 4:21 pm
I will give it a try, so the fnTally is the one I can copy here? thanks
https://qa.sqlservercentral.com/scripts/create-a-tally-function-fntally
March 17, 2021 at 4:42 pm
I will give it a try, so the fnTally is the one I can copy here? thanks
https://qa.sqlservercentral.com/scripts/create-a-tally-function-fntally%5B/quote%5D
That's the one I use. There are others out there and if you're interested:
Number series generator challenge solutions – Part 3
Number series generator challenge solutions – Part 2
Number series generator challenge solutions – Part 1
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 5:03 pm
Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster?
Wow, that is so much faster. Thank you Steve. Now let me take a look to understand the code, is it the fnTally make it so much faster?
It's true fnTally is blazingly fast compared to other methods of generating series. These 2 lines of code serve the purpose of a calendar table:
dbo.fnTally(0, datediff(day, fl.FirstDay, fl.LastDay)) fn
cross apply (values (dateadd(day, fn.n, fl.FirstDay))) v(dt)
An actual calendar table would be just as fast or perhaps marginally faster. The original query counts 30 day totals by generating the series over and over. That was the slow part. Switching to a moving window is more efficient
sum(case when IsAbsent is null then 0 else 1 end)
over (order by dt rows between 30 preceding and current row) cum_cnt
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
March 17, 2021 at 11:40 pm
Another point:
Original code is in a table-valued function, then I use other table to join it.
For your code above, should I do same thing, put it in a table valued function?
Thanks
March 18, 2021 at 11:16 am
Yes it could be put in the same function
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply