CURSORS Versus SETS

  • OK I am convinced sets are the way to go for all the right reasons speed etc.

    But I have a scenario where I think SETS wont work, or a Cursor is just simpler way to implement than SETS. It basically involves just 2 tables TblClientPOC and TblAppointments and 2 parameters

    @StartDate and @EndDate

    I will TRY to explain. IT sounds simple but isn't!, I dont want to post code at this stage as I am more interested in people opinion on this being viable at all with sets!

    I have a table called TblClientPOC (ClientID int, StartDate date, StartTime time, Duration int, Frequency int, StaffID int) and it allows the user to store a Client's appointments in simple way.i.e. ClientID, StartDate, StartTime, Duration, Frequency and StaffID (ie who's attending to the Client).

    Simple so far, now column Frequency can be one of three different type

    1. Daily

    2. Weekly

    3. Every Two Weeks

    Each Client will have at least two entries and possibly many entries (no max really) in TblClientPOC

    There are say 2500 Clients with about anything for 1 to 12 entries in the TblClientPOC. Taking an average of 3 entries for each client, that will be 7,500 entries in TblClientPOC when fully implemented.

    For each row in TblClientPOC where Frequency = 1 Do the following...

    ...proc1...

    1) CASE WHEN Startdate< =@StartDate, @StartDate ELSE Startdate

    2) CASE WHEN (ISNULL(Enddate,@EndDAte) <@EndDate, EndDate ELSE @EndDate

    3)Fill in all the dates in between using cteAllDates,

    4)Join them with Client ID, ApptDate, ApptTime, Duration and StaffID

    5)INSERT into table TblAppointments (ClientID int, ApptDate date, ApptTime time, Duration int, StaffID int)

    There will be at least 28 days between the StartDate and EndDate for all clients so I will be adding

    2500 Clients * 28 rows * 2 entries for each client = 140,000 rows for each 4 week period for

    For each row in TblClientPOC where Frequency = 2 'weekly appointments' do same as proc1 but will mean there are only 4 dates between the StartDate and EndDate (and there could be 5 of these for each client)

    2000 Clients * 4 Dates * 5 entries for each client = 40,000 rows for each 4 week period

    Then in turn take each row in TblClientPOC where Frequency = 3. 2 weekly appointments will mean 2 dates between the StartDate and EndDate and there could be 2 of these for a lot of client

    1000 Clients * 2 Dates * 2 entries for each client = 4,000 rows for each 4 week period

    Thats 184,000 rows approx * 6 periods = 1,104,000 records in TblAppointments.

    I am looking at cursors at the moment but would love to know if it can be done with sets at all?

    Basic Tables are as follows...

    [Code]

    CREATE TABLE [dbo].[TblClientPOCV2](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [ClientID] [nchar](10) NOT NULL,

    [StartDate] [date] NOT NULL,

    [ApptTime] [time](5) NOT NULL,

    [Duration] [int] NOT NULL,

    [Frequency] [int] NOT NULL,

    [EndDate] [date] NULL,

    [StaffID] [int] NOT NULL,

    CONSTRAINT [PK_TblClientPOC] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[TblAppointments](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [ClientID] [int] NOT NULL,

    [ApptDate] [date] NOT NULL,

    [ApptTime] [time](5) NOT NULL,

    [Duration] [int] NOT NULL,

    [StaffID] [int] NOT NULL,

    CONSTRAINT [PK_TblAppointments] PRIMARY KEY CLUSTERED

    INSERT INTO [TblClientPOC]

    ([ClientID],[StartDate],[ApptTime],[Duration],[Frequency],[EndDate],[StaffID]

    VALUES

    (1111,'2016-01-01','10:00',30,1,'2016-01-29',123),

    (1111,'2016-01-01','15:00',45,1,'2016-01-29',123),

    (1111,'2016-01-02','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-03','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-04','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-05','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-06','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-07','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-08','12:00',45,3,'2016-01-29',123),

    (2222,'2016-01-01','09:00',30,1,'2016-01-29',222),

    (2222,'2016-01-01','17:00',30,1,'2016-01-29',222),

    (2222,'2016-01-02','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-03','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-04','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-05','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-06','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-07','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-08','13:00',30,3,'2016-01-29',222),

    GO

    [/code]

  • This shouldn't be too hard to do without a cursor.

    Before I dive in too deeply, what are the expected results for that sample data and some parameter values you would pass in?

    Cheers!

    EDIT: Noticed some issues with the DDL and sample data scripts (client table in the INSERT is not the same as the client table created, and the appointment table's definition is incomplete), so in case anyone else wants to help out, here's the script with those issues fixed:

    CREATE TABLE [dbo].[TblClientPOCV2](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [ClientID] [nchar](10) NOT NULL,

    [StartDate] [date] NOT NULL,

    [ApptTime] [time](5) NOT NULL,

    [Duration] [int] NOT NULL,

    [Frequency] [int] NOT NULL,

    [EndDate] [date] NULL,

    [StaffID] [int] NOT NULL,

    CONSTRAINT [PK_TblClientPOC] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY];

    CREATE TABLE [dbo].[TblAppointments](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [ClientID] [int] NOT NULL,

    [ApptDate] [date] NOT NULL,

    [ApptTime] [time](5) NOT NULL,

    [Duration] [int] NOT NULL,

    [StaffID] [int] NOT NULL,

    CONSTRAINT [PK_TblAppointments] PRIMARY KEY CLUSTERED ( ID ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY];

    INSERT INTO [TblClientPOCV2]

    ([ClientID],[StartDate],[ApptTime],[Duration],[Frequency],[EndDate],[StaffID])

    VALUES

    (1111,'2016-01-01','10:00',30,1,'2016-01-29',123),

    (1111,'2016-01-01','15:00',45,1,'2016-01-29',123),

    (1111,'2016-01-02','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-03','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-04','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-05','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-06','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-07','12:00',30,2,'2016-01-29',123),

    (1111,'2016-01-08','12:00',45,3,'2016-01-29',123),

    (2222,'2016-01-01','09:00',30,1,'2016-01-29',222),

    (2222,'2016-01-01','17:00',30,1,'2016-01-29',222),

    (2222,'2016-01-02','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-03','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-04','12:00',30,2,'2016-01-29',222),

    (2222,'2016-01-05','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-06','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-07','13:00',30,2,'2016-01-29',222),

    (2222,'2016-01-08','13:00',30,3,'2016-01-29',222);

  • Hi Jacob, thanks for your interest, I did leave out the PK on my initial table definitions to try tokeep it as simple looking as possible.

    One more correction ClientID should be of type int in both tables!

    Here is a sample of what the final output should look like...

    ClientID, ApptDate, ApptTime, Duration, StaffID

    1111,'2016-01-01','10:00',30,123

    1111,'2016-01-02','10:00',30,123

    1111,'2016-01-03','10:00',30,123

    1111,'2016-01-04','10:00',30,123

    1111,'2016-01-05','10:00',30,123

    ....

    ....

    1111,'2016-01-27','10:00',45,123

    1111,'2016-01-28','10:00',45,123

    1111,'2016-01-27','15:00',45,123

    1111,'2016-01-28','15:00',45,123

    ....

    2222,'2016-01-23','15:00',45,222

    2222,'2016-01-24','15:00',45,222

    2222,'2016-01-25','15:00',45,222

    2222,'2016-01-26','15:00',45,222

    2222,'2016-01-27','15:00',45,222

    2222,'2016-01-28','15:00',45,222

  • This is really not so hard, but a single query i can imagine excludes using indices on date columns. Nevertheless it should be faster then cursor. And 1+ million of rows hardly is a problem.

    Using the setup kindly provided by Jacob

    declare @startD date = '2016-01-04';

    declare @endD date = '2016-03-04';

    select [ClientID]

    , dateadd(d

    , rn*case [frequency] when 1 then 1 else 7*([frequency]-1) end

    , effectiveStart)

    , [ApptTime]

    , [Duration]

    , [StaffID]

    --

    , t2.ID, [frequency], effectiveStart, effectiveEnd, rn

    from [TblClientPOCV2] t2

    cross apply (

    select effectiveStart = case when [StartDate] <@startD then @startD else [StartDate] end

    ,effectiveEnd = case when ISNULL(Enddate,@EndD) <@EndD then EndDate ELSE @EndD end ) p

    join (

    select top (1000) rn = row_number() over (order by (select null)) - 1

    from master.sys.all_objects

    ) tally

    on dateadd(d

    , rn*case [frequency] when 1 then 1 else 7*([frequency]-1) end

    , effectiveStart) <= effectiveEnd

    order by [ClientID],t2.ID,rn

  • The requirement's still not so clear.

    Specifically, with a frequency of 2, there should be an appointment every seven days. However, are those appointments occurring every seven days from the StartDate in the table, and then we just pull the resulting dates that happen to lie in the range defined by the parameters?

    The query I was throwing together assumed that was how it worked, which is different than how Serg's query works (it determines appointment dates by adding multiples of 1,7, or 14 days to the effective start date determined by the parameters, instead of the StartDate in the client table).

    I'm guessing it's supposed to work the first way, but that's just that: guessing 🙂

    If you could give us the following, it would help clarify things tremendously:

    Full expected output for the following rows in the client table, along with the values of @StartDate and @EndDate on which those results are based:

    ID ClientID StartDate ApptTime Duration Frequency EndDate StaffID

    ----------- ---------- ---------- ---------------- ----------- ----------- ---------- -----------

    17 2222 2016-01-07 13:00:00.00000 30 2 2016-01-29 222

    18 2222 2016-01-08 13:00:00.00000 30 3 2016-01-29 222

    Also, what is the expected maximum expected date range for this query? I ask because these solutions use a tally table to generate the dates, and a decent chunk of CPU time can be saved if we can keep that number manageable. Serg's uses 1000, which is probably safe, since that's almost 3 years.

    I can have the solution dynamically determine it, but that requires an additional scan of the client table. Whether the CPU time saved by using the smallest tally table necessary offsets the extra scan will depend on the typical date ranges used and the size of the client table.

    Cheers!

  • Hi Jacob, you are correct in your thinking.

    The output from your sample data would be...

    222, '2016-01-07','13:00',30,222

    222, '2016-01-14','13:00',30,222

    222, '2016-01-21','13:00',30,222

    222, '2016-01-29','13:00',30,222

    222, '2016-01-08','13:00',30,222

    222, '2016-01-22','13:00',30,222

    Also max days between dates will be 28.

    Hope this helps

  • Thanks, that helps a bit.

    Let me know if the code below does what you need. For this solution, the TOP (N) in the tally will have to be enough for the number of days between the oldest StartDate in the client table and the @EndDate parameter. If that will never be more than 28, then it could technically be reduced to 28 and save some CPU. For now I'm leaving it at the number of days in a leap year for no particular reason. Conversely, if the gap between the oldest StartDate in the table and the @EndDate parameter could be very large, N will have to be changed to accommodate that.

    As it turns out, it wasn't letting me post the code, so I've attached it as a text file.

    Cheers!

  • Hi Jacob ,

    Tried it out and its nearly there.

    Here the test data I've been using

    INSERT INTO [TblClientPOCV2]

    ([ClientID],[StartDate],[ApptTime],[Duration],[Frequency],[EndDate],[StaffID])

    VALUES

    (2528,'04/01/2016','09:00',20,1,'29/01/2016', 123),

    (2528,'04/01/2016','10:00',30,2,'29/01/2016', 123),

    (1111,'04/01/2016','10:00',35,3,'29/01/2016', 123),

    (2528,'10/01/2106','17:00',15,2,'31/01/2106', 123),

    It correctly processes the first three row, but does not process the fourth row at all!

    Cant figure out why tho!

    There can be multiple frequency entries for each client for the same dates but not at the same times.

  • Typo on the year in the last record. It has a year of 2106 instead of 2016

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply