quantize ?

  • Mark,

    I don't know if I can get it to be a fast as Serqiy says he can, but I did get it down to 8 seconds...

    First, add this index...

     CREATE INDEX IX_JBMTestCitrixLog_ServerName_TimeGenerated

         ON dbo.JBMTestCitrixLog(ServerName,TimeGenerated)

    Then run this modification of the same report I provided before...

     SELECT SessionDate = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN CONVERT(CHAR(11),d.TheDate,100)

                            ELSE CHAR(254)+'Grand Total'

                          END,

            ServerName  = CASE

                            WHEN GROUPING(d.ServerName) =0

                            AND GROUPING(d.TheDate)=0

                            THEN d.ServerName

                            WHEN GROUPING(d.TheDate)=1

                            THEN ''

                            ELSE CHAR(222)+'Daily Total'

                          END,

            '12:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 00 THEN d.TheCount ELSE 0 END),8),

            '01:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 01 THEN d.TheCount ELSE 0 END),8),

            '02:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 02 THEN d.TheCount ELSE 0 END),8),

            '03:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 03 THEN d.TheCount ELSE 0 END),8),

            '04:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 04 THEN d.TheCount ELSE 0 END),8),

            '05:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 05 THEN d.TheCount ELSE 0 END),8),

            '06:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 06 THEN d.TheCount ELSE 0 END),8),

            '07:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 07 THEN d.TheCount ELSE 0 END),8),

            '08:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 08 THEN d.TheCount ELSE 0 END),8),

            '09:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 09 THEN d.TheCount ELSE 0 END),8),

            '10:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 10 THEN d.TheCount ELSE 0 END),8),

            '11:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 11 THEN d.TheCount ELSE 0 END),8),

            '12:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 12 THEN d.TheCount ELSE 0 END),8),

            '01:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 13 THEN d.TheCount ELSE 0 END),8),

            '02:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 14 THEN d.TheCount ELSE 0 END),8),

            '03:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 15 THEN d.TheCount ELSE 0 END),8),

            '04:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 16 THEN d.TheCount ELSE 0 END),8),

            '05:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 17 THEN d.TheCount ELSE 0 END),8),

            '06:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 18 THEN d.TheCount ELSE 0 END),8),

            '07:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 19 THEN d.TheCount ELSE 0 END),8),

            '08:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 20 THEN d.TheCount ELSE 0 END),8),

            '09:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 21 THEN d.TheCount ELSE 0 END),8),

            '10:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 22 THEN d.TheCount ELSE 0 END),8),

            '11:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 23 THEN d.TheCount ELSE 0 END),8),

            Total       = STR(COUNT(*),8),

            [Avg/Hr]    = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN STR(COUNT(*)/24,8)

                            ELSE 'N/A'

                          END

       FROM (--Derived table "d" isolates and preformats the data we want

             SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS TheDate,

                    DATEPART(hh,TimeGenerated) AS HourSlot,

                    ServerName,

                    COUNT(*) AS TheCount

               FROM dbo.JBMTestCitrixLog WITH (NOLOCK)

           GROUP BY DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0),

                    DATEPART(hh,TimeGenerated),

                    ServerName

            ) d

      GROUP BY d.TheDate, d.ServerName WITH ROLLUP

      ORDER BY SessionDate, ServerName

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks for the benchmark, Serqiy... the computed columns help a lot especially when you make them both deterministic so they can be indexed... got it down to 7 seconds on 4 million records... I still wouldn't mind seeing what you can do with this.  Even an old dog like me can learn new tricks

    I added the following computed columns (Serqiy's suggestion) (couldn't use datediff/dateadd because became non-deterministic)

            TheDate AS CONVERT(CHAR(8),TimeGenerated,112),

            HourSlot AS DATEPART(hh,TimeGenerated)

    I dropped the previous index and added this one, instead...

     CREATE INDEX IX_JBMTestCitrixLog1

         ON dbo.JBMTestCitrixLog(ServerName,TheDate,HourSlot)

    ... and the modified the report code just a bit...

     SELECT SessionDate = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN CONVERT(CHAR(11),CAST(d.TheDate AS DATETIME),100)

                            ELSE CHAR(254)+'Grand Total'

                          END,

            ServerName  = CASE

                            WHEN GROUPING(d.ServerName) =0

                            AND GROUPING(d.TheDate)=0

                            THEN d.ServerName

                            WHEN GROUPING(d.TheDate)=1

                            THEN ''

                            ELSE CHAR(222)+'Daily Total'

                          END,

            '12:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 00 THEN d.TheCount ELSE 0 END),8),

            '01:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 01 THEN d.TheCount ELSE 0 END),8),

            '02:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 02 THEN d.TheCount ELSE 0 END),8),

            '03:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 03 THEN d.TheCount ELSE 0 END),8),

            '04:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 04 THEN d.TheCount ELSE 0 END),8),

            '05:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 05 THEN d.TheCount ELSE 0 END),8),

            '06:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 06 THEN d.TheCount ELSE 0 END),8),

            '07:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 07 THEN d.TheCount ELSE 0 END),8),

            '08:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 08 THEN d.TheCount ELSE 0 END),8),

            '09:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 09 THEN d.TheCount ELSE 0 END),8),

            '10:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 10 THEN d.TheCount ELSE 0 END),8),

            '11:00 AM'  = STR(SUM(CASE WHEN d.HourSlot = 11 THEN d.TheCount ELSE 0 END),8),

            '12:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 12 THEN d.TheCount ELSE 0 END),8),

            '01:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 13 THEN d.TheCount ELSE 0 END),8),

            '02:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 14 THEN d.TheCount ELSE 0 END),8),

            '03:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 15 THEN d.TheCount ELSE 0 END),8),

            '04:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 16 THEN d.TheCount ELSE 0 END),8),

            '05:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 17 THEN d.TheCount ELSE 0 END),8),

            '06:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 18 THEN d.TheCount ELSE 0 END),8),

            '07:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 19 THEN d.TheCount ELSE 0 END),8),

            '08:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 20 THEN d.TheCount ELSE 0 END),8),

            '09:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 21 THEN d.TheCount ELSE 0 END),8),

            '10:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 22 THEN d.TheCount ELSE 0 END),8),

            '11:00 PM'  = STR(SUM(CASE WHEN d.HourSlot = 23 THEN d.TheCount ELSE 0 END),8),

            Total       = STR(SUM(d.TheCount),8),

            [Avg/Hr]    = CASE

                            WHEN GROUPING(d.TheDate) = 0

                            THEN STR(SUM(d.TheCount)/24,8)

                            ELSE 'N/A'

                          END

       FROM (--Derived table "d" isolates and preformats the data we want

             SELECT TheDate,

                    HourSlot,

                    ServerName,

                    COUNT(*) AS TheCount

               FROM dbo.JBMTestCitrixLog WITH (NOLOCK)

           GROUP BY TheDate,

                    HourSlot,

                    ServerName

            ) d

      GROUP BY d.TheDate, d.ServerName WITH ROLLUP

      ORDER BY d.TheDate, d.ServerName

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Well, I'd just like to say,even before I get to put these suggestions into action, that you folks are amazing, I have used dialup bulletin boards, which gives you an idea of how long I have been around, but I really cannot recall anyone putting so much much effort and knowledge into helping someone, and that someone is me....even better!

    Profoundly inspirational, and the next time someone asks a question that I can help with, I certainly won't stint with my efforts.

    Jeff / Serqiy /Journeyman

    Thank you

  • Thanks for the feedback, Mark...  this one was a lot of fun!

    Serqiy... I'd still like to know the "magic" behind your 5 second report, if you get a chance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Mark, you're welcome.

    Hope, you don't take all suggestions above as solutions, you take it just like directions.

    Because otherwise you are in trouble.

    Actually, none of them is absolutely right.

    Yes Jeff, you are right, dateadd(dd, datediff(dd, 0, TimeGenerated)) is not deterministic.

    But you were wrong saying "couldn't use datediff/dateadd because became non-deterministic". Check BOL: datediff and dateadd ARE DETERMINISTIC!

    It's implicit conversion of zero to datetime makes it non-deterministic!

    Include explicit CONVERT into the formula and nothing will stop you from creating index on that column!

    This works on my instance of SQL Server:

    CREATE TABLE dbo.[citrixlog] (

    [TimeGenerated] [datetime] NULL ,

    GeneratedOnHour AS (dateadd(hour,datediff(hour,convert(datetime,0,120),[TimeGenerated]),convert(datetime,0,120))) ,

    [SessionID] [int] NULL ,

    [ClientName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

    [AppName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,

    [UserName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL

    )

    create index citrixlog_GeneratedOnHour ON dbo.[citrixlog](GeneratedOnHour)

    What about yours?

    Note, formula is a little bit shorter than before, am I that smart now or I was just that stupid then?

    And about 5 sec. There is no magic.

    Trick is nobody will ask for WHOLE report for all accounts for last 3 years. And if somebody will ask there is web page pagination to my service.

    So, the task is to filter only those records which will affect figures on screen. Typically it's a continuous range of values to be retrieved.

    Problem with non-indexed approach is that you need to create whole report at once, no matter if you suppose to display only small part of it related to the last week.

    Effective WHERE clause is the key. Actually I use many "join" tables containing only IDs. Like:

    AccountID,

    InvoiceID.

    Many tables with small number of columns allow me to create many clustered indexes and optimiser then can choose the most effective way to proceed for any particular type of range requested for any particular report.

    As I use to say, to writr an effective SELECT you need to starts with CREATE/ALTER TABLE.

    _____________
    Code for TallyGenerator

  • quote Actually, none of them is absolutely right.

    If you have one that is, would you mind sharing it for this example?

    quote And about 5 sec. There is no magic.

    Trick is nobody will ask for WHOLE report for all accounts for last 3 years

    Well, ya think?  You didn't say that when you bragged about the 5 second report... you only said that you created an executive summary from 4 million records in 5 seconds... that implied that you were processing and reporting on ALL 4 million records.   That's what my example did   I also regenned the test table to hold 24 million records across 365 days and added a where clause to return a month... does so in, you guessed it, under 5 seconds.  Like I've said... thank you for the tip on calculated columns... it works great. 

    My only concern is when someone has purchased software from some dummy 3rd party that uses bound controls that break when the underlying schema of the table has been added to.  I'll have to do a speed test with a view in that case.  Might even be one of those spots where an indexed view makes sense.

    quote It's implicit conversion of zero to datetime makes it non-deterministic!

    Outstanding!!!  Thank you for the tip!!!  I've always had problems with this and never even considered that   I am glad to see you made that same mistake that I did when you first posted the formula for the calculated column... let's me know that I'm not the only one that makes "mistrakes".

    quote

    Note, formula is a little bit shorter than before, am I that smart now or I was just that stupid then?

    Actually, I split the formula into two calculated columns so I wouldn't have to parse the column for simple sorting purposes at all...   Worked great... again, thanks for the tip.

            TheDate AS CONVERT(CHAR(8),TimeGenerated,112),

            HourSlot AS DATEPART(hh,TimeGenerated)

    quote

    Problem with non-indexed approach is that you need to create whole report at once, no matter if you suppose to display only small part of it related to the last week.

    That's why my latest 2 examples have an index on a new set of calculated columns... that was a good idea you had   So good that I took it one step further and included an index on a calculated column that included the date with no time.  Made things nasty fast and I've got you to thank for that suggestion.

    quote

    Hope, you don't take all suggestions above as solutions, you take it just like directions.

    Because otherwise you are in trouble.

    I really hope you don't think that's true.  If so, please post YOUR solution to the problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply