How to use 'between' in CASE statement?

  • Wow, SO many things going on here...

    And I don't mean to be blunt, but please, PLEASE go read the following articles:
    http://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression
    http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/16/bad-habits-to-kick-mishandling-date-range-queries.aspx
    http://sqlblog.com/blogs/aaron_bertrand/archive/2011/10/19/what-do-between-and-the-devil-have-in-common.aspx

    Let's start with some fundamentals:

    1. CASE is an EXPRESSION, not a STATEMENT.


    An expression evaluates to a single result.  Not a predicate (where-condition).

    2. BETWEEN is evil.


    That may be hyperbole, but it's for a purpose: You need to STOP using it with DATEs/DATETIMEs.  Read the article for full explanations.

    3. Do NOT use integer math with DATEs; use the DATEADD function with fully specified DATEPART names.


    (e.g. DATEADD(MONTH, 1, GETDATE()), not 'mm' or 'dd' or that nonsense).
    This post with your "sample data" is particularly problematic in this regard -- so I hope & pray that you only wrote it that way for quickness, and that such things don't make it anywhere near production code.

    So what do I do?

    Now, believe it or not, I fully understand your requirement -- it's actually very common for reports / reporting-queries.  You have a report that shows "Month-to-Date revenue", for example; but, on the 1st of the month, the business users demand that it show "last month's data", because they know there's no value in the report if it shows only "today" (the 1st of the month), because there's no revenue yet!

    This really isn't that hard, but by trying to use BETWEEN, you're making it more difficult.  That is why Thom's solution is best -- it avoids all the pitfalls discussed above!  But beware: of assumptions, and requirements...


    Assumptions:


    Kingston
    's and Luis's solutions both make the same assumption: that your actual DATETIME column's values have no TIME component!  In other words, if to use your Employees example, 'Bob' was created on 5/31/2017 at 5pm, you would end up not counting him for the month of May!  That's probably bad.  Now, you may have worked around this entirely by defining the column simply as DATE, not DATETIME -- and if you did that on purpose with full understanding of the distinction, awesome, good job, just ignore this paragraph.


    Clarifying requirements:

    The other thing worth noting is that your requirement is not 100% clear.

    • Thom's solution gives you what we call "Month-to-Date", or more accurately "Month-to-Now", meaning, in plain English, "Everything from the beginning of the month up thru this moment" -- i.e. excluding records that may have a future date within the same month.
    • Kingston's & Luis's solutions actually give you "the Whole Month", even stuff "in the future" of the same month, i.e. "Everything from the beginning of the month up to the last day of the month", excluding (whether intentional or not) anything on the last day of the month after "zero hour", i.e. after 12am that day (again, see the assumption above).

    Clarifying intent:

    After re-reading your original post and your sample data, I do see that you've consistently specified DATE, rather than DATETIME.  Fantastic, things get easier and you don't necessarily need to worry so much about the last few issues described above.  However, can you guarantee that this will never change?  Some new Dev/DBA or some new business requirement won't come along and say "Hey, we really should include the time at which the Employee was hired, too -- it could come in handy on some reporting/analytics project for BigWigX in HR"...?  No?  150% guaranteed NEVER?  Okay then!

    Closing thoughts:

    The point is, the logical equivalent to the date-range expression "On or after the beginning of the desired time-period, and BEFORE the beginning of the NEXT desired time-period", is always functional regardless of the underlying datatype.

    Translated to code, using a desired time-period of "a whole target month" as example:
       MyDate_or_DateTime >= @First_of_Target_Month
    AND MyDate_or_DateTime < @First_of_Month_After_Target

    Then, whether or not you want "special treatment" on the 1st day of the month, is a completely separate issue -- simply set those @First_of_% parameters accordingly!
    S
    imilarly, if your requirement is "month-to-date", and not "full whole month", simply set the 2nd parameter to "today" or "now" or "tomorrow", depending on how you want to treat the current day's data.

    Again, in code, if we want "current month, month-to-date":
       MyDate_or_DateTime >= @First_of_This_Month
    AND MyDate_or_DateTime < @Now_or_TodayZeroHour_or_TomorrowZeroHour_depending_on_requirements

    Thanks, happy reading, and good luck!

    -Nate the DBA natethedba.com

  • Luis Cazares - Wednesday, May 10, 2017 11:59 AM

    Charmer - Wednesday, May 10, 2017 11:18 AM

     
    Luis,  I could not understand. If you don't mind,  Could you please explain me a little bit more?

    Let me try to give an example with sample data.
    On the following query:
     - The column Today represents the day you would run the report.
     - The column Yesterday means the day before the report is run.
     - The columns StartDate and EndDate represent the end and start of the month used for your report.
    As you can see, all the dates for today return the current month except for the first day of each month. The first day of each month, it gets the previous month. But if you compare it to the column esterday, the month is always the same for the 3 columns.

    WITH E(n) AS(
      SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
    ),
    cteCalendar(calDate) AS(
      SELECT DATEADD( dd, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, '2017') calDate
      FROM E a, E b
    )
    SELECT
        Today     = calDate,
        Yesterday = calDate - 1,
        StartDate = DATEADD(mm, DATEDIFF( mm, 0, calDate - 1), 0),
        EndDate   = DATEADD( DAY, -1, DATEADD( MONTH, 1, StartDate ) )
    FROM cteCalendar
    CROSS APPLY (SELECT DATEADD(mm, DATEDIFF( mm, 0, calDate - 1), 0) AS StartDate)x;

    Hi Luis,  

    Thank you for your response.   I'm sorry that I couldn't get the time today to go through your explanation. I'll go through this tomorrow for sure and will let you know.

    Thanks,
    Charmer

  • nsjohnso - Thursday, May 11, 2017 12:31 PM

    Wow, SO many things going on here...

    And I don't mean to be blunt, but please, PLEASE go read the following articles:
    http://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression
    http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/16/bad-habits-to-kick-mishandling-date-range-queries.aspx
    http://sqlblog.com/blogs/aaron_bertrand/archive/2011/10/19/what-do-between-and-the-devil-have-in-common.aspx

    Let's start with some fundamentals:

    1. CASE is an EXPRESSION, not a STATEMENT.


    An expression evaluates to a single result.  Not a predicate (where-condition).

    2. BETWEEN is evil.


    That may be hyperbole, but it's for a purpose: You need to STOP using it with DATEs/DATETIMEs.  Read the article for full explanations.

    3. Do NOT use integer math with DATEs; use the DATEADD function with fully specified DATEPART names.


    (e.g. DATEADD(MONTH, 1, GETDATE()), not 'mm' or 'dd' or that nonsense).
    This post with your "sample data" is particularly problematic in this regard -- so I hope & pray that you only wrote it that way for quickness, and that such things don't make it anywhere near production code.

    So what do I do?

    Now, believe it or not, I fully understand your requirement -- it's actually very common for reports / reporting-queries.  You have a report that shows "Month-to-Date revenue", for example; but, on the 1st of the month, the business users demand that it show "last month's data", because they know there's no value in the report if it shows only "today" (the 1st of the month), because there's no revenue yet!

    This really isn't that hard, but by trying to use BETWEEN, you're making it more difficult.  That is why Thom's solution is best -- it avoids all the pitfalls discussed above!  But beware: of assumptions, and requirements...


    Assumptions:


    Kingston
    's and Luis's solutions both make the same assumption: that your actual DATETIME column's values have no TIME component!  In other words, if to use your Employees example, 'Bob' was created on 5/31/2017 at 5pm, you would end up not counting him for the month of May!  That's probably bad.  Now, you may have worked around this entirely by defining the column simply as DATE, not DATETIME -- and if you did that on purpose with full understanding of the distinction, awesome, good job, just ignore this paragraph.


    Clarifying requirements:

    The other thing worth noting is that your requirement is not 100% clear.

    • Thom's solution gives you what we call "Month-to-Date", or more accurately "Month-to-Now", meaning, in plain English, "Everything from the beginning of the month up thru this moment" -- i.e. excluding records that may have a future date within the same month.
    • Kingston's & Luis's solutions actually give you "the Whole Month", even stuff "in the future" of the same month, i.e. "Everything from the beginning of the month up to the last day of the month", excluding (whether intentional or not) anything on the last day of the month after "zero hour", i.e. after 12am that day (again, see the assumption above).

    Clarifying intent:

    After re-reading your original post and your sample data, I do see that you've consistently specified DATE, rather than DATETIME.  Fantastic, things get easier and you don't necessarily need to worry so much about the last few issues described above.  However, can you guarantee that this will never change?  Some new Dev/DBA or some new business requirement won't come along and say "Hey, we really should include the time at which the Employee was hired, too -- it could come in handy on some reporting/analytics project for BigWigX in HR"...?  No?  150% guaranteed NEVER?  Okay then!

    Closing thoughts:

    The point is, the logical equivalent to the date-range expression "On or after the beginning of the desired time-period, and BEFORE the beginning of the NEXT desired time-period", is always functional regardless of the underlying datatype.

    Translated to code, using a desired time-period of "a whole target month" as example:
       MyDate_or_DateTime >= @First_of_Target_Month
    AND MyDate_or_DateTime < @First_of_Month_After_Target

    Then, whether or not you want "special treatment" on the 1st day of the month, is a completely separate issue -- simply set those @First_of_% parameters accordingly!
    S
    imilarly, if your requirement is "month-to-date", and not "full whole month", simply set the 2nd parameter to "today" or "now" or "tomorrow", depending on how you want to treat the current day's data.

    Again, in code, if we want "current month, month-to-date":
       MyDate_or_DateTime >= @First_of_This_Month
    AND MyDate_or_DateTime < @Now_or_TodayZeroHour_or_TomorrowZeroHour_depending_on_requirements

    Hi John,

    First of all,  thank you for your post.  It gives me a lot of information. 
    So to the point,  my column is as of yet.  Actually I can't guarantee that because we are loading data from Oracle to sql.  In Oracle it has been datetime2 kind of format.  So like you said,  client can ask us any time to change the column back to datetime2 from date(as of yet) in sql. FYI,  I'm using Thom's suggested method.  This is month to date report.  If today is first day then last month data else month to date data. I'm using case statement in my WHERE clause to find out first day of month and setting the condition accordingly.

    Thanks,
    Charmer

  • Nate the DBA - Thursday, May 11, 2017 12:31 PM

    2. BETWEEN is evil.


    That may be hyperbole, but it's for a purpose: You need to STOP using it with DATEs/DATETIMEs.  Read the article for full explanations.

    I disagree.  The problem with BETWEEN arises when it is used with continuous data, but it is perfectly fine with discrete data.  DATETIME data is continuous, so you are correct when it comes to DATETIME data, but DATE data is discrete, and there is no issue with using BETWEEN with DATE data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Friday, May 12, 2017 8:12 AM

    I disagree.  The problem with BETWEEN arises when it is used with continuous data, but it is perfectly fine with discrete data.  DATETIME data is continuous, so you are correct when it comes to DATETIME data, but DATE data is discrete, and there is no issue with using BETWEEN with DATE data.
    Drew

    I see your point, and I like your use of the terminology "continuous" vs "discrete".  I still feel personally that it's cleaner, clear-er, and more consistent, to use the ">= and <" notation, but yes, it's more about style at that point than accuracy.  It's easier to teach devs/coders a single consistent rule than to say "use this notation for this type, else use that notation for the other type".  Plus, to extrapolate on devs, it's almost as likely as not that either/both datatypes will get coalesced to the same app-code construct (like C# DateTime) anyway, so again, easier to be consistent with a grammar that works in both languages/layers, just in case of laziness.  😉

    -Nate the DBA natethedba.com

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply