Any way to have MIN take NULLS into account?

  • I know that by default, the MIN function ignores any NULL values. Is there any way to cause it to not ignore NULLs? Basically I'm trying to find a MIN date from a rowset. However, if any of the dates are NULL I want the MIN value to be NULL. Only if all dates are not NULL do I want an actual date value.

    Possible?
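
    To make it concrete, here's a quick sketch of what I mean (sample data invented):

    DECLARE @dates TABLE (d date);
    INSERT INTO @dates VALUES ('20100101'), ('20100201'), (NULL);

    -- MIN ignores the NULL and returns 2010-01-01;
    -- the result I'm after here is NULL instead.
    SELECT MIN(d) FROM @dates;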

  • DECLARE @table TABLE (d date)

    INSERT INTO @table
    VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)

    SELECT DATEADD(dd, 0, NULLIF(MIN(COALESCE(DATEDIFF(dd, 0, d), -2147483648)), -2147483648))
    FROM @table

  • Nest NullIf and IsNull.

    If your datetime column is called MyDate and it's in table MyTable:

    select IsNull(MyDate, 0)
    from dbo.MyTable;

    That will replace the nulls with 0s (datetime 0 = 1 Jan 1900).

    Then:

    select min(IsNull(MyDate, 0))
    from dbo.MyTable;

    Will get 0 if there's a null in the list.

    And:

    select NullIf(Min(IsNull(MyDate, 0)), 0)
    from dbo.MyTable;

    Will null out the zero.

    If you aren't familiar with IsNull or NullIf (most people know IsNull; it seems very few know NullIf), you can look them up in Books Online.
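
    If it helps, a quick illustration of the two:

    SELECT ISNULL(NULL, 0); -- 0: replaces NULL with the fallback value
    SELECT NULLIF(5, 5);    -- NULL: returns NULL when the two arguments match
    SELECT NULLIF(5, 3);    -- 5: returns the first argument when they differ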

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (2/23/2010)


    Nest NullIf and IsNull.

    ...

    select NullIf(Min(IsNull(MyDate, 0)), 0)
    from dbo.MyTable;

    Will null out the zero.

    I considered this simpler solution too. However, the OP didn't mention the data type to be used (remember this is a SQL2K8 forum):

    DECLARE @table TABLE (d date)

    INSERT INTO @table
    VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)

    select NullIf(Min(IsNull(d, 0)), 0)
    from @table;

    results in 'Operand type clash: int is incompatible with date'

    DECLARE @table TABLE (d datetime)

    INSERT INTO @table
    VALUES (GETDATE()), (GETDATE() + 10), (GETDATE() - 50000), (NULL)

    select NullIf(Min(IsNull(d, 0)), 0)
    from @table;

    results in '1873-04-02 21:14:13.823' - the NULL row is silently hidden, because the genuine minimum sorts below the 1/1/1900 substitute. Remember that 0 is the numeric representation of 1/1/1900, so it all depends on the range of dates being processed. I used -2147483648 as the numeric representation of a NULL date to be on the safe side.
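
    To see the round-trip in isolation (a small sketch; the literals are arbitrary):

    -- DATEDIFF turns a date into a day count from datetime 0 (1/1/1900);
    -- COALESCE maps a NULL date to INT's minimum, which no real date can
    -- produce, so NULLIF can single it out before DATEADD converts back.
    SELECT DATEDIFF(dd, 0, '19000101');                      -- 0
    SELECT COALESCE(DATEDIFF(dd, 0, NULL), -2147483648);     -- -2147483648
    SELECT DATEADD(dd, 0, NULLIF(-2147483648, -2147483648)); -- NULL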

    Peter

  • Makes sense.

    You posted while I was writing my answer, so I missed your post completely.

    As a second thought, it might be easier to select top 1 and order by the date column.

    I just tested that on DateTime, SmallDateTime, DateTime2, and Date, and it worked in all cases.

    Would depend on what else you need it for.
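
    Something like this, I mean (a rough sketch reusing Peter's @table; it leans on SQL Server sorting NULL ahead of all dates in an ascending ORDER BY):

    SELECT TOP (1) d
    FROM @table
    ORDER BY d; -- a NULL row, if any, sorts first and wins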

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (2/23/2010)


    Makes sense.

    ...

    As a second thought, it might be easier to select top 1 and order by the date column.

    Makes sense too. But looking at the query plan for the simple examples/data in this thread, the top 1/order by approach requires a sort, which makes it more expensive. An index on the date column will change that, I suppose.

    Peter

  • Thanks guys. This is what I finally got to work (I'm using DATE datatypes):

    NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')

    I was getting "int" conversion errors using values other than '1/1/1900'. In the end I declared a variable for '1/1/1900' and used that instead of repeating the literal.
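
    Roughly like this (the table name is invented):

    DECLARE @Sentinel date = '1/1/1900';

    SELECT NULLIF(MIN(ISNULL(MyDateField, @Sentinel)), @Sentinel)
    FROM dbo.MyTable;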

  • milzs (2/23/2010)


    Thanks guys. This is what I finally got to work (I'm using DATE datatypes):

    NULLIF(MIN(ISNULL(MyDateField, '1/1/1900')), '1/1/1900')

    ...

    Makes sense.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Most of the methods shown are problematic - some because they're not guaranteed to work, others because performing an aggregate over a function is always bad news for performance. TOP with an ORDER BY is probably reliable - SQL Server does treat NULL as the lowest value in an ascending sort, though the SQL standard leaves that implementation-defined. I can't say I'd be entirely happy even with that though - it doesn't seem very nice at all to rely on the sort order in that way.

    The fact is that the built-in aggregates ignore NULLs, so you have at least three choices:

    1. Avoid NULLs

    2. Write the code to handle NULLs explicitly and obviously, an example is below

    3. Write a dozen or so lines in your favourite .NET language to create a custom aggregate named something like NULL_MIN()

    SELECT date_value =
        CASE
            WHEN EXISTS (SELECT * FROM @Table T WHERE d IS NULL) THEN NULL
            ELSE MIN(d)
        END
    FROM @Table;

    Paul

  • This seems a bit more efficient:

    SELECT date_value = CASE WHEN (COUNT(*) > COUNT(date)) THEN NULL ELSE MIN(date) END
    FROM test

    🙂
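
    The reason it works: COUNT(*) counts every row, while COUNT(date) skips NULLs, so the two counts differ exactly when a NULL is present. On Peter's sample @table, for instance:

    SELECT COUNT(*) AS all_rows, COUNT(d) AS non_null_rows
    FROM @table; -- 4 vs 3, so the CASE returns NULL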

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I just performed this test:

    SET NOCOUNT ON;

    CREATE TABLE #T (
        ID INT IDENTITY PRIMARY KEY,
        Section INT NOT NULL,
        DT DATE);

    -- one million rows: sections 1-1000, with random dates
    -- up to about a century either side of 1/1/2000
    INSERT INTO #T (Section, DT)
    SELECT N1.Number, DATEADD(day, CHECKSUM(NEWID())%36525, '1/1/2000')
    FROM Common.dbo.Numbers N1
    CROSS JOIN Common.dbo.Numbers N2
    WHERE N1.Number BETWEEN 1 AND 1000
    AND N2.Number BETWEEN 1 AND 1000;

    -- NULL out 100 rows at random
    ;WITH CTE AS
    (SELECT DT, ROW_NUMBER() OVER (ORDER BY NEWID()) AS Row
     FROM #T)
    UPDATE CTE
    SET DT = NULL
    WHERE Row <= 100;

    SET STATISTICS TIME ON;

    -- 1: NULLIF/ISNULL
    SELECT NULLIF(MIN(ISNULL(DT, '1/1/1900')), '1/1/1900')
    FROM #T AS T
    GROUP BY Section;

    -- 2: EXISTS per group
    SELECT CASE
               WHEN EXISTS (SELECT * FROM #T AS T2 WHERE DT IS NULL AND T2.Section = T1.Section) THEN NULL
               ELSE MIN(T1.DT)
           END
    FROM #T T1
    GROUP BY T1.Section;

    -- 3: COUNT comparison
    SELECT CASE
               WHEN (COUNT(*) > COUNT(DT)) THEN NULL
               ELSE MIN(DT)
           END
    FROM #T AS T1
    GROUP BY Section;

    The results are:

    SQL Server Execution Times:
    CPU time = 374 ms, elapsed time = 196 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:
    CPU time = 3761 ms, elapsed time = 3626 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:
    CPU time = 453 ms, elapsed time = 248 ms.

    The first, simplest solution is also the fastest.

    As an afterthought, I also tested:

    ;WITH CTE AS
    (SELECT DT, ROW_NUMBER() OVER (PARTITION BY Section ORDER BY DT) AS Row
     FROM #T T1)
    SELECT DT
    FROM CTE
    WHERE Row = 1;

    SQL Server Execution Times:
    CPU time = 1904 ms, elapsed time = 1462 ms.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I ran the test too, with similar results. All solutions benefit from an index on columns Section and DT. Paul's solution benefits the most from the index and becomes comparable to GSquared's solution in terms of CPU time, though its number of logical reads is more than twice as high.
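
    Concretely, an index along these lines (the name is illustrative):

    CREATE INDEX IX_T_Section_DT ON #T (Section, DT);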

    Peter

  • Huh, that's odd. When I tested it on my system this morning, with a million rows, no GROUP BY (did I miss that requirement?), and an index on the date column, mine was the fastest, though not by much.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Before moving on to performance, I'd just like to get correctness out of the way 😛

    The NULLIF solution bothers me (not Peter's refined version):

    DECLARE @T TABLE (section INTEGER NOT NULL, date_time DATETIME NULL);

    INSERT @T (section, date_time) VALUES (1, '19000101');
    INSERT @T (section, date_time) VALUES (1, '20010101');
    INSERT @T (section, date_time) VALUES (2, '19000101');
    INSERT @T (section, date_time) VALUES (2, NULL);
    INSERT @T (section, date_time) VALUES (2, '20010101');

    SELECT section,
        date_time = NULLIF(MIN(ISNULL(date_time, '19000101')), '19000101')
    FROM @T AS T1
    GROUP BY T1.section;

    Results:

    section  date_time
    -------  ---------
    1        NULL
    2        NULL

    Not quite right 😀 Section 1 contains no NULL at all - its genuine 1 Jan 1900 minimum is swallowed by the sentinel.
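
    For contrast, Peter's DATEDIFF/COALESCE variant (with a GROUP BY bolted on) keeps the two cases apart on the same data:

    SELECT section,
        date_time = DATEADD(dd, 0,
            NULLIF(MIN(COALESCE(DATEDIFF(dd, 0, date_time), -2147483648)),
                -2147483648))
    FROM @T AS T1
    GROUP BY T1.section;

    -- section 1: 1900-01-01 (its genuine minimum); section 2: NULL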

    Paul

  • On to Performance then.

    Result: It depends :w00t:

    Well, it might. The original test rig had a thousand sections of a thousand rows each, which is probably not the most common scenario in the real world. Most tables like this would have many more rows per group, and many fewer groups.

    All the solutions that scan the index once should have broadly similar performance.

    Other solutions might be an order of magnitude faster (or slower) depending on how many groups there are, how many rows per group, and how common a NULL date_time value is.

    For example, on a test rig of one million rows, with fifty sections, and ten thousand randomly-placed NULL values:

    Test 1: NULLIF
    Table 'Test'. Scan count 1, logical reads 3231
    CPU time = 421 ms, elapsed time = 439 ms.

    Test 2: EXISTS with MIN
    Table 'Test'. Scan count 51, logical reads 3417
    CPU time = 469 ms, elapsed time = 930 ms.

    Test 3: EXISTS with TOP
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Test'. Scan count 51, logical reads 3417
    CPU time = 281 ms, elapsed time = 619 ms.

    Test 4: DISTINCT with TOP
    Table 'Test'. Scan count 51, logical reads 3417
    CPU time = 266 ms, elapsed time = 276 ms.

    Test 5: COUNTs with MIN
    Table 'Test'. Scan count 1, logical reads 3231
    CPU time = 484 ms, elapsed time = 484 ms.

    Test 6: ROW_NUMBER
    Table 'Test'. Scan count 1, logical reads 3231
    CPU time = 453 ms, elapsed time = 446 ms.

    Test 7: Recursive CTE with APPLY
    Table 'Worktable'. Scan count 2, logical reads 301
    Table 'Test'. Scan count 101, logical reads 303
    CPU time = 16 ms, elapsed time = 2 ms.

    Check out the result for test 7.
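
    In outline, the recursive approach looks something like this (a sketch only, written against GSquared's #T rather than my rig; it assumes the (Section, DT) index and reuses Barry's COUNT trick):

    WITH Sections AS
    (
        -- anchor: the lowest section value
        SELECT Section = MIN(Section) FROM #T
        UNION ALL
        -- recursive step: hop to the next section up; ROW_NUMBER stands in
        -- for TOP/MIN, which are not allowed in a recursive member
        SELECT R.Section
        FROM
        (
            SELECT T.Section, rn = ROW_NUMBER() OVER (ORDER BY T.Section)
            FROM #T AS T
            JOIN Sections AS S ON T.Section > S.Section
        ) AS R
        WHERE R.rn = 1
    )
    SELECT S.Section, X.date_value
    FROM Sections AS S
    CROSS APPLY
    (
        -- one cheap probe per section instead of one big scan
        SELECT date_value =
            CASE WHEN COUNT(*) > COUNT(T.DT) THEN NULL ELSE MIN(T.DT) END
        FROM #T AS T
        WHERE T.Section = S.Section
    ) AS X
    OPTION (MAXRECURSION 0);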

    Paul
