UDF, Computed Column, Persisted, Determinism

  • Peter Brinkhaus (4/6/2010)


    A function that does data access is not deterministic...

    USE tempdb;

    GO

    CREATE TABLE dbo.t (a INT NOT NULL) ;

    GO

    CREATE FUNCTION dbo.f (@i INT) RETURNS INT WITH SCHEMABINDING AS

    BEGIN

    SELECT @i = a FROM dbo.t WHERE a = @i;

    RETURN @i;

    END;

    GO

    SELECT data_access_function_is_determintic =

    OBJECTPROPERTYEX(OBJECT_ID(N'dbo.f', 'FN'), 'IsDeterministic')

    GO

    CREATE TABLE dbo.t2

    (

    a INT NOT NULL,

    b AS (dbo.f(a))

    );

    GO

    DROP TABLE dbo.t2;

    DROP FUNCTION dbo.f;

    DROP TABLE dbo.t;

    GO

    ...because the data accessed may change over time and it may return different values for the same set of values for its input parameters in different calls. Non-determinism is the reason that SQL Server has no way to keep the persisted value up-to-date.

    Ignoring the fact that the function above is deterministic for a second, consider what would happen if column b in table T2 was persisted.

    When column a in T1 changes, SQL Server would (a) need to detect that change and connect it back to the function and the column in T2; and (b) include operators in the query plan for the change to T2 to change the value in column b appropriately. That is deeply non-trivial, even in this ridiculously simple case.

    In case there are any lingering doubts, let me finish with a reference:

    From Deterministic and Non-deterministic Functions (Books Online):


    Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database.


  • Unless I am missing something completely, it looks like the 'deterministic' function is just an identity function. Looks like 'SELECT @i = a FROM dbo.t ...' is a no-op because it's an assignment to an input parameter. Take a look at the modified example:

    USE tempdb;

    GO

    CREATE TABLE dbo.t (a INT NOT NULL) ;

    GO

    CREATE FUNCTION dbo.f (@i INT) RETURNS INT WITH SCHEMABINDING AS

    BEGIN

    SELECT @i = 3 /* a */ FROM dbo.t WHERE a = @i;

    RETURN @i;

    END;

    GO

    SELECT data_access_function_is_determintic =

    OBJECTPROPERTYEX(OBJECT_ID(N'dbo.f', 'FN'), 'IsDeterministic')

    GO

    CREATE TABLE dbo.t2

    (

    a INT NOT NULL,

    b AS (dbo.f(a))

    );

    GO

    INSERT INTO t(a) VALUES (0)

    INSERT INTO t2(a) VALUES (2)

    SELECT * FROM T

    SELECT * FROM T2

    DROP TABLE dbo.t2;

    DROP FUNCTION dbo.f;

    DROP TABLE dbo.t;

    GO

    What surprises me is the following modified example:

    USE tempdb;

    GO

    CREATE TABLE dbo.t (a INT NOT NULL) ;

    GO

    CREATE FUNCTION dbo.f (@i INT) RETURNS INT WITH SCHEMABINDING AS

    BEGIN

    DECLARE @j-2 INT

    SELECT @j-2 = a FROM dbo.t WHERE a = @i;

    RETURN @j-2;

    END;

    GO

    SELECT data_access_function_is_determintic =

    OBJECTPROPERTYEX(OBJECT_ID(N'dbo.f', 'FN'), 'IsDeterministic')

    GO

    SELECT dbo.f(1)

    INSERT INTO t(a) values (1)

    SELECT dbo.f(1)

    DROP FUNCTION dbo.f;

    DROP TABLE dbo.t;

    GO

    The function returns two different values (or maybe 1 value and 1 undefined) but it is still considered deterministic. Doesn't make sense to me.

    Peter

  • Peter Brinkhaus (4/6/2010)


    Unless I am missing something completely, it looks like the 'deterministic' function is just an identity function. Looks like 'SELECT @i = a FROM dbo.t ...' is a no-op because it's an assignment to an input parameter.

    Heh...don't quibble over rushed code 🙂

    But, there's nothing wrong with the assignment - as the change to @j-2 demonstrates.

    A quick look at a query plan shows that it is not a no-op.

    Anyway, on to the main issue...

    The function returns two different values (or maybe 1 value and 1 undefined) but it is still considered deterministic. Doesn't make sense to me.

    That's because you have not understood what 'deterministic' means. Let's look at the quote from BOL again:

    "Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database."

    The function is deterministic, since it always produces the same result (possibly NULL, but never 'undefined') for the same input parameters and the same database state.

    When table t is empty, it produces the same result for a given input parameter. When t contains a row, it still produces repeatable results. This is a pretty key point, so it is important to fully understand it.

    SQLCLR programmers might find it easier to grasp, since there are separate attributes to mark a module as Deterministic, Precise, performing User Data Access, and performing System Data Access. There really is no connection between being deterministic and doing data access.

    I am glad you raised the point however, since it is widely misunderstood to behave as you described.

    Paul

  • To me it seems MS has reinvented its own definition of (non-)determinism. Given the MS definition, even GETDATE() would be a deterministic function, if only for 3ms. Anyway, it was just a nuance. Just tried to look for an answer why the function posted by the OP wasn't considered deterministic.

    Peter

  • Peter Brinkhaus (4/6/2010)


    To me it seems MS has reinvented its own definition of (non-)determinism. Given the MS definition, even GETDATE() would be a deterministic function, if only for 3ms.

    I don't think there is a concept of deterministic for a period of time :laugh:

    A function is either deterministic, or it isn't.

    From the same link posted earlier:

    "You cannot influence the determinism of any built-in function. Each built-in function is deterministic or nondeterministic based on how the function is implemented by SQL Server."

    The same link lists the status of many built-in functions (though note that PARSENAME is incorrectly documented as deterministic - see my Connect Item).

    Anyway, it was just a nuance. Just tried to look for an answer why the function posted by the OP wasn't considered deterministic.

    If you look back to the first post, you will see that the OP confirmed the function as deterministic - the question was why it could not be persisted, which I have explained.

    Maybe the point you are missing (regarding GETDATE) is the scope of the phrase 'database state' - this refers to state as a whole, not just content.

    Paul

  • Paul White NZ (4/6/2010)


    Anyway, it was just a nuance. Just tried to look for an answer why the function posted by the OP wasn't considered deterministic.

    If you look back to the first post, you will see that the OP confirmed the function as deterministic - the question was why it could not be persisted, which I have explained.

    From the OP's first post:

    3) The UDF is not deterministic (not sure I understand why, but the function is below):

    It's unclear to me why defining the function WITH SCHEMABINDING would make it deterministic.

    Maybe the point you are missing (regarding GETDATE) is the scope of the phrase 'database state' - this refers to state as a whole, not just content.

    Talking about definitions and database states, according to my version of BOL (see also http://msdn.microsoft.com/en-us/library/ms178091.aspx), a database state can be one of the following: ONLINE, OFFLINE, RESTORING, RECOVERING, RECOVERY PENDING, SUSPECT, EMERGENCY. During the examples I produced my database state was continuously ONLINE. How confusing can it be 🙂

    I still think the MS definition of (non-)determinism doesn't follow the mathematical definition of (non-)determinism.

    Peter

  • Peter Brinkhaus (4/6/2010)


    It's unclear to me why defining the function WITH SCHEMABINDING would make it deterministic.

    The reason is that the engine only evaluates a function for determinism if it is schema-bound.

    Otherwise, it assumes that the function is non-deterministic. This is a performance optimization: a non-schema-bound UDF might reference other UDFs which might have changed and might now be non-deterministic. This would require checking all referenced objects for determinism on every call to the top-level UDF - clearly not a performance win.

    As an aside, the engine also marks a non-schema-bound function as doing both system and user data access, for exactly the same reasons (to avoid run-time checks). This has the side-effect of requiring halloween protection in update plans - another performance loser.

    This is why it is often recommended to schema-bind functions, even if they do no data access (so the engine will evaluate the function for determinism and data access)

    Talking about definitions and database states, according to my version of BOL (see also http://msdn.microsoft.com/en-us/library/ms178091.aspx), a database state can be one of the following: ONLINE, OFFLINE, RESTORING, RECOVERING, RECOVERY PENDING, SUSPECT, EMERGENCY. During the examples I produced my database state was continuously ONLINE. How confusing can it be 🙂 I still think the MS definition of (non-)determinism doesn't follow the mathematical definition of (non-)determinism.

    Amusing 😉

    Though actually, I do think it is as close as practicable to the mathematical definition (after all, this is a database engine, not an equation in pure math).

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply