UDF, Computed Column, Persisted, Determinism

Question

Post reply

UDF, Computed Column, Persisted, Determinism

Al B

SSC-Addicted

Points: 436
More actions
April 5, 2010 at 5:04 pm

#140145

Hi folks,

Having a tough go at this, but have learned a lot in the process. Here's my dilemma:

1) I have a Computed Column that calls a Scalar-Valued UDF

2) I want this Computed Column to be Persisted, so that it doesn't query the UDF every single time the column is accessed

3) The UDF is not deterministic (not sure I understand why, but the function is below):

-- Parameters @ParentID int, @Keyword nvarchar(255) -- Output Variable DECLARE @Score int SELECT @Score = SUM(k.KeywordCount) FROM Topics AS t, Keywords AS k WHERE k.Keyword = @Keyword AND t.PostID = k.PostID AND t.ParentID = @ParentID SET @Score = ISNULL(@Score,0) RETURN @Score

4) I've tried using WITH SCHEMABINDING to make the UDF deterministic, which seemed to work...but then I can't use it in a Computed Column because that apprently doesn't allow user access.

I've read quite a bit about what defines "deterministic", but I'm still missing something I guess. It would help if I understood why my UDF is not deterministic - so that maybe I can figure out how to make it so (without using SCHEMABINDING).

Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

As well, I can upgrade to SQL 2008 if that helps any...

Thanks for any advice!

AL

Al Bsharah
Blog / Twitter

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Why don't you use a view instead?

And, if you really want to use a UDF, at least use a table-valued function.

Something like

CREATE FUNCTION your_function (@ParentID int, @Keyword nvarchar(255))



RETURNS TABLE

AS

RETURN

(

  SELECT ISNULL(SUM(k.KeywordCount),0) as ReturnValue

  FROM         Topics AS t

  INNER JOIN  Keywords AS k

  ON             k.Keyword = @Keyword

    AND         t.PostID = k.PostID

    AND         t.ParentID = @ParentID

);

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Al B SSC-Addicted Points: 436 More actions · Answer 2

Hi Lutz, thanks for the (quick) reply!

I'm not sure I follow...please help me understand what I'm missing from your suggestions:

1) How and where is a View going to help my cause? My UDF queries for a subset of data that depends on other columns in the row where the computation occurs. Creating a view will still require me to put a query into the "Computed Column" field, and that doesn't seem to be allowed (Error: "Subqueries are not allowed in this context")

2) Is returning a TABLE in my UDF going to help make it Deterministic? Otherwise, what's the benefit when I just need one integer returned?

Please let me know what I'm missing...thanks for the help!

AL

Al Bsharah
Blog / Twitter

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

lmu92 (4/5/2010)

...at least use a table-valued function.

Teaching opportunity here, Lutz. Tell the op "why" and how you'd use an iTVF in a computed column. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 4

Lynn Pettis

SSC Guru

Points: 442447

April 5, 2010 at 7:53 pm

#1146180

How many rows in the table you are querying that you use the UDF?

Also, to learn a bit about scalar and in-line TVF, you might want to read the following blog: Comparing Hardcoded functions, In-Line TVF's, and Scalar Functions.

You may find it a bit enlightening.

Al B SSC-Addicted Points: 436 More actions · Answer 5

Hi folks,

Lynn:

At this point I'm querying a table that contains 3,000,000 rows (that will likely have 10x that sooner or later). I'd like to add a Persistent Calculated Column within this table that references my UDF.

All:

I appreciate all the input on Scalar vs. Table-Valued functions, I have seen the light. <smile> Nice find, Lynn.

I still need to figure out why my UDF is not Deterministic (TVF version isn't either)...so that I can persist this calculated column. I feel like I may be trying to do something not possible, I'd just like to understand why if that's the case.

Thanks again!

Al Bsharah
Blog / Twitter

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 6

Here is the problem, you are only giving us PART of the problem. If you want better answers, give use the whole problem, plus the code you have already developed.

Please read the first article I reference below in my signature block to see how you should actually post a question. Not mentioned in the article, but also provide the expected results based on the sample data. This should be as a query output, not just descibed in words.

LutzM SSC Guru Points: 107049 More actions · Answer 7

Jeff Moden (4/5/2010)

lmu92 (4/5/2010)

...at least use a table-valued function.

Teaching opportunity here, Lutz. Tell the op "why" and how you'd use an iTVF in a computed column. 😉

I guess I completely forgot to mention that neither the view nor the iTVF is a replacement of the UDF used so far and I wouldn't use a computed column at all... :blush:

I obviously made too many assumptions (for instance: that someone would look up iTVF's to see that those are used just like views/tables). Sorry about that.

Here's what I would do:

1) evaluate the business case whether it is really required to have that column in that table (seems like violating normalization) with no other option

Side note: I would consider this as an option: using a CTE to limit the number of rows that are selected (WHERE clause inside the CTE) and apply the function I mentioned earlier to the limited result set.

2) if that column really needs to be there I would investigate how often that specific column would need to be updated based on changes of the underlying columns (Insert/Update/Delete).

3) Based on that I may decide to use a trigger after insert/update/delete either on one or both of the tables referenced in the original UDF limited to changes of the columns in question using the UPDATE() clause within the trigger together with the INSERTED logical table provided inside the trigger and the iTVF from above. This trigger then would update the column in question.

4) I finally would set up a test environment where I could compare the various soultions that are available and test, test, and test again. My testing would include evaluation of the execution plans while applying various common scenarios, index optimization (if required), concurrency testing (if required). The "usual stuff". I'd probably include the UDF in the testing scenario to see how the alternatives perform.

Once I'd be done with those steps I'd discuss the results with the people in charge.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Paul White SSC Guru Points: 150457 More actions · Answer 8

Al B (4/5/2010)

Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

LutzM SSC Guru Points: 107049 More actions · Answer 9

Paul White NZ (4/6/2010)

Al B (4/5/2010)

Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

I'd like to agree but it seems like that's not going to work here, since the OP used the ISNULL function to get a non-NULL result:

SET @Score = ISNULL(@Score,0)

And, AFAIK, an indexed view is not allowed if (straight from BOL SS2K5):

a SUM function [...] references a nullable expression.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Paul White SSC Guru Points: 150457 More actions · Answer 10

lmu92 (4/6/2010)

I'd like to agree but it seems like that's not going to work here, since the OP used the ISNULL function to get a non-NULL result:

SET @Score = ISNULL(@Score,0)

And, AFAIK, an indexed view is not allowed if (straight from BOL SS2K5):

a SUM function [...] references a nullable expression.



SET ANSI_NULLS,

    ANSI_PADDING,

    ANSI_WARNINGS,

    ARITHABORT,

    CONCAT_NULL_YIELDS_NULL,

    QUOTED_IDENTIFIER

    ON;

SET NUMERIC_ROUNDABORT

    OFF;

GO

USE tempdb;

GO

CREATE  TABLE dbo.Lutz

        (

        row_id      INTEGER IDENTITY PRIMARY KEY,

        group_id    INTEGER NOT NULL,

        data        INTEGER NULL

        );

INSERT  dbo.Lutz

        (group_id, data)

SELECT  1, 100      UNION ALL

SELECT  1, 200      UNION ALL

SELECT  1, 300      UNION ALL

SELECT  2, 100      UNION ALL

SELECT  2, 150      UNION ALL

SELECT  2, NULL;

GO

CREATE  VIEW dbo.LutzView

WITH    SCHEMABINDING

AS

SELECT  group_id,

        total = SUM(ISNULL(data, 0)),

        item_count = COUNT_BIG(*)

FROM    dbo.Lutz

GROUP   BY

        group_id;

GO

CREATE  UNIQUE CLUSTERED INDEX c

ON      dbo.LutzView (group_id);

GO

SELECT  *

FROM    dbo.LutzView WITH (NOEXPAND);

GO

DROP    VIEW dbo.LutzView;

DROP    TABLE dbo.Lutz;

GO

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

Paul White NZ (4/6/2010)

Al B (4/5/2010)

Any thoughts would be greatly appreciated. If I'm barking up the wrong tree, I'd love to hear some alternate solutions.

The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

The alternative here is to use the mechanism that does guarantee that updated values will be reflected in the persisted value - and that solution is an indexed view.

THAT's the teaching opportunity I was hoping for. Thanks, Paul.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

LutzM SSC Guru Points: 107049 More actions · Answer 12

@ Paul:

Interesting!

What do I misread/misunderstand when reading BOL?

I thought what you just demonstrated wouldn't be possible...

It looks like query optimizer is using ISNULL to override the columns NULL definition.

VERY NICE!! Never thought that's possible to "cheat QO" that way so I never even tried...

That, of course, changes everything and makes my posts in this thread irrelevant if not misleading... :ermm::crying::ermm:

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Paul White SSC Guru Points: 150457 More actions · Answer 13

lmu92 (4/6/2010)

@ Paul:

Interesting!

What do I misread/misunderstand when reading BOL?

I thought what you just demonstrated wouldn't be possible...

It looks like query optimizer is using ISNULL to override the columns NULL definition.

VERY NICE!! Never thought that's possible to "cheat QO" that way so I never even tried...

That, of course, changes everything and makes my posts in this thread irrelevant if not misleading... :ermm::crying::ermm:

No, no...not at all irrelevant or misleading...just another example of how things work around here at Team SSC 😀 We always get a solution in the end, but it often requires input from many people.

Anyhow. I am a big fan of indexed views - when used appropriately (a bit like SQLCLR!)

I tend to consider them wherever I might previously have been forced to store aggregates using triggers...

Indexed views are a lot more flexible than many folks realise - and the BOL documentation doesn't always help with that. The issue here is not that the underlying column is NULLable - it is that the aggregated expression cannot be NULLable. I hope that makes sense?

That said, indexed views do have many restrictions (for extremely good reasons - mostly related to correctness concerns and view-maintenance plan complexity) but can often be used to at least 'break the back' of many kinds of problems. As a last resort, of course, we can always hack something together with triggers 🙁

Finally - thank you, Jeff.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

matzer SSCrazy Eights Points: 9727 More actions · Answer 14

Paul White NZ (4/6/2010)

The problem here is not one of being deterministic or not: current versions (including 2008) of SQL Server do not support persisted columns that reference a UDF that does data access. The reason is that SQL Server has no way to keep the persisted value up-to-date.

The problem is one of being not deterministic. A deterministic function should return the same value each time it is passed the same set of values for its parameters. A function that does data access is not deterministic because the data accessed may change over time and it may return different values for the same set of values for its input parameters in different calls. Non-determinism is the reason that SQL Server has no way to keep the persisted value up-to-date.

Peter