Set based iteration

  • derek.colley (5/23/2012)


    Howard - was not knocking your effort (would be a bit rich considering my approach wasn't set based!) but I was after an ordered insert in the absence of a PK.

    Sorry, but there's no such thing (assuming you're meaning Clustered Index when you say PK - they're actually not synonymous, but I'm getting off topic :-)).

    One of the key things to learn about SQL is that the data's not stored in a physical flat structure and even if it was, it's not necessarily delivered from a select statement in a linear/serial way. The only way to guarantee the order of a select statement is to have an order by that defines that order.

    For example, you could add the N column to the end table in this case, then order by N ASC,Data ASC, this would give you the result set you wanted.

    You'll quickly come unstuck when you start looking at this in non-trivial examples in real life if you're relying on the order that you inserted into a heap (a heap is the term for a table without a clustered index). Even with a clustered index, it's still not guaranteed without an order by clause.

  • derek.colley (5/23/2012)


    Mike - I corrected the WITH to ;WITH and it ran beautifully - 28 seconds. Many thanks.

    Lynn - trying yours now 🙂 The more practice I get with this kind of thing, the better I'll understand it.

    Be careful here. This cte that Mike used is what is called a recursive cte. It is NOT set based, it is hidden looping. Notice the time difference of 4 seconds and 28 seconds? That would be affect of looping 10k times.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Took me a couple to find the article from Jeff about this topic.

    http://qa.sqlservercentral.com/articles/T-SQL/74118/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/23/2012)


    Be careful here. This cte that Mike used is what is called a recursive cte. It is NOT set based, it is hidden looping.

    How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based? Should a "set-based" query be a single statement (the recursive CTE passes that test, the WHILE loop does not)? Or should "set-based" exclude any physical execution plan that includes a nested loops join? It all seems a bit woolly to me.

    Notice the time difference of 4 seconds and 28 seconds? That would be affect of looping 10k times.

    If that were true, the following would run for quite a long time, wouldn't it?

    DECLARE

    @start datetime2(7) = SYSDATETIME(),

    @bucket integer;

    WITH x AS

    (

    SELECT 1 AS id

    UNION ALL

    SELECT x.id + 1

    FROM x

    WHERE id < 10000

    )

    SELECT

    @bucket = x.id

    FROM x

    OPTION (MAXRECURSION 0);

    SELECT

    elapsed = DATEDIFF(millisecond, @start, SYSDATETIME());

    Now don't misunderstand me, recursive CTEs are not a very efficient way to produce sequential numbers (though performance differences may be inconsequential in very many cases), but NOT set-based?

    edit: removed the somewhat important word 'not' 🙂

  • SQL Kiwi (5/23/2012)


    Sean Lange (5/23/2012)


    Be careful here. This cte that Mike used is what is called a recursive cte. It is NOT set based, it is hidden looping.

    How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based? Should a "set-based" query be a single statement (the recursive CTE passes that test, the WHILE loop does not)? Or should "set-based" exclude any physical execution plan that includes a nested loops join? It all seems a bit woolly to me.

    Notice the time difference of 4 seconds and 28 seconds? That would be affect of looping 10k times.

    If that were true, the following would run for quite a long time, wouldn't it?

    DECLARE

    @start datetime2(7) = SYSDATETIME(),

    @bucket integer;

    WITH x AS

    (

    SELECT 1 AS id

    UNION ALL

    SELECT x.id + 1

    FROM x

    WHERE id < 10000

    )

    SELECT

    @bucket = x.id

    FROM x

    OPTION (MAXRECURSION 0);

    SELECT

    elapsed = DATEDIFF(millisecond, @start, SYSDATETIME());

    Now don't misunderstand me, recursive CTEs are not a very efficient way to produce sequential numbers (though performance differences may not be inconsequential in very many cases), but NOT set-based?

    Good point Paul. Perhaps I should have said that is still hidden RBAR instead of saying it is not set based. In fact, the recursive CTE would have met the thread title perfectly. It is exactly "Set based iteration". 😛

    It all seems a bit woolly to me.

    I sheepishly step away from my definitively stated yet incorrect position.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Slight variation on Lynn's method:

    WITH

    N1 AS (SELECT N1.n FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS N1 (n)),

    N2 AS (SELECT L.n FROM N1 AS L, N1 AS R),

    N3 AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM N2 AS L, N2 AS R)

    SELECT

    id = IDENTITY(integer, 1, 1),

    name = DB_NAME(mf.database_id)

    INTO #Destination

    FROM N3, master.sys.master_files AS mf

    WHERE

    mf.[file_id] = 1

    ORDER BY

    N3.n, name;

    ;

    The ORDER BY sets the order in which IDENTITY values are assigned - there is still no inherent ordering to the created heap table.

  • michael vessey (5/23/2012)


    try this

    with x (id) as

    (select 1 as id

    union all

    select ID+1 from X as id

    where id<10000

    )

    insert into fooTable

    select d.name from sysdatabases d cross join x option (maxrecursion 10000);

    Oooooohhhh, be careful, Michael. That's a "Counting rCTE". Read the following article for why that's generally a bad idea.

    http://qa.sqlservercentral.com/articles/T-SQL/74118/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • derek.colley (5/23/2012)


    Howard

    Here's the example code I ran:

    CREATE TABLE numbers ( N INT)

    DECLARE @loopCounter INT -- yes, I know 🙂

    SET @loopCounter = 1

    WHILE @loopCounter < 10001

    BEGIN

    INSERT INTO numbers VALUES ( @loopCounter )

    SET @loopCounter = @loopCounter + 1

    END

    RAISERROR('Numbers table created.',0,0) WITH NOWAIT

    The loop isn't a performance problem here because the code will be used only once to create the table. The real problem is that you're not practicing thinking "set-base". If you keep using loops for such simple tasks, you will keep thinking loops instead of "Pseudo Cursors".

    The MAJOR problem with your Tally Table is that you have not assigned a Clustered PK with a FILLFACTOR of 100. It's not a numbers or Tally Table without it and, without it, a While Loop can beat it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • SQL Kiwi (5/23/2012)


    Sean Lange (5/23/2012)


    Be careful here. This cte that Mike used is what is called a recursive cte. It is NOT set based, it is hidden looping.

    How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based?

    Not looping at the T-SQL Level. 😉 By the definition you just gave, a WHILE Loop qualifies as "set based" because each iteration processes a set of one.

    A recursive CTE working on a simple count is nothing more than a hidden cursor complete with it's own temporary table in the form of a working table. It's not only NOT set based, it's RBAR and can actually be slower than a While Loop and use more resources, as well.

    But, let's just skip all that subjective definition stuff for a minute. Let's stick with the pragmatic stuff. Can you make an rCTE count from 1 to 10,000 (or pick a reasonable number) faster than Itzik's cascading Cross Joined CTE method, faster than two tables Cross Joined with ROW_NUMBER, or by simply selecting from a Tally Table?

    Do you want to call the Counting rCTE "Slow Set Based"? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sean Lange (5/23/2012)


    SQL Kiwi (5/23/2012)


    Sean Lange (5/23/2012)


    Be careful here. This cte that Mike used is what is called a recursive cte. It is NOT set based, it is hidden looping.

    How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based? Should a "set-based" query be a single statement (the recursive CTE passes that test, the WHILE loop does not)? Or should "set-based" exclude any physical execution plan that includes a nested loops join? It all seems a bit woolly to me.

    Notice the time difference of 4 seconds and 28 seconds? That would be affect of looping 10k times.

    If that were true, the following would run for quite a long time, wouldn't it?

    DECLARE

    @start datetime2(7) = SYSDATETIME(),

    @bucket integer;

    WITH x AS

    (

    SELECT 1 AS id

    UNION ALL

    SELECT x.id + 1

    FROM x

    WHERE id < 10000

    )

    SELECT

    @bucket = x.id

    FROM x

    OPTION (MAXRECURSION 0);

    SELECT

    elapsed = DATEDIFF(millisecond, @start, SYSDATETIME());

    Now don't misunderstand me, recursive CTEs are not a very efficient way to produce sequential numbers (though performance differences may not be inconsequential in very many cases), but NOT set-based?

    Good point Paul. Perhaps I should have said that is still hidden RBAR instead of saying it is not set based. In fact, the recursive CTE would have met the thread title perfectly. It is exactly "Set based iteration". 😛

    It all seems a bit woolly to me.

    I sheepishly step away from my definitively stated yet incorrect position.

    It's not an incorrect position. rCTE's that count are not set based. They're hidden RBAR. If Paul's definition describes "Set Based", then a While Loop is Set Based as well because it handles sets of 1. There are some While Loops (like the one's used to build hierarchies) that actually are set based but count sequentially using recursion or iteration at the T-SQL Level is anything but set based.

    A Cross Join? Now that's Set Based. A SELECT from a Tally Table? Set Based. Using Itzik's cascading Cross Joined CTEs? Also set based. rCTE to count? NOT set based.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/23/2012)


    Oooooohhhh, be careful, Michael. That's a "Counting rCTE". Read the following article for why that's generally a bad idea. http://qa.sqlservercentral.com/articles/T-SQL/74118/

    Pro Tip: Reading the existing replies to a thread can help prevent posting duplicate information. :w00t:

  • Jeff Moden


    SQL Kiwi


    How do you define "set-based"? If each iteration processes a set (even if the set has one member) then in what sense is it not set-based?

    Not looping at the T-SQL Level. 😉

    By that definition, recursive CTEs are set-based too, since there is no looping 'at the T-SQL level'; it's a single statement. Perhaps you mean something else by 'at the T-SQL level'? It's really not clear. One might say it's a bit woolly. I'm glad that every SQLCLR function and procedure I write counts as set-based though 🙂

    By the definition you just gave, a WHILE Loop qualifies as "set based" because each iteration processes a set of one.

    I gave no definition; I asked in what sense a particular scenario would not be considered set-based. Moreover, you may recall that Hugo Kornelis' quirky-update-matching running totals method published in MVP Deep Dives Volume I was called Set-Based Iteration, and used a WHILE loop. This was the example I had in mind.

    A recursive CTE working on a simple count is nothing more than a hidden cursor complete with it's own temporary table in the form of a working table. It's not only NOT set based, it's RBAR and can actually be slower than a While Loop and use more resources, as well.

    From the same post: "Now don't misunderstand me, recursive CTEs are not a very efficient way to produce sequential numbers (though performance differences may be inconsequential in very many cases), but NOT set-based?". :doze:

    But, let's just skip all that subjective definition stuff for a minute. Let's stick with the pragmatic stuff. Can you make an rCTE count from 1 to 10,000 (or pick a reasonable number) faster than Itzik's cascading Cross Joined CTE method, faster than two tables Cross Joined with ROW_NUMBER, or by simply selecting from a Tally Table?

    No I can't, because it is the wrong tool for the job, as I have already said. I can't force you to read what I write before replying, but it might save some time! Nevertheless, it is true that in very many practical cases, it won't matter whether it takes 100ms or 1ms to count to 10,000. Yes, it's a bad habit to get into because it won't scale (yada yada yada) but please don't misrepresent what I said quite clearly.

    Do you want to call the Counting rCTE "Slow Set Based"? 😉

    I am right now placing a large counting rCTE in a nuclear-powered pork chop launcher, targeted at your present location.

  • SQL Kiwi (5/23/2012)


    since there is no looping 'at the T-SQL level'; it's a single statement

    It's a single statement in appearance only. It's executed more than once which is the very definition of recursion.

    And, yes... you gave a definition. You said that something that processes one row at a time could still be set based because a set can contain 1 item.

    The truth is, there's really no such thing as Set Based in computers because, at the machine language level, everything is handled one item at a time. But, if you want to talk about Set Based as it has become known in T-SQL, anything that necessarily iterates over 1 row at a time and cannot be made to iterate of more than 1 row at a time (you can read that as processes 1 row at a time for each execution) isn't set based.

    Hugo's update was set based much like a while loop operating on different levels of a hierarchy is set based. It's capable of handling more than 1 row at a time for each iteration.

    Pro Tip: Reading the existing replies to a thread can help prevent posting duplicate information.

    Pro Tip: Reiterating what has already been said helps drive a point home especially since I'm the author of the article the other person cited. You don't need to resort to such snarky quips, Paul. You've never been like that before... don't start the ring knocking crap now.

    I am right now placing a large counting rCTE in a nuclear-powered pork chop launcher, targeted at your present location

    Can't wait. Should be fun provided you can leave the "Pro Tips" in the box where they belong.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/24/2012)


    ...anything that necessarily iterates over 1 row at a time and cannot be made to iterate of more than 1 row at a time (you can read that as processes 1 row at a time for each execution) isn't set based. Hugo's update was set based much like a while loop operating on different levels of a hierarchy is set based. It's capable of handling more than 1 row at a time for each iteration.

    The following recursive counting CTE is set-based then?

    WITH rCTE AS

    (

    SELECT n = 1

    UNION ALL

    SELECT rCTE.n

    FROM rCTE,

    (

    VALUES

    (1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1)

    ) AS V (v)

    )

    SELECT TOP (10000)

    ROW_NUMBER() OVER (ORDER BY (SELECT PI()))

    FROM rCTE

    ORDER BY

    ROW_NUMBER() OVER (ORDER BY (SELECT PI()))

    OPTION (MAXRECURSION 0);

    You don't need to resort to such snarky quips, Paul. You've never been like that before... don't start the ring knocking crap now.

    Then do me the courtesy of reading my responses before getting on your anti-rCTE soapbox, and talking to me like I just picked up a SQL manual yesterday. And keep the bad language in the box where it belongs.

  • Jeff Moden (5/24/2012)


    Pro Tip: Reading the existing replies to a thread can help prevent posting duplicate information.

    Regarding this point. You know yourself that you often rock up to a long thread, replying in order one at a time without reading all the way through first. It's great for the post count, of course, but try the following Google search: "site:sqlservercentral.com jeff moden heh wish read whole thread" < I got 1,120 results 😛

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply