Regarding SQL Query optimization

  • I have a question regarding query optimization.

    Which of the following is preferred, considering the performance of the executed SQL query?

    Method 1
    --------
    UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id))
    UPDATE Table1 SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id))
    ...
    UPDATE Table1 SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))

    Method 2
    --------
    UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id)),
    SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id)),
    ...
    SET Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))

  • The second one would be faster because it is a single set-based update on the table, compared to 10 separate UPDATE statements.

    The syntax of the second query seems to be incorrect, though: SET should appear only once, with the column assignments separated by commas (a corrected form is sketched below).

    thanks
    sarat ๐Ÿ™‚
    Curious about SQL
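
    For reference, here is a minimal sketch of what a syntactically valid single-UPDATE version of Method 2 might look like, using the table, column, and parameter names from the question (@id is a hypothetical placeholder value):

    DECLARE @id INT = 1  -- hypothetical value, for illustration only

    UPDATE Table1
    SET Col1  = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id)),
        Col2  = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id)),
        -- ... Col3 through Col9 follow the same pattern ...
        Col10 = ( Select Count(Id) from Maintable MT where (MT.Age between 10 and 11) and (MT.PID = @id))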

  • You could try running a test example using the estimated execution plan button and comparing the percentage of the cost taken by each method. I can tell you that in a simple test I made, the single UPDATE affecting several columns took 36% of the cost of the total batch, while the first method took the remaining 64%.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • codebyo (10/13/2011)


    You could try running a test example using the estimated execution plan button and comparing the percentage of the cost taken by each method. I can tell you that in a simple test I made, the single UPDATE affecting several columns took 36% of the cost of the total batch, while the first method took the remaining 64%.

    Best regards,

    Even with the actual plan, this is completely unreliable. You need both CPU and IO figures, and the best way to get them is with actual timings, or better still, Profiler. If you are unsure of the veracity of this claim, I'll post up a couple of queries: one takes 4% and the other 96% of the total cost when they are run in the same batch, yet the 4% query takes three times longer to run than the 96% query. They're both string splitters. The 4% query uses a recursive CTE (rCTE) as a source of rows/numbers, which is heavy on IO. (A timing sketch follows below.)

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
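
    As a rough illustration of what "actual timings" could look like, here is a minimal sketch using SET STATISTICS TIME/IO. This is not the test referred to above; table and parameter names are taken from the question, and @id is a hypothetical placeholder:

    DECLARE @id INT = 1  -- hypothetical value, for illustration only

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- Method 1 style: one UPDATE per column (repeat for Col2 ... Col10)
    UPDATE Table1 SET Col1 = (SELECT COUNT(Id) FROM Maintable MT WHERE MT.Age BETWEEN 1 AND 2 AND MT.PID = @id);

    -- Method 2 style: a single UPDATE setting several columns
    UPDATE Table1
    SET Col1 = (SELECT COUNT(Id) FROM Maintable MT WHERE MT.Age BETWEEN 1 AND 2 AND MT.PID = @id),
        Col2 = (SELECT COUNT(Id) FROM Maintable MT WHERE MT.Age BETWEEN 2 AND 3 AND MT.PID = @id);

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;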

  • I'll post up a couple of queries

    It would be beneficial for everyone here. Thanks!

  • Chris, what's on your mind? Which method is better, the first or the second?

  • ChrisM@Work (10/13/2011)


    Even with the actual plan, this is completely unreliable. You need both CPU and IO figures, and the best way to get them is with actual timings, or better still, Profiler. If you are unsure of the veracity of this claim, I'll post up a couple of queries: one takes 4% and the other 96% of the total cost when they are run in the same batch, yet the 4% query takes three times longer to run than the 96% query. They're both string splitters. The 4% query uses a recursive CTE (rCTE) as a source of rows/numbers, which is heavy on IO.

    Oh, yes. I'm aware of that. Thanks for pointing that out. ๐Ÿ˜‰

    I was not saying that would be the best solution though.

    It was just a quick test, and in this particular UPDATE example it worked as expected because both of my tests used the same columns and exactly the same code, except for the single UPDATE versus several UPDATEs.

    But I should have stated that more clearly. Thank you for the correction. ๐Ÿ™‚

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • No problem. This relies heavily on the testing framework which Jeff Moden et al. set up for his excellent string splitter article.

    Here's the rCTE code:

    CREATE FUNCTION [dbo].[DS8K_rCTEa](@pString [varchar](8000), @pDelimiter [char](1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    -- DS8K_rCTEa
    WITH x AS (
        SELECT
            ItemNumber = 1,
            pos = 0,
            Nextpos = CHARINDEX(',',@pString,1)
        UNION ALL
        SELECT
            ItemNumber = x.ItemNumber + 1,
            pos = x.nextpos,
            nextpos = ISNULL(NULLIF(CHARINDEX(',',@pString,x.nextpos+1),0),8000)
        FROM x
        WHERE x.nextpos < 8000 AND x.nextpos+x.pos > 0
    )
    SELECT
        ItemNumber,
        Item = CASE WHEN pos+nextpos = 0 THEN @pString ELSE SUBSTRING(@pString,pos+1,Nextpos-(pos+1)) END
    FROM x

    Here's a (hard) tally table version:

    CREATE FUNCTION [dbo].[DS8K_Tally1U](@pString [varchar](8000), @pDelimiter [char](1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    -- DS8K_Tally1U (one-based permanent tally table)
    SELECT
        ItemNumber = CAST(0 AS BIGINT),
        Item = LEFT(@pString,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,1),0)-1,8000))
    UNION ALL
    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY pos),
        Item = SUBSTRING(@pString,pos,[len])
    FROM (
        SELECT
            pos = n+1,
            [len] = ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,(n+1)),0)-(n+1),8000)
        FROM dbo.Tally1 WITH (NOLOCK)
        WHERE n <= ISNULL(DATALENGTH(@pString),0)
        AND SUBSTRING(@pString,n,1) = @pDelimiter
    ) x

    Here's the code to test the two functions in sequence:

    USE tempdb
    GO
    SELECT * INTO JBMTest
    FROM (
        SELECT 0 as SomeID, NULL as SomeValue UNION ALL --1 NULL
        SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
        SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
        SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
        SELECT 4, ',' UNION ALL --2 b b (both are empty strings)
        SELECT 5, '55555' UNION ALL --1 E
        SELECT 6, ',55555' UNION ALL --2 b E
        SELECT 7, ',55555,' UNION ALL --3 b E b
        SELECT 8, '55555,' UNION ALL --2 b B
        SELECT 9, '55555,1' UNION ALL --2 E E
        SELECT 10, '1,55555' UNION ALL --2 E E
        SELECT 11, '55555,4444,333,22,1' UNION ALL --5 E E E E E
        SELECT 12, '55555,4444,,333,22,1' UNION ALL --6 E E b E E E
        SELECT 13, ',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
        SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
        SELECT 15, ' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
        SELECT 16, 'This,is,a,test.' --E E E E
    ) d

    SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
    FROM JBMTest test CROSS APPLY dbo.DS8K_rCTEa(test.SomeValue,',') split

    SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')
    FROM JBMTest test CROSS APPLY dbo.DS8K_Tally1U(test.SomeValue,',') split

    Try this too, it's fun but not quite legit:

    CREATE FUNCTION [dbo].[DS8K_CTE2U](@pString [varchar](8000),@pDelimiter [char](1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    -- DS8K_CTE2U (zero-based row-constructor tally table)
    -- When a CROSS JOIN is used without the ON clause, parentheses can be used to indicate the join order -
    -- at least that's the "official" line ;)
    SELECT
        ItemNumber = CAST(0 AS BIGINT),
        Item = CAST(LEFT(@pString, ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, 1),0)-1,8000)) AS VARCHAR(8000))
    UNION ALL
    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), -- row_number appears to be necessary
        Item = SUBSTRING(@pString,n,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,n),0)-n,8000))
    FROM (
        SELECT n = n+1
        FROM (
            SELECT TOP (1+ISNULL(DATALENGTH(@pString),0))
                n = (n1 + n2 + n3 + n4)
            FROM (VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)
            CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1)
            CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2)
            CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4)
        ) n
        WHERE SUBSTRING(@pString,n,1) = @pDelimiter
    ) d

    Now, to test the duration of these two queries, your best bet is Jeff's test setup which is an attachment at the end of the article. This query yields a good summary of the results:

    SELECT
        SplitterName,
        Min_ = MIN(Duration),
        Max_ = MAX(Duration),
        Sum_ = SUM(Duration),
        Avg_ = AVG(Duration)
    FROM (
        SELECT *, Placing = ROW_NUMBER() OVER(PARTITION BY NumberOfElements, MaxElementLength ORDER BY Duration)
        FROM dbo.TestResults -- 392 rows
    ) d
    GROUP BY SplitterName
    ORDER BY SUM(Duration)

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Junglee_George (10/13/2011)


    Chris, what's on your mind? Which method is better, the first or the second?

    Easy - test them ๐Ÿ˜‰

    It's an unrestricted update, though, so one pass is likely to be better than two or more.

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Junglee_George (10/13/2011)


    Which of the following is preferred, considering the performance of the executed SQL query?

    Actually, neither one is preferred. The first one is the worse of the two, as has already been mentioned, because it performs 10 separate updates as opposed to the single update of the second, but both of them are bad, because they both perform 10 separate selects from MainTable. Your query can be rewritten to perform only one select as well.

    ;WITH MT AS (
        SELECT
            Col1 = Count(CASE WHEN Age BETWEEN 1 AND 2 THEN [ID] END)
          , Col2 = Count(CASE WHEN Age BETWEEN 2 AND 3 THEN [ID] END)
          , Col3 = Count(CASE WHEN Age BETWEEN 3 AND 4 THEN [ID] END)
        FROM MainTable
        WHERE PID = 1
    )
    UPDATE Table1
    SET Col1 = MT.Col1
      , Col2 = MT.Col2
      , Col3 = MT.Col3
    FROM MT

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thank You all for the responses...

    That provides lots of insight...!!!

    Thanks a lot...

  • Actually, the difference between two or three UPDATE statements depends on the workload involved, and in most cases it is not such a big deal for performance.

    That said, I totally agree with Drew's approach, particularly for a big workload on a huge table, where the performance gain could be quite significant.

    But for a simpler, faster query, you could run the following:

    DECLARE @COL1 INT, @COL2 INT, @COL3 INT

    -- Counts are taken from MainTable (as in the original question), filtered by PID
    SELECT @COL1 = ISNULL(COUNT(CASE WHEN Age BETWEEN 1 AND 2 AND PID = 1 THEN 1 ELSE NULL END), 0) FROM MainTable
    SELECT @COL2 = ISNULL(COUNT(CASE WHEN Age BETWEEN 2 AND 3 AND PID = 1 THEN 1 ELSE NULL END), 0) FROM MainTable
    SELECT @COL3 = ISNULL(COUNT(CASE WHEN Age BETWEEN 3 AND 4 AND PID = 1 THEN 1 ELSE NULL END), 0) FROM MainTable

    UPDATE Table1
    SET Col1 = @COL1
      , Col2 = @COL2
      , Col3 = @COL3

    Where each count has more specific filtering parameters, this could be of real benefit on a huge table if you created three filtered nonclustered indexes matching the criteria above (a sketch of such an index follows below).

    Performance Guard

    Just ask me to get 0 sec for all queries
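
    For illustration only, here is a minimal sketch of one such filtered nonclustered index; the index name, key column, and included column are assumptions rather than anything posted in the thread:

    -- Hypothetical filtered index covering the Age 1-2 bucket for the PID lookup;
    -- repeat with a different filter for each of the other buckets.
    CREATE NONCLUSTERED INDEX IX_MainTable_Age_1_2
        ON dbo.MainTable (PID)
        INCLUDE (Id)
        WHERE Age >= 1 AND Age <= 2;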

  • Junglee_George (10/13/2011)


    Which of the following is preferred, considering the performance of the executed SQL query?

    UPDATE Table1 SET Col1 = ( Select Count(Id) from Maintable MT where (MT.Age between 1 and 2) and (MT.PID = @id))
    UPDATE Table1 SET Col2 = ( Select Count(Id) from Maintable MT where (MT.Age between 2 and 3) and (MT.PID = @id))
    ...

    You do realize that when MT.Age is 2 it will be counted in both Col1 and Col2? The same is true for the other columns, where the upper bound of one range equals the lower bound of the next.

    The same is true of Method 2. (A sketch using half-open ranges follows below.)
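
    To illustrate the point, here is a minimal sketch (not from the thread) of the same update with half-open ranges, so each Age value falls into exactly one bucket; @id is a hypothetical placeholder value:

    DECLARE @id INT = 1  -- hypothetical value, for illustration only

    -- With BETWEEN, Age = 2 satisfies both 'between 1 and 2' and 'between 2 and 3'.
    -- Half-open ranges (>= lower, < upper) count each value exactly once.
    UPDATE Table1
    SET Col1 = (SELECT COUNT(Id) FROM Maintable MT WHERE MT.Age >= 1 AND MT.Age < 2 AND MT.PID = @id),
        Col2 = (SELECT COUNT(Id) FROM Maintable MT WHERE MT.Age >= 2 AND MT.Age < 3 AND MT.PID = @id)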
