Query Performance

  • Hello All,

    Wish you a very happy new year 2019!!

    I have been asked to review the performance of a procedure, and I am reproducing a part of it here. There are two similar queries in this procedure, and combined they take up to 12-15 seconds on a big server (a DR of production with hardly any resource consumption).
    Since both queries are similar, I am posting one of them here.

    It is a nested query, but even when I tried to evaluate it independently using temp tables for better visualization, there was hardly any performance improvement. I looked into the query to see if I could modify it a bit, but did not find much success.

    Can one of the gurus see if there is a way to rewrite this so that the performance can be improved? I am including the table schema along with the indexes and the execution plan. Please let me know if I can frame my question in a better way or supply more information.

    Thanks in advance!!

             
    declare @maxResults INT = 25,
    @companyName varchar(100)=N'Northwestern Corp Discount Commercial Paper 4/2 Yrs 3&4',
    @issuerCode nvarchar(6)=''   
          
       
    DECLARE @searchText NVARCHAR(300)   
    DECLARE @issuerCodeType int
    DECLARE @searchTextType int

    SELECT @searchText = dbautil.dbo.stripAutoSuggestCompanyName_fn(@companyName)

    --SELECT @searchText = @companyName

          
    -- Get company by QGrams    
    DECLARE @minLen int    
    DECLARE @Q int    
    SET @Q = 3    
    SET @minLen = 4    
        
    IF LEN(@searchText) < 6    
    BEGIN    
        SET @minLen = 2    
    END    

    Create table #qgram
    (    
        gram NCHAR(3) NOT NULL PRIMARY KEY CLUSTERED
    ,    [count] INT NOT NULL    
    )    
     
    DECLARE @SearchTextLenght INT
    SET @SearchTextLenght = LEN(@searchText)
        
    INSERT INTO #qgram    
    SELECT    
        ISNULL(gram, '') as gram
    ,    ISNULL(COUNT(*), 1) AS [count]    
    FROM    
    (    
        SELECT CONVERT(NCHAR(3), SUBSTRING(SUBSTRING('###', 1, @Q-1) + @searchText + SUBSTRING('%%%', 1, @Q-1), n.n, @Q)) AS gram    
        FROM dbautil.dbo.numbers_tbl n (NOLOCK)    
        WHERE n.n <= @SearchTextLenght + @Q - 2    
    ) sub    
    GROUP BY gram  
    --option(recompile)

        
    DECLARE @maxMatch INT    
    SELECT @maxMatch = SUM(count) FROM #qgram    

        
        
    SELECT sub2.fundOrComp, sub2.nameCacheId, sub2.ratio ,row_number() over(partition by sub2.companyid order by ratio desc) as rn    
    FROM
    (    
    SELECT TOP(@maxResults) 1 as fundOrComp,sub.nameCacheId,
    ((sub.count * 1.0 /(qn.strippedNameLength + @Q - 2)) * (sub.count * 1.0/ @maxMatch)) AS ratio,qn.companyId 
        
           
        FROM    
        (
            SELECT TOP 1500 g.nameCacheId,SUM (CASE WHEN g.count > q.count THEN q.count ELSE g.count END ) AS count             
            FROM #qgram q INNER JOIN SearchSuggest_QGram_tbl g (NOLOCK) ON g.gram = q.gram GROUP BY g.nameCacheId HAVING    
            SUM (CASE WHEN g.count > q.count THEN q.count ELSE g.count END ) >= @minLen ORDER BY 2 DESC, 1 ASC         
        ) sub    
        INNER JOIN SearchSuggest_QGramNameCache_tbl qn (NOLOCK) ON qn.nameCacheId = sub.nameCacheId    
        ORDER BY ratio DESC    
    ) sub2 

     
    drop table #qgram

  • chandan_jha18 - Wednesday, January 2, 2019 6:49 AM

    Start by getting rid of the nolock hints, the implicit conversions, and the parallel execution; then run the query again and post the execution plan.
    😎
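
    For instance, @companyName is declared varchar(100) but assigned an N'' literal (and presumably compared against nvarchar columns), so declaring it nvarchar avoids one implicit conversion. A minimal sketch of those changes against the posted code (everything else stays the same):

    -- Match the parameter types to the Unicode literals / nvarchar columns
    DECLARE @maxResults  INT           = 25,
            @companyName NVARCHAR(100) = N'Northwestern Corp Discount Commercial Paper 4/2 Yrs 3&4',
            @issuerCode  NVARCHAR(6)   = N'';

    -- ...and drop the hints wherever the tables are referenced, e.g.:
    -- FROM dbautil.dbo.numbers_tbl n          (was: numbers_tbl n (NOLOCK))
    -- INNER JOIN SearchSuggest_QGram_tbl g    (was: ... g (NOLOCK))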

  • Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.
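
    A minimal sketch of that change (the index name is hypothetical, and it assumes the existing clustered index on gram has been dropped first):

    -- Recluster so all rows for a given nameCacheId are physically adjacent
    CREATE CLUSTERED INDEX CIX_SearchSuggest_QGram_nameCacheId
        ON dbo.SearchSuggest_QGram_tbl (nameCacheId, gram);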

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Eirikur Eiriksson - Wednesday, January 2, 2019 8:21 AM

    Start by getting rid of the nolock hints, the implicit conversions, and the parallel execution; then run the query again and post the execution plan.
    😎

    The implicit conversions are happening in very small tables. I can try removing the nolock hints and adding MAXDOP 1 to remove parallelism, if that's what you mean by parallel execution.
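
    If so, a minimal sketch would be a per-statement hint on the posted SELECT rather than a server-wide setting, e.g. the tail of the query becomes:

        ) sub2
    OPTION (MAXDOP 1);  -- suppress parallelism for this statement only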

  • ScottPletcher - Wednesday, January 2, 2019 8:26 AM

    Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.

    Let me get back to you by making this change and report the performance.

    Thanks

    And, calculations like this are best done once, into a variable:
    @SearchTextLenght + @Q - 2

    Having the calculation inline in the filter criteria leads to problems.
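
    A minimal sketch of that change against the posted batch (@maxN is a hypothetical name; everything else is as declared in the original code):

    DECLARE @maxN INT = @SearchTextLenght + @Q - 2;  -- computed once, up front

    INSERT INTO #qgram
    SELECT ISNULL(gram, '') AS gram, ISNULL(COUNT(*), 1) AS [count]
    FROM (
        SELECT CONVERT(NCHAR(3), SUBSTRING(SUBSTRING('###', 1, @Q-1) + @searchText + SUBSTRING('%%%', 1, @Q-1), n.n, @Q)) AS gram
        FROM dbautil.dbo.numbers_tbl n
        WHERE n.n <= @maxN  -- was: n.n <= @SearchTextLenght + @Q - 2
    ) sub
    GROUP BY gram;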

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • ScottPletcher - Wednesday, January 2, 2019 8:26 AM

    Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.

    While I agree that will cure the problem of doing 1,500 individual seeks (possibly... no guarantee without knowing more about the data) by changing to a seek followed by a range scan (again, possibly), it will only solve about 5% or less of the problem.

    The real problem is that the table (apparently) has almost a quarter billion rows in it and the query at hand is returning more than 90 million of them in 51 seek/scans.  Apparently (after a quick glance at the code, the schema, and the execution plan) there are 51 items in the temp table to be checked, and trying to do this in a single query is absolutely the worst way to do it.  Since there is no early disqualification of data, you end up with what amounts to an accidental Cross Join.

    The design of the tables also has a problem... there are distinctly (pun intended) no unique indexes.

    Changing one index (or any index, for that matter) just isn't going to solve the performance issues with this query.
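
    On the unique-index point: if (gram, nameCacheId) is in fact unique in SearchSuggest_QGram_tbl (an assumption; the posted schema doesn't show it), enforcing that would look something like this:

    -- Hypothetical: one count row per (gram, nameCacheId) pair
    CREATE UNIQUE NONCLUSTERED INDEX UQ_SearchSuggest_QGram_gram_nameCacheId
        ON dbo.SearchSuggest_QGram_tbl (gram, nameCacheId)
        INCLUDE ([count]);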

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • chandan_jha18 - Wednesday, January 2, 2019 6:49 AM

    It would really have helped if you had identified what the actual purpose of the query is.  So, for better understanding by all, let me see if I get it right.

    In this particular bit of code, it would appear that you're retrieving 51 "key items" that uniquely identify a company and then storing that information in a temp table.

    After that, you're trying to figure out which rows in a much larger table provide a match for those 51 "grams" so you can return only those nameCacheId values that are actually a match for the given company.

    Is that correct?  If not, please expound.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Wednesday, January 2, 2019 9:52 AM

    It looks to me like a primitive attempt at fuzzy matching, with that 250 million row table containing n-grams of, say, company names. It might be more performant (and far less "codey") to implement Alan B's n-gram fuzzy matching.
    In the meantime, as you point out, Jeff, the most costly part of the batch is this bit:
    SELECT TOP(1500)
     g.nameCacheId,
     SUM(CASE WHEN g.count > q.count THEN q.count ELSE g.count END) AS count               
    FROM #qgram q
    INNER JOIN SearchSuggest_QGram_tbl g
     ON g.gram = q.gram
    GROUP BY g.nameCacheId
    HAVING SUM(CASE WHEN g.count > q.count THEN q.count ELSE g.count END) >= @minLen
    ORDER BY 2 DESC, 1 ASC    

             
    For a quick & temporary small improvement I'd try something like this:
    CREATE INDEX ix_JustTesting ON SearchSuggest_QGram_tbl (gram, nameCacheId) INCLUDE ([count])

    SELECT TOP(1500)
     x.nameCacheId,
     x.count                
    FROM #qgram q
    CROSS APPLY (
     SELECT
      g.gram, 
      g.nameCacheId,
      SUM(CASE WHEN g.count > q.count THEN q.count ELSE g.count END) AS count
     FROM SearchSuggest_QGram_tbl g
     WHERE g.gram = q.gram
     GROUP BY g.gram, g.nameCacheId
    ) x
    WHERE x.count >= @minLen
    ORDER BY 2 DESC      

    But it's never going to perform as well as a redesign.

    "Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden - Wednesday, January 2, 2019 9:46 AM

    Quite right.  I didn't look as fully at the query as I should have (I'm just too busy at work right now, but I look at SQL questions as kind of a quick break from work).

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

    The OP hasn't gotten back on my questions yet.  If the problem definition turns out to be what I posted in my questions, then it will be like what I've done on similar things in the past (one being just 2 weeks ago), and we should be able to have the code complete in about a quarter of a second.  Maybe less.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Wednesday, January 2, 2019 4:39 PM

    Sir, I apologize for the delay in response, which is mainly due to our different time zones; I only got a chance to talk to the dev today. I have the document explaining the logic and will post the information shortly.

  • ScottPletcher - Wednesday, January 2, 2019 8:26 AM

    Cluster the [SearchSuggest_QGram_tbl] on [nameCacheId] rather than on gram.

    Sir, no change in performance even after this.

    Thanks

  • Jeff Moden - Wednesday, January 2, 2019 9:52 AM

    Sir, here is what I got from the developer, as per his document:

    1)     When the search text is entered, the length of the search text is calculated, and then the search string is broken into multiple small strings of 3 characters and stored in a table called qgram.
    Example: If I pass the search text as 'Legg Mason Partners', we will get the following results:

    Results:
    gram     count
    ' Ma'    1
    ' Pa'    1
    '##L'    1
    '#Le'    1

    etc.

    Count indicates the number of times the split string is present in the search string.

    2)     The qgram table and SearchSuggest_QGram_tbl are combined based on the gram column.
    3)     After qgram and SearchSuggest_QGram_tbl are combined, the rows are grouped by nameCacheId, the count values are summed up, and the top 1500 rows are selected. Since we have a count column in both tables, the lesser of the two count values is used when summing the counts while grouping by nameCacheId.
    Example:
    nameCacheId    count
    5359401        9
    512010         6
    2545081        8
    2875285        11
    1934139        10
    4)     The result fetched in step 3 is combined with SearchSuggest_QGramNameCache_tbl based on nameCacheId.

    5)     From the result set obtained in step 4, the top 500 records are fetched, which contain the company id, the ratio, the nameCacheId, and strippedNameLength (the length of the company or fund name).
    6)     The ratio for each company id is calculated as below, where count is the summed count obtained in step 3 and @maxMatch is the sum of the count values obtained earlier:
    (count * 1.0 / (strippedNameLength + @Q - 2)) * (count * 1.0 / @maxMatch)
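
    As a worked example of that formula (hypothetical numbers: @Q = 3, a summed match count of 9 from step 3, a stripped name length of 19, and @maxMatch = 20):

    -- ratio = (9 / (19 + 3 - 2)) * (9 / 20) = 0.45 * 0.45
    SELECT (9 * 1.0 / (19 + 3 - 2)) * (9 * 1.0 / 20) AS ratio;  -- 0.2025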

    Can you see if this matches your observation?

    Regards,
    Chandan Jha
