Performance tuning of hierarchy based query

  • Hi DBAs,

    I have a performance issue with one hierarchy processing related query. Here is the scenario:

    I have following two temp tables with sample data:

    #temp_Mainhier (It is main hierarchy containing summary nodes including top node till leaf nodes)

    ID businessID BusinessName Hierarchy

    1 1 Top 0

    2 10 ABC 0|1

    3 11 DEF 0|1

    4 12 GHI 0|1

    5 13 MNO 0|1|2

    6 14 LMN 0|1|3

    ...

    1234 112233 XXX 0|1|2|8|102|10485|234235

    #temp_SecondaryHierLeaf (It is subset of secondary hierarchy with only leaf nodes in businessID. It might be the last summary node in secondary hierarchy chain oractual cost center itself)

    ID businessID BusinessName Hierarchy

    121234 562839 ZSF 0|23|567|15435|28935

    123741 374927 JDY 0|28|324|24523|56424

    167349 235826 KHS 0|23|342|43225|87628

    ...

    Now the number of leaf nodes will be same in both the hierarchies but hierarchies might be different. In secondary hierarchy leaf nodes (in temp table),the hierarchy code will have some summary node which will be there in main hierarchy. For example,

    #temp_Mainhier

    ID businessID BusinessName Hierarchy

    123456 777777 HJU 0|1|18|342|34232|324323|843634|238457 (Leaf node in std hierarchy)

    123456 888888 HJU 0|1|18|342|34232|324323|843634|238457|777777

    123456 999999 HJU 0|1|18|342|34232|324323|843634|238457|777777

    #temp_SecondaryHierLeaf

    ID businessID BusinessName Hierarchy

    326522 238457 HJU 0|23|324|3456|34232|324323|843634 (Leaf node in sec hierarchy)

    In above example, if, business id 238457 in secondary hierarchy, is present in std hierarchy, then data should be like this(hierarchy of secondary hierarchy for that business id + substring of std hierarchy from where business id in custom hierarchy is started):

    ID businessID BusinessName Hierarchy

    123456 777777 HJU 0|23|324|3456|34232|324323|843634|238457

    123456 888888 HJU 0|23|324|3456|34232|324323|843634|238457|777777

    123456 999999 HJU 0|23|324|3456|34232|324323|843634|238457|777777

    Currently i am using following code which is taking huge performance hit:

    select b.ID, b.businessid, b.businessname, a.hierarchy+'|'+substring(b.hierarchy,charindex(a.businessid+'|', b.hierarchy+'|'), len(b.hierarchy)) as hierarchy

    from #temp_SecondaryHierLeaf a

    cross join #temp_Mainhier b

    where charindex('|'+a.businessid+'|', b.hierarchy) > 0

  • How many rows in your tables? That cross join is most likely killing you, as it is multiplying every row in the first table with every row in the second table.

    Joie Andrew
    "Since 1982"

  • #temp_Mainhier contains around 500000 and #temp_SecondaryHierLeaf contains around 200000 records.

    • This reply was modified 5 years, 4 months ago by  sqlenthu 89358. Reason: Rectified the numbers
  • sqlenthu 89358 wrote:

    #temp_Mainhier contains around 500000 and #temp_SecondaryHierLeaf contains around 200000 records.

    I'm not groking what you need to do but it seems like you're going to have perpetual problems because you have a secondary hierarchy table that's the "same but different" and you're using a positional hierarchy path instead.  The following is from your original post and I'm just not seeing how your hierarchy is working because ID 2,3, and 4 all have the same hierarchical path  With that, it would appear that you can't actually tell what the ancestors of IDs 5 and 6 actually are by looking ONLY at the hierarchical path alone.  Instead, you have to look at both the hierarchical path and the ID.  That ID should be the last piped value in the hierarchy for the node.  For example, (if I'm reading this right) the path for ID 5 should actually be 0|1|2|5.  In other words, the hierarchical path should always contain the current ID as the right most ID in the hierarchical path.  Another way to put is is that the right most value in the hierarchical path should always be a leaf node ID and it should be the ID of the current row.

    ID businessID BusinessName Hierarchy

    1 1 Top 0

    2 10 ABC 0|1

    3 11 DEF 0|1

    4 12 GHI 0|1

    5 13 MNO 0|1|2

    6 14 LMN 0|1|3

    IF that can be done (and it looks pretty easy to do with code) and if we can change the datatype of the hierarchical path column, we'll be able to do some things that will totally amaze you (especially when it comes to "node summaries" as you call them) and will run so fast you won't believe it actually ran.  See the following article for more information.

    https://qa.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hey Jeff, there ID column is simply identity column. Its the businessID and Hierarchy column which are related. So IDs 2, 3, 4 have same hierarchies because businessid 10, 11, 12 are kids of  businessid 1.

    In between, it was funny to read "hierarchies on steroids" (i picturized it :)).

    Hope my explanation clarifies and u can provide me steroids for my query.

  • Yes, I do understand how your hierarchy is setup and I understand that the ID column is just an integer.  What I'm saying is that it has not been included in the hierarchical path column and that is and will continue to make your life miserable.  Also, having the hierarchical path as a character based rendition of pipe delimited IDs contained in the upline will continue to cause you severe pain.  Hopefully, the hierarchical path that you have isn't actually the text rendition of a column that's actually based on the "HierarchyID" datatype, as well.

    If we can change those two things, then some serious and fairly easy magic can happen.  We can also do the same without making those changes but it will be a bit more painful.  Either way, the results will be incredible.

    Please read the article that I previously provided a link for, which describes what I mean by "magic" and "incredible".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I will definitely check that link in details. In between, to get optimized results feel free to make changes from these two temp tables to new temp tables in the format u want them to be processed quickly.

    "What I'm saying is that it has not been included in the hierarchical path column and that is and will continue to make your life miserable." - The ID column is just an identity column. I wonder how it will improve the performance if it's not at all related to hierarchies. Aren't we adding more bulk to our problem with this ?

    "Hopefully, the hierarchical path that you have isn't actually the text rendition of a column that's actually based on the "HierarchyID" datatype, as well." -It is actually. The hierarchy column is indeed made up of businessID values to show hierarchy path of a particular businessId.

    Having said that, my intention here is only to  get all leaf level BusinessID from secondary hierarchy, which exists in hierarchy column of main hierarchy, with a hierarchy comprising of "hierarchy value of secondary hierarchy" + '|' + "part of hierarchy value which starts from businessId from secondary hierarchy".

    • This reply was modified 5 years, 4 months ago by  sqlenthu 89358. Reason: Updated with more clarification
  • sqlenthu 89358 wrote:

    Hi DBAs, I have a performance issue with one hierarchy processing related query. Here is the scenario: I have following two temp tables with sample data: #temp_Mainhier (It is main hierarchy containing summary nodes including top node till leaf nodes) ID businessID BusinessName Hierarchy 1 1 Top 0 2 10 ABC 0|1 3 11 DEF 0|1 4 12 GHI 0|1 5 13 MNO 0|1|2 6 14 LMN 0|1|3 ... 1234 112233 XXX 0|1|2|8|102|10485|234235 #temp_SecondaryHierLeaf (It is subset of secondary hierarchy with only leaf nodes in businessID. It might be the last summary node in secondary hierarchy chain oractual cost center itself) ID businessID BusinessName Hierarchy 121234 562839 ZSF 0|23|567|15435|28935 123741 374927 JDY 0|28|324|24523|56424 167349 235826 KHS 0|23|342|43225|87628 ... Now the number of leaf nodes will be same in both the hierarchies but hierarchies might be different. In secondary hierarchy leaf nodes (in temp table),the hierarchy code will have some summary node which will be there in main hierarchy. For example, #temp_Mainhier ID businessID BusinessName Hierarchy 123456 777777 HJU 0|1|18|342|34232|324323|843634|238457 (Leaf node in std hierarchy) 123456 888888 HJU 0|1|18|342|34232|324323|843634|238457|777777 123456 999999 HJU 0|1|18|342|34232|324323|843634|238457|777777 #temp_SecondaryHierLeaf ID businessID BusinessName Hierarchy 326522 238457 HJU 0|23|324|3456|34232|324323|843634 (Leaf node in sec hierarchy)

     

    Performance issues are embedded into your tables design.

    There is no "query tuning" solution to this.

    Try to read about data normalization and fix the design.

    Then querying will be easy and lightening fast.

    _____________
    Code for TallyGenerator

  • Sorry guys for late reply as i had to move out of town on an emergency. As of now there is no solution but i am going through the link provided by Jeff and see if the overall design can be changed. Its always a tough task to check the design of a already implemented and settled application, but we have to do what we have to do. Thanking you all once again.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply