Performance tuning of hierarchy based query

Question

Post reply

Performance tuning of hierarchy based query

sqlenthu 89358

SSCrazy

Points: 2843
More actions
May 12, 2019 at 12:31 pm

#3640897

Hi DBAs,

I have a performance issue with one hierarchy processing related query. Here is the scenario:

I have following two temp tables with sample data:

#temp_Mainhier (It is main hierarchy containing summary nodes including top node till leaf nodes)

ID businessID BusinessName Hierarchy

1 1 Top 0

2 10 ABC 0|1

3 11 DEF 0|1

4 12 GHI 0|1

5 13 MNO 0|1|2

6 14 LMN 0|1|3

...

1234 112233 XXX 0|1|2|8|102|10485|234235

#temp_SecondaryHierLeaf (It is subset of secondary hierarchy with only leaf nodes in businessID. It might be the last summary node in secondary hierarchy chain oractual cost center itself)

ID businessID BusinessName Hierarchy

121234 562839 ZSF 0|23|567|15435|28935

123741 374927 JDY 0|28|324|24523|56424

167349 235826 KHS 0|23|342|43225|87628

...

Now the number of leaf nodes will be same in both the hierarchies but hierarchies might be different. In secondary hierarchy leaf nodes (in temp table),the hierarchy code will have some summary node which will be there in main hierarchy. For example,

#temp_Mainhier

ID businessID BusinessName Hierarchy

123456 777777 HJU 0|1|18|342|34232|324323|843634|238457 (Leaf node in std hierarchy)

123456 888888 HJU 0|1|18|342|34232|324323|843634|238457|777777

123456 999999 HJU 0|1|18|342|34232|324323|843634|238457|777777

#temp_SecondaryHierLeaf

ID businessID BusinessName Hierarchy

326522 238457 HJU 0|23|324|3456|34232|324323|843634 (Leaf node in sec hierarchy)

In above example, if, business id 238457 in secondary hierarchy, is present in std hierarchy, then data should be like this(hierarchy of secondary hierarchy for that business id + substring of std hierarchy from where business id in custom hierarchy is started):

ID businessID BusinessName Hierarchy

123456 777777 HJU 0|23|324|3456|34232|324323|843634|238457

123456 888888 HJU 0|23|324|3456|34232|324323|843634|238457|777777

123456 999999 HJU 0|23|324|3456|34232|324323|843634|238457|777777

Currently i am using following code which is taking huge performance hit:

select b.ID, b.businessid, b.businessname, a.hierarchy+'|'+substring(b.hierarchy,charindex(a.businessid+'|', b.hierarchy+'|'), len(b.hierarchy)) as hierarchy

from #temp_SecondaryHierLeaf a

cross join #temp_Mainhier b

where charindex('|'+a.businessid+'|', b.hierarchy) > 0

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Joie Andrew One Orange Chip Points: 27360 More actions · Answer 1

Joie Andrew

One Orange Chip

Points: 27360

May 12, 2019 at 8:46 pm

#3641095

How many rows in your tables? That cross join is most likely killing you, as it is multiplying every row in the first table with every row in the second table.

Joie Andrew
"Since 1982"

sqlenthu 89358 SSCrazy Points: 2843 More actions · Answer 2

sqlenthu 89358

SSCrazy

Points: 2843

May 13, 2019 at 5:18 am

#3641283

#temp_Mainhier contains around 500000 and #temp_SecondaryHierLeaf contains around 200000 records.

This reply was modified 5 years, 4 months ago by sqlenthu 89358. Reason: Rectified the numbers

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

sqlenthu 89358 wrote:

#temp_Mainhier contains around 500000 and #temp_SecondaryHierLeaf contains around 200000 records.

I'm not groking what you need to do but it seems like you're going to have perpetual problems because you have a secondary hierarchy table that's the "same but different" and you're using a positional hierarchy path instead. The following is from your original post and I'm just not seeing how your hierarchy is working because ID 2,3, and 4 all have the same hierarchical path With that, it would appear that you can't actually tell what the ancestors of IDs 5 and 6 actually are by looking ONLY at the hierarchical path alone. Instead, you have to look at both the hierarchical path and the ID. That ID should be the last piped value in the hierarchy for the node. For example, (if I'm reading this right) the path for ID 5 should actually be 0|1|2|5. In other words, the hierarchical path should always contain the current ID as the right most ID in the hierarchical path. Another way to put is is that the right most value in the hierarchical path should always be a leaf node ID and it should be the ID of the current row.

ID businessID BusinessName Hierarchy

1 1 Top 0

2 10 ABC 0|1

3 11 DEF 0|1

4 12 GHI 0|1

5 13 MNO 0|1|2

6 14 LMN 0|1|3

IF that can be done (and it looks pretty easy to do with code) and if we can change the datatype of the hierarchical path column, we'll be able to do some things that will totally amaze you (especially when it comes to "node summaries" as you call them) and will run so fast you won't believe it actually ran. See the following article for more information.

https://qa.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

sqlenthu 89358 SSCrazy Points: 2843 More actions · Answer 4

Hey Jeff, there ID column is simply identity column. Its the businessID and Hierarchy column which are related. So IDs 2, 3, 4 have same hierarchies because businessid 10, 11, 12 are kids of businessid 1.

In between, it was funny to read "hierarchies on steroids" (i picturized it :)).

Hope my explanation clarifies and u can provide me steroids for my query.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

Yes, I do understand how your hierarchy is setup and I understand that the ID column is just an integer. What I'm saying is that it has not been included in the hierarchical path column and that is and will continue to make your life miserable. Also, having the hierarchical path as a character based rendition of pipe delimited IDs contained in the upline will continue to cause you severe pain. Hopefully, the hierarchical path that you have isn't actually the text rendition of a column that's actually based on the "HierarchyID" datatype, as well.

If we can change those two things, then some serious and fairly easy magic can happen. We can also do the same without making those changes but it will be a bit more painful. Either way, the results will be incredible.

Please read the article that I previously provided a link for, which describes what I mean by "magic" and "incredible".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

sqlenthu 89358 SSCrazy Points: 2843 More actions · Answer 6

I will definitely check that link in details. In between, to get optimized results feel free to make changes from these two temp tables to new temp tables in the format u want them to be processed quickly.

"What I'm saying is that it has not been included in the hierarchical path column and that is and will continue to make your life miserable." - The ID column is just an identity column. I wonder how it will improve the performance if it's not at all related to hierarchies. Aren't we adding more bulk to our problem with this ?

"Hopefully, the hierarchical path that you have isn't actually the text rendition of a column that's actually based on the "HierarchyID" datatype, as well." -It is actually. The hierarchy column is indeed made up of businessID values to show hierarchy path of a particular businessId.

Having said that, my intention here is only to get all leaf level BusinessID from secondary hierarchy, which exists in hierarchy column of main hierarchy, with a hierarchy comprising of "hierarchy value of secondary hierarchy" + '|' + "part of hierarchy value which starts from businessId from secondary hierarchy".

This reply was modified 5 years, 4 months ago by sqlenthu 89358. Reason: Updated with more clarification

Sergiy SSC Guru Points: 110192 More actions · Answer 7

sqlenthu 89358 wrote:

Hi DBAs, I have a performance issue with one hierarchy processing related query. Here is the scenario: I have following two temp tables with sample data: #temp_Mainhier (It is main hierarchy containing summary nodes including top node till leaf nodes) ID businessID BusinessName Hierarchy 1 1 Top 0 2 10 ABC 0|1 3 11 DEF 0|1 4 12 GHI 0|1 5 13 MNO 0|1|2 6 14 LMN 0|1|3 ... 1234 112233 XXX 0|1|2|8|102|10485|234235 #temp_SecondaryHierLeaf (It is subset of secondary hierarchy with only leaf nodes in businessID. It might be the last summary node in secondary hierarchy chain oractual cost center itself) ID businessID BusinessName Hierarchy 121234 562839 ZSF 0|23|567|15435|28935 123741 374927 JDY 0|28|324|24523|56424 167349 235826 KHS 0|23|342|43225|87628 ... Now the number of leaf nodes will be same in both the hierarchies but hierarchies might be different. In secondary hierarchy leaf nodes (in temp table),the hierarchy code will have some summary node which will be there in main hierarchy. For example, #temp_Mainhier ID businessID BusinessName Hierarchy 123456 777777 HJU 0|1|18|342|34232|324323|843634|238457 (Leaf node in std hierarchy) 123456 888888 HJU 0|1|18|342|34232|324323|843634|238457|777777 123456 999999 HJU 0|1|18|342|34232|324323|843634|238457|777777 #temp_SecondaryHierLeaf ID businessID BusinessName Hierarchy 326522 238457 HJU 0|23|324|3456|34232|324323|843634 (Leaf node in sec hierarchy)

Performance issues are embedded into your tables design.

There is no "query tuning" solution to this.

Try to read about data normalization and fix the design.

Then querying will be easy and lightening fast.

_____________
Code for TallyGenerator

sqlenthu 89358 SSCrazy Points: 2843 More actions · Answer 8

Sorry guys for late reply as i had to move out of town on an emergency. As of now there is no solution but i am going through the link provided by Jeff and see if the overall design can be changed. Its always a tough task to check the design of a already implemented and settled application, but we have to do what we have to do. Thanking you all once again.