Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets

  • LordHz (3/14/2013)


    Jeff,

    Not putting that back in the original table (mines not employee), I'm just maintaining the mapping in your hierarchy table. My thought is to make your stored procedure more generic, so I can simply pass in the table name, Id/ParentId and the sort field(s) and use dynamic SQL to create a the sorted mapping table. That would then update the rgt/lft extends (your bowers) back into the original table. That way the less knowledgeable developers could implement without having to modify the stored procedure. At least that's my thought, haven't implemented it yet. Most of our hierarchies are in the under 100k row range, most under 10k. We do a lot of hierarchies, product, geo, object, person, etc. This update has provided some great brain food.:-D

    Thanks again,

    Matthew

    Hi Matthew,

    Sorry I missed this post. That's some great feedback and a great idea. If you get the chance, please post back how your suggestion worked out for you. Shoot... it's such a good idea, you might even want to write an article about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • L' Eomot Inversรฉ (8/24/2013)


    Hi Jeff, great article.

    I have one quibble: terminology; even an old relic like me knows that "best bower" and "small bower" have been superseded by "starboard bower" and "port bower" but I've never before heard or seen "right bower" and "left bower". ๐Ÿ˜€

    BWAA-HAAA!!!! I always knew you were a salty old dog! ๐Ÿ˜€ Thanks for the feedback, Tom.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • this is my stab at nested set. i generate on every set of the table.

    i am looking to refactor and trim it down at some point.

    if this is any help to anyone enjoy

    or anyone wants to refactor it a bit more i would appreciate it

    CREATE TABLE [dbo].[Organisation](

    [OrganisationId] [int] IDENTITY(1,1) primary key NOT NULL,

    [ParentId] [int] NULL,

    [MinNest] [int] NULL,

    [MaxNest] [int] NULL,

    [Name] [varchar](50) NULL

    ) ON [PRIMARY]

    GO

    /****** Object: StoredProcedure [dbo].[UpdateOrgHierarchy] Script Date: 11/11/2013 14:17:12 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER PROCEDURE [dbo].[UpdateOrgHierarchy]

    AS

    BEGIN

    begin transaction

    SELECT top 1 *

    into #tmp

    from dbo.Organisation with (TABLOCKX)

    ;WITH orgCTE2 AS (

    SELECT OrganisationId,

    OrganisationId as CurrentOrgId,

    1 as Total

    FROM dbo.Organisation e

    where ParentId IS NULL

    UNION ALL

    SELECT parent.OrganisationId,

    e.OrganisationId as CurrentOrgId,

    1 as Total

    FROM dbo.Organisation e

    INNER JOIN orgCTE2 parent ON parent.CurrentOrgId = e.ParentId

    UNION ALL

    SELECT e.OrganisationId,

    e.OrganisationId as CurrentOrgId,

    1 as Total

    FROM dbo.Organisation e

    INNER JOIN orgCTE2 parent ON parent.CurrentOrgId = e.ParentId

    ) , orgCTE3 AS (

    select OrganisationId,

    count(*) as [count]

    from (select distinct OrganisationId, CurrentOrgId from orgCTE2) a

    where Organisationid != CurrentOrgId

    group by OrganisationId

    ) , orgCTE AS (

    SELECT OrganisationId as TopOrgId,

    OrganisationId as OrganisationId,

    ParentId as ParentId,

    1 as depth

    FROM dbo.Organisation

    UNION ALL

    SELECT child.TopOrgId,

    e.OrganisationId as OrganisationId,

    e.ParentId,

    depth + 1 as depth

    FROM dbo.Organisation e

    INNER JOIN orgCTE child ON child.parentId = e.OrganisationId

    ), orgCTE4 AS (

    SELECT OrganisationId,

    ParentId,

    cast(null as INT) as parentOrganisationId,

    OrganisationId as superparentOrganisationId,

    ',' + cast(OrganisationId as varchar(max)) + ',' as idstack,

    1 as depth

    FROM dbo.Organisation

    WHERE parentid IS NULL

    UNION ALL

    SELECT e.OrganisationId,

    e.ParentId,

    parent.organisationid as parentOrganisationId,

    superparentOrganisationId,

    idstack + cast(e.OrganisationId as varchar(max)) + ',' as idstack,

    depth + 1 as depth

    FROM dbo.Organisation e

    INNER JOIN orgCTE4 parent ON parent.organisationid = e.parentId

    )

    UPDATE org

    set minNest = ((row_num-1) * 2) + 1 - (d-1),

    maxNest = ((row_num-1) * 2) + 1 - (d-1)+(cnn*2)-1

    from dbo.Organisation org

    join (select c.*, e.maxdepth - c.depth + 1 as dep, isnull(par.[count] + 1,1) cnn, st.depth as d,

    row_number() OVER (PARTITION BY 1 ORDER BY idstack) as row_num

    from orgCTE c

    join orgCTE4 st

    on c.OrganisationId = st.OrganisationId

    LEFT JOIN orgCTE3 par

    on c.OrganisationId = par.OrganisationId

    JOIN (

    select max(depth) maxdepth, toporgid

    from orgCTE c

    group by toporgid ) e

    on c.toporgid = e.toporgid

    where c.depth = 1) ln

    on org.OrganisationId = ln.OrganisationId

    COMMIT transaction

    END

  • excellent article. But i got a question. I hav a tree structure which i want to store in a DB. The structure is not fixed and can change, ie, nodes can b added, deleted or renamed.

    Using your article, i find that i can navigate and extract data from such a structure very effeciently. But how do i deal with inserting and deleting the nodes into the DB table?

    I am using this from a C# application and my DB is MySQL.

    My tree structure concerns a document archive that is full of old documents that are old enough to get damaged if accessed repeatedly and so need to be scanned and put into the DB. The arrangement of the documents follow a tree structure i.e.:

    Archive root>>Room no>>Cabin no>>Drawer no>>category

    the first 3 levels i.e. room, cabin and drawer remain constant but the category can have many sub levels. Files can b stored directly under category level as well as under any sub level of category, and so i am using a tree structure to store this location info. What i am planning to do is to create the tree with the first 3 levels and then allow the client to put up further nodes as and when they need. Then they can upload scanned documents to the appropriate places.

    The file upload to any level is OK as far as that level/sub category is already defined and stored properly in the DB and can be accessed properly. To make navigation and search of documents from this tree effective and fast i wanted to first of all speed up the tree navigation itself and thus came upon your article. The navigation using your method will be fast even if there are too many sub-levels and files, but how do I manage the insert and delete of the various sub-levels when using your method? Do I need to calculate the Left and Right bower every time a node is inserted or deleted?

    I want to do this right from the beginning because i know the amount of data they have is humongous. The DB will slow down because of this and thus the app i am making would slow down too...

  • @dinson,

    If done correctly, there won't be any slowdown in the DB. The example I used was a million row example and it all worked in under a minute.

    The key is to establish the rules early. For example, can a document exist under more than one category? To be honest, I'd make category (or categories) an attribute (or attributes) of the document rather than storing categories in a hierarchy of locations. In all honesty, I can see two hierarchies here... one for location and one for categorization. Document IDs shouldn't be stored as a subset of categories. I'd have a document table and a bridging table that contains document IDs and the IDs of the categories that a document belonged to and there would be nothing hierarchical about that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • thnx jeff.

    OK, that makes sense. So, i leave category out of the document table.

    I will try to set the bower calculations as a trigger on insert/update...

    Does that sound ok?

  • Can I use these method to resolve the problem below? Please help

    My problem here is how to identity and select a patients service date. A service date could be admission date(hospital), discharge date(hospital), CBC taken date, CBC result date, xray taken date, xray result read date etc.. . Each date is paired and are reconciled thru a similar account number. lets say an admission date is always paired with discharge date and so both dates will have similar account number, so is CBC taken date paired with CBC result date. This might happen in the same day but the time will be different. However we are receiving the data in a format where we don't know what kind of service was performed, we only have the service date. so we are assuming that a patients earliest date is an admission date and whatever date that has a similar account number is the discharge date. I want to select all patients record that contains admission date and discharge date base on the assumption that the earliest date is an admission date and the other date on that accountnumber is the discharge date. whatever dates that fall in between the admission date and discharge date will be ignored. However a patient could be admitted and discharge from the hospital multiple times a year. so the earliest date after the first discharge date will also be considered as a 2nd admission date.. so on and so forth the cycle continues..

    example data..

    memberid, accountnum, servicedate

    1 , ABC11, 10/24/2013

    1 , ABC12, 10/26/2013

    1 , ABC13, 10/28/2013

    1 , ABC12, 10/30/2013

    1 , ABC13, 11/2/2013

    1 , ABC11, 11/5/2013

    1 , ABC14, 11/30/2013

    1 , ABC15, 12/1/2013

    1 , ABC16, 12/3/2013

    1 , ABC17, 12/8/2013

    1 , ABC17, 12/10/2013

    1 , ABC16, 12/9/2013

    1 , ABC15, 12/12/2013

    1 , ABC14, 12/11/2013

    in these sample data the expected data to be return is:

    memberid, accountnumber, servicedate

    1, ABC11, 10/24/2013

    1, ABC11, 11/5/2013

    1, ABC14, 11/30/2013

    1, ABC14, 12/11/2013

    please help me . is this type of logic complex and be better implemented as a SQL CLR or recursive cte or tally table.. I could only manage to recover the first admission date and discharge date for each member...

    ;with cte1 as (

    select memberid, acctnum, servicedate,

    member_earliestdate = ROW_NUMBER()over(partition by memberid order by servicedate asc)

    from patient_admitlog)

    select * from

    (select * from cte1 where member_earliestdate = 1)t1 join

    cte1 on t1.acctnum = cte1. acctnum and t1.memberid = cte1.memberid

  • @Jeztagab,

    You're problem doesn't appear to be hierarchical in nature. For that reason, I'd prefer to not dilute the discussion on this thread to something off topic. Would you repost your question on one of the T-SQL related forums on this site, please. I'm sure a ton of people would be happy to help especially if you take the time to read the first link in my signature line below.

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    This was timely as we are working on a tree just like this. I'm curious what you think about replacing the SortPath with a hierarchyid data type?

    Glenn

  • gorr-688214 (9/19/2014)


    Jeff,

    This was timely as we are working on a tree just like this. I'm curious what you think about replacing the SortPath with a hierarchyid data type?

    Glenn

    Hi Glenn,

    Thanks for the feedback.

    I apologize for using the term but "It Depends". I've not tried using a HierarchyID for a SortPath column and so I don't know if it would sort correctly for what I needed to sort for. I also know it won't parse quite as easily as what I needed to split the column for.

    If your ultimate goal is to use the HierarchyID instead of Nested Sets and don't require the same functionality from the column that I did, then the code will easily build your HierarchyID column and you can skip most of the rest of the code in the article. In fact, if you lookup HierarchyID in "Books Online", the method I used is the virtually identical as the example there except for the data-type of the column.

    The reason why I don't use the HierarchyID data-type is simply because there doesn't appear to be a clear cut method to determine what the maximum number of nodes the data-type can hold on such large tables and I don't want to find out the hard way on production code. I know the SortPath that I used will hold 2,000 levels and that can be increased quite a bit (more than 500 million levels) by changing it to VARBINARY(MAX) with some penalty for performance.

    If you do go with the HierarchyID, I'd still recommend rebuilding from an Adjacency List when there's a change because it's to troubleshoot problems in an Adjacency List (more natural for most humans) than it is for a positional HierarchyID. That's just my opinion, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Remember trying this a while back. updates took a long time. also correcting data was difficult. Prefer an adjacency list with a lineage column method

  • sarat_reddy (9/19/2014)


    Remember trying this a while back. updates took a long time. also correcting data was difficult. Prefer an adjacency list with a lineage column method

    That's why the article suggests using all 3 hierarchy types. The Adjacency List makes it easy to maintain. The Lineage Column provides the right kind of sortability and other performance features. The Nested Sets provide extreme performance for most lookups and aggregations. And it all takes less than a minute to update a million node hierarchy because it doesn't use a RBAR push-stack to rebuild any of it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I'm doing my first adjacency list to nested sets conversion and building out a rather complex hierarchy table for a new project (something I've been anxious to do since I saw your presentation). I've re-read this article a couple times this week. It's all going very well. Thanks again for this fantastic article and technique Jeff!

    Also...

    -- The ISNULLs make NOT NULL columns

    I just caught that and learned something new tonight. How very cool.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (12/16/2015)


    I'm doing my first adjacency list to nested sets conversion and building out a rather complex hierarchy table for a new project (something I've been anxious to do since I saw your presentation). I've re-read this article a couple times this week. It's all going very well. Thanks again for this fantastic article and technique Jeff!

    Also...

    -- The ISNULLs make NOT NULL columns

    I just caught that and learned something new tonight. How very cool.

    Really cool feedback, Alan. Thank you so much! If you need help on something there, you know where I can be found. ๐Ÿ™‚ I'll be really interested to hear how your project panned out.

    Heh... and don'cha just love some of the comments I put in my code. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff,
    Thank you very much for this brilliant article which brings me some light on how to deal with nodes and edges. However in my situation 2 nodes can be merged into a single node. Then this latter can  be split and each new branche will have its own "life". Of course one of the child branch can be merge again with another one.
    In my case, a typical scenario will be some goods which are loaded on board of vessel, then those goods arrived at destination and are stored in different warehouses. Finally a customer come and take different type of goods from different warehouses and put them all in a truck.
    How will you manage that with Nested Sets?
    Thank you very much.
    Sylvain

Viewing 15 posts - 61 through 75 (of 96 total)

You must be logged in to reply to this topic. Login to reply