Getting data into an adjacency list

  • Hi all,

    I have a table that is in the format of an adjacency list:

    CREATE TABLE t (

      id int not null ,

      parent int null references t (id) default 0 ,

      -- other columns

    )

    I have recently received data that I need to insert into the table, but it's not in the same format.  It looks like the sample below:

    ID   level1      level2                    level3

    1    ITEM 1   Customer inquiry     Referred to Sales

    2    ITEM 1   Customer inquiry     Referred to Order

    3    ITEM 1   Customer inquiry     Referred to TS

    4    ITEM 2   Customer inquiry     Referred to Sales

    5    ITEM 2   Customer inquiry     Referred to Order

    6    ITEM 2   Customer inquiry     Referred to TS

    7    ITEM 3   Documentation       Errors

    8    ITEM 3   Documentation       Organization

    9    ITEM 3   Documentation       Usability

    10  ITEM 3   Documentation        Readme

    11  ITEM 3   Documentation        Online Help

    There is quite a bit of data so I'm looking to write a script.  I'm now looking for some direction on how to do this.  My initial, naive thought was that I would just have to insert the level2 column as new rows with the id as the parent, but that would create a whole lot of redundancy.  I'm not exactly sure how to select out what I need to insert.

    I appreciate any help or advice.

    Thanks.

  • In relation, you are trying to store a tree.  The question is what size tree is it?  Fixed length?

     

    There are 3 main ways I know of to store a tree, and obviously each of those have benefits and drawbacks.

     

    First, ObjectToObjectParent...get huge if you have a lot of relation.  Performance dictates that if you care that a member has a certain parent (anywhere above) you would denormalize.  I personally don't like this solution much unless small scale.

     

    Secondly, you have the ability to use keypath notation.  I believe SQL Mag published an article on this a while back.  This is a denormalized solution as well in a 'field', not by adding additional rows.

     

    The third solution that I am aware of is to use nested sets.  Those are a pain to implement and hard to understand.  So, really I have not found a good way to do a 'tree' representation in SQL that doesn't have significant drawbacks.  In most cases, the keypath solution is my favorite as I feel it is the lesser of the two evils.

  • I have to concur with Dougs response here. I have recently tried various methods (the 3 given above) and after exhaustive testing, preferred the keypath solution because of it's inherent transparency and ease of use (for other programmers who may arive after me and who will have to maintain my code!).

    Nested Sets are a pain to understand - moreover, whenever you add or insert a new item into the tree, a potentially large number of sub-nodes also need to be updated. I don't like this solution at all.

    However, I actually combined solution 1 (object to object parent) and 3 (keypath notation) so that I can quickly reason about parentchild relationships and also can select entire subtrees. As a quick note on implementation, store the parent ID in each row (you can use NULL for the root of the tree) and also store a path string (VARCHAR). Once you have these two things, you've got it made.

  • Hi,

    Thanks for your replies.  I appreciate the input on the format, and when the time comes to change the way these tables work and are used, I'll definately be taking your advice.  Unfortunately, right now, I'm stuck with the way the tables are already set up. 

    I guess to simplify things, I need to write a script that gets this data...:

    ID level1 level2 level3

    1 ITEM 1 Customer inquiry Referred to Sales

    2 ITEM 1 Customer inquiry Referred to Order

    3 ITEM 1 Customer inquiry Referred to TS

    4 ITEM 2 Customer inquiry Referred to Sales

    5 ITEM 2 Customer inquiry Referred to Order

    6 ITEM 2 Customer inquiry Referred to TS

    7 ITEM 3 Documentation Errors

    8 ITEM 3 Documentation Organization

    9 ITEM 3 Documentation Usability

    10 ITEM 3 Documentation Readme

    11 ITEM 3 Documentation Online Help

    into this format:

    ID myCol1 parent

    1 ITEM 1 0

    2 ITEM 2 0

    3 ITEM 3 0

    4 Customer inquiry 1

    5 Customer inquiry 2

    6 Documentation 3

    7 Referred to Sales 4

    8 Referred to Order 4

    9 Referred to TS 4

    10 Referred to Sales 5

    11 Referred to Order 5

    12 Referred to TS 5

    13 Errors 6

    14 Organization 6

    15 Usability 6

    16 Readme 6

    17 Online Help 6

    Thanks

  • Try this one:

    CREATE TABLE NewTable (ID INT IDENTITY, MyCol1 VARCHAR(50), Parent INT)

    INSERT INTO NewTable (MyCol1, Parent)

    SELECT DISTINCT Level1 AS MyCol1, 0 AS Parent

    FROM YourTable

    INSERT INTO NewTable (MyCol1, Parent)

    SELECT A.Level2, B.ID

    FROM (SELECT DISTINCT Level1, Level2 FROM YourTable) A INNER JOIN

    NewTable B ON A.Level1 = B.MyCol1

    INSERT INTO NewTable (MyCol1, Parent)

    SELECT A.Level3, B.ID

    FROM (SELECT DISTINCT Level1, Level2, Level3 FROM YourTable) A INNER JOIN

    NewTable B ON A.Level2 = B.MyCol1

    INNER JOIN NewTable C ON B.Parent = C.ID AND C.MyCol1 = A.Level1

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply