Parsing paired relationships.

  • Well my solution ignores the problem of 1:1 entries, as does Peter's, which runs in 4 seconds and looks to be spot on (aside from the face that it results in a comma delimited list, but I figure it's down to me to get my head around exactly what he's doing and sort that one (or else hack it post processing, which is probably what I'll end up doing), thanks Peter! ;)).

    It's web developer data, which is to say we should expect it to be imperfect, anything with either field null or both fields the same are valid candidates for deletion pre-processing. There are breaks in the chain, remember, so chaining stuff together isn't necessarily going to give the right answer...

    But can I just say I'm really grateful for all this help, it's awesome :w00t:

  • Sorry, I thought the comma separated list was a requirement. If you just want a set of (set_id, product_id) replace

    SELECT DISTINCT

    set_id,

    STUFF((

    SELECT

    ', ' + CAST(Q.product_id AS VARCHAR(10))

    FROM

    (

    SELECT DISTINCT

    PS2.product_id

    FROM

    dbo.ProductSets PS2

    WHERE

    PS2.set_id = PS1.set_id

    ) Q

    FOR XML PATH('')

    ), 1, 2, '') product_set

    FROM

    #ProductSets PS1

    ORDER BY

    set_id

    with

    SELECT DISTINCT

    set_id,

    product_id

    FROM

    #ProductSets

    ORDER BY

    set_id, product_id;

  • richard.gardner 6009 (12/22/2016)


    Hi Jeff,

    Here's a sample dataset, my bad, 89,000 is a different number, it's just 50,000 pairs of internal keys.

    @chris-2 - Thanks for the offer, appreciated, I'll have another look at it, I did run it a couple of times and the first time I got an error >100 levels of recursion, second time it ran until I killed it.

    Regards

    Richard

    Quote. Excellent. Thanks, Richard. Now I have something to very real to work with instead of making up something that might not be quite right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I have to group these records [sic: rows are not records] together - the above data creates two groups :

    Group 1 --> ProdA, ProdB, ProdC

    Group 2 --> ProdD, ProdE [\quote]

    Instead of mimicking assembly language pointer chains with recursive CTE's, why do not you implement your own example?

    CREATE TABLE Product_Groups

    (product_group_id CHAR(5) NOT NULL,

    product_id CHAR(5) NOT NULL,

    PRIMARY KEY (product_group_id, product_id));

    These products are at one level of aggregation in your data model, and the groups are at a different level.. What I am not sure about is whether a product has to belong to one and only one group, or can be in several groups (for example, a lot of recipes include the ingredient "water"). The next question is can your groups contain groups?

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • @richard.gardner,

    I'm still playing with the data but let me make a suggestion.

    The data you currently have is the result of someone creating the "opposite direction" node for every node in a nice, clean Adjacency List (Parent/Child table). That original clean list lives SOMEWHERE. I'd take the time to find out where it is because all of this would become the proverbial cake-walk if we could get our hands on that original Adjacency List.

    Because of the bidirectional nature of all the nodes, it takes a comparatively very long time to find the "root" node of any given tree in the data. While it can certainly be accomplished, I have to ask if there's any other column in the table that helps identify either the "root" nodes, or the "leaf" nodes. Perhaps there's a "Level" column or a "leaf node" indication or a "base part number" that might identify the "root" nodes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • @jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.

    If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

  • Jason A. Long (12/23/2016)


    @Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.

    If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

    Such duplication of edge data is used for a lot of different things. A lot of times they're used for the creation of certain types of web structures such as airline routes or mapping functionality such as MapQesst, etc. It really helps on maps because while you might be able to take the route of A,B,C, you might not be able to get back using C,B,A because of one way streets, etc. Even some wiring diagrams are supported by this type of double entry, although it's really not necessary.

    And, yes, you could delete 1/2 the rows by deleting all rows where B<A... which would also successfully destroy the hierarchy because not all parents will have values greater than their children or vice versa.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hmmm... looking at the original small list of nodes in the original post, there might just be a way to Ferret out the root nodes. I'll give that a try...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jason A. Long (12/23/2016)


    @Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.

    If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

    I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (12/23/2016)


    Jason A. Long (12/23/2016)


    @Jeff - It looks to be the result of some sort of matching process, where A was matched to B and B was matched to A... Resulting in 50% redundant rows.

    If I'm correct, you should be able to eliminate the redundant 1/2 of the rows by deleting all rows WHERE B < A.

    I wasn't bagging what you were raking. After looking at the data provided in the csv file, it's not a hierarchy at all. It's a "multi-web" where there are multiple islands of data and all points on each island are connected to all points on the same island. Now I get it. Thanks, Jason.

    No worries. There was a fair possibility that I was way off base. I just knew that I'd seen data structured in that format when I was asked to work through some patient matching data a while back.

  • Happy Christmas guys.

    Just for info there is no hierarchy, each item is at the same level, it's a relationship between items which have different PKs but are the same design, just different colours, thus "Islands Of Groups" as Jeff says. We have a Rbar solution anyway from Peter, but I would be interested in how to tackle it from a set based point of view.

    I can't get into the product that controls this list, one of the joys of "software as a service" is the utter blindness of the providers when it comes to ODBC access - "You can simply use CURL requests to get the data one record at a time". Right, let's do that then, I love spending my time building web calls in Powershell, plus all the completeness and integrity stuff that comes with it (that's sarcasm, btw).

    They make me feel like a dinosaur when I argue a once a day call over ODBC to get the complete recordset might be a better idea, like I'm a complete idiot for thinking a record at a time web request is a shitty way to retrieve a dataset, but I have a strong and abiding suspicion that I'm right.

    As the function of the list is to accept a single key and respond with a list of group members I don't think it's particularly dysfunctional, it's just when I'm trying to recreate the groups post processing....

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply