Bundle/Group data elements linked in M-M table

  • Hi experts,

    I have issue trying to group data into exhaustive bundles i.e. 1 group/bundle per "linking chain" between 2 attributes ( A and C) in a M-M relation table.

    I have a table AC containing two attributes A and C which have a many-many relationship. What I want is to group all A's and C's that have relations into a bundle (new column stating BundleID is preferred). For example row 1 has A1 and C1, row 2 A1 and C2, row 3 A2 and C2 they are to be "bundled" getting BundleID 1 on each row.

    Business logic is that we have either simple relations or complex relation. In a simple relation a C points only on one A and a A can have many C's linked to it. The complex relation is where as written above where A's have many C's and those C's can point to more then one A's.

    See example code below.

    CREATE TABLE [dbo].[AC](

    [A] [nvarchar](10) NULL,

    [C] [nvarchar](10) NULL

    ) ON [PRIMARY]

    INSERT INTO [AC] ([A], [C]) VALUES (A1, C1)

    INSERT INTO [AC] ([A], [C]) VALUES (A1, C2)

    INSERT INTO [AC] ([A], [C]) VALUES (A2, C2)

    INSERT INTO [AC] ([A], [C]) VALUES (A3, C3)

    INSERT INTO [AC] ([A], [C]) VALUES (A4, C4)

    INSERT INTO [AC] ([A], [C]) VALUES (A4, C5)

    INSERT INTO [AC] ([A], [C]) VALUES (A2, C6)

    INSERT INTO [AC] ([A], [C]) VALUES (A5, C6)

    INSERT INTO [AC] ([A], [C]) VALUES (A6, C7)

    INSERT INTO [AC] ([A], [C]) VALUES (A7, C7)

    Given above example values a query should result in the following

    BID A C

    1 A1 C1

    1 A1 C2

    1 A2 C2

    2 A3 C3

    3 A4 C4

    3 A4 C5

    1 A2 C6

    1 A5 C6

    4 A6 C7

    4 A7 C7

    My approach has been to try to divide the problem into solving straight relations and complex relations. Below query gives me all straight relations (have not created a BundleID but in this easy case I can set an unique BID per A) .

    Select *

    From dbo.AC AC1

    WHERE NOT EXISTS

    (

    Select * -- Get all A's who are linked to a C that points to more then one A

    From dbo.AC AC2

    Where AC2.A = AC1.A

    AND AC2.C in

    (

    Select CD.C -- Get all C's that point to more then one A

    from AC CD

    Group By CD.C

    Having count(CD.C) > 1

    )

    )

    Result:

    A3 C3

    A4 C4

    A4 C5

    Is it a good approach and how can I then solve the complex relations?

    Or could the issue be solved in single statement caring for both scenarios?

    Thanks in advance for any suggestions or solutions!

    Regards,

    Kaj

  • Do us a favor? When posting your DDL statements, remember to put quotes around string values that are to be inserted. It makes it easier on us when we don't have to correct simple stuff like that before we test.

    Also, be careful of using business-babble like "exhaustive bundles" when posting to this forum as it makes your post more difficult to understand. I had to re-read your post several times to understand what you're trying to do and many people may not make the effort.

    kajf (10/20/2010)


    Business logic is that we have either simple relations or complex relation. In a simple relation a C points only on one A and a A can have many C's linked to it. The complex relation is where as written above where A's have many C's and those C's can point to more then one A's.

    I sort of get this, but I guess the question is, what are you doing with the simple and complex relationship? Are they going on a report? Are you throwing them into stored procedures? What is your goal in manipulating this data using two different methods?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Kaj,

    Here's why no one's responded yet. Your question breaks down at this point.

    kajf (10/20/2010)


    I have a table AC containing two attributes A and C which have a many-many relationship. What I want is to group all A's and C's that have relations into a bundle (new column stating BundleID is preferred). For example row 1 has A1 and C1, row 2 A1 and C2, row 3 A2 and C2 they are to be "bundled" getting BundleID 1 on each row.

    Given above example values a query should result in the following

    ...

    BID A C

    1 A1 C1

    1 A1 C2

    1 A2 C2

    2 A3 C3

    3 A4 C4

    3 A4 C5

    1 A2 C6

    1 A5 C6

    4 A6 C7

    4 A7 C7

    No one understands why and how A1 & A2 get bundled together. I get bundling together A1/C1 and A1/C2 or bundling together A1/C2 and A2/C2 because each of those "sets" have a common denominator. But to bundle all three of them together, makes no sense because we don't understand all the nuances behind your business requirements.

    Please explain how the bundling process works and give us more than one example so we can compare and find the pattern. That will help us help you better.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • kajf (10/20/2010)


    Below query gives me all straight relations (have not created a BundleID but in this easy case I can set an unique BID per A) .

    ...

    Result:

    A3 C3

    A4 C4

    A4 C5

    Is it a good approach...

    If I'm reading the data and what you want to do for the simple relationships correctly, your code for simple has dropped a record. It should also be showing A1 C1, shouldn't it?

    Try to avoid complex subqueries like the one you've done above. Code everything as simply as possible. There are two options here. You can either populate a temporary table with any Cs that have more than one occurance and left join on that to pull out the As that don't have one of the duplicate Cs, or you can use a CTE and left outer join on that to pull out the As that don't have one of the duplicate Cs.

    Your WHERE clause would look something like this:

    WHERE (temp or CTE).C IS NULL

    If I'm wrong about your data, please let me know. Because obviously I'm missing something.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Thank you Brandie for taking time for my issue. I'll try to be more precise now and onward!

    ********Edit********

    Brandie posted answer at the same time as I was writing response.

    Corrections below to answer his questions.

    *******************

    What I'm trying to accomplish is to group all A's and C's that have an relation in the M-M table and create a bundle ID and writing the bundleID on each row. All A's and C's that can be "chained" together (A1<->C2, C2<->A2, so forth and so on) are to get the same bundle ID.

    Brandie Tarvin (10/21/2010)

    No one understands why and how A1 & A2 get bundled together. I get bundling together A1/C1 and A1/C2 or bundling together A1/C2 and A2/C2 because each of those "sets" have a common denominator. But to bundle all three of them together, makes no sense because we don't understand all the nuances behind your business requirements.

    The example is a car pool where we have cars C's and customer's A's. In this car pool a car can be used by several customers and a customer can use several cars. Each car has a cost and each customer pays a fee. I need to group all cars and customers into bundles to calculate cost and fees for the group. (and NO, we do not have any CarPoolID which one would thought. It's actually what I'm trying to reengineer).

    Important note is that we always start from the customer A to calculated get a BundleID i.e. Two customer's (eg A6, A7 ) and one car C (C7) would give one BundleID and one customer (A4) and two cars (C4,C5) would give one BundleID.

    The solution is to be part of a SP and value add is that we have added a new column with the bundle ID. The result is then to used in a calculation to sum all distinct A's and C's sum values (ASum & CSum) per bundleID.

    Brandie Tarvin (10/21/2010)

    What is your goal in manipulating this data using two different methods?

    The method is to be the same but I thought it would be easier to divide it into two steps; simple and complex. Simple case is for example when an A is "pointed" at by C's that do not point to any other A (in the code below for example (A4<->C4, A4<->C5). In other words one or more cars (C's) is used by only one customer (A). The approach can be disregarded as I'm seeking a solution to manage the whole data set, regardless of data in it.

    New code below (have tried to follow the forum etiquette on coding standard)

    Table #AC is source table

    Table #ACBundle is table where result is to be stored

    Brandie Tarvin (10/21/2010)

    If I'm wrong about your data, please let me know. Because obviously I'm missing something.

    Truly sorry for bad data. Is now corrected.

    --===== If the test tables already exists, drop it

    IF OBJECT_ID('TempDB..#AC','U') IS NOT NULL

    DROP TABLE #AC

    IF OBJECT_ID('TempDB..#ACBundle','U') IS NOT NULL

    DROP TABLE #ACBundle

    --===== Create the test tables with

    CREATE TABLE #AC -- No primary key exist in table

    (

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    CREATE TABLE #ACBundle

    -- No primary key exist in table

    (

    BID int NOT NULL,

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A1','2', 'C1','2')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A1','2', 'C2','1')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A2','4', 'C2','1')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A3','7', 'C3','7')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A4','6', 'C4','2')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A4','6', 'C5','3')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A2','4', 'C6','5')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A5','7', 'C6','5')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A6','8', 'C7','4')

    INSERT INTO #AC (A, ASum, C, CSum) VALUES ('A7','5', 'C7','4')

    Expected result if running "SELECT * FROM #ACBundle"

    BID A ASum C CSum

    1 A1 2 C1 2

    1 A1 2 C2 1

    1 A2 4 C2 1

    2 A3 7 C3 7

    3 A4 6 C4 2

    3 A4 6 C5 3

    1 A2 4 C6 5

    1 A5 7 C6 5

    4 A6 8 C7 4

    4 A7 5 C7 4

    Result will later in the SP be used to sum all distinct A's ASum and C's CSum values per BundleID eg.

    BID ASum CSum As Cs

    1 13 8 A1,A2,A5 C1,C2,C6

    2 7 7 A3 C3

    3 6 5 A4 C4,C5

    4 13 4 A6,A7 C7

    Thanks in advance for any suggestions or solutions!

    Best regards,

    Kaj Froelich

  • Kaj,

    Did you understand my comments about how to recode for the simple solution?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Hi Brandie,

    I think I understand the temp table solution. I'm not familiar with CTE and therefor a NO for this part.

    Using the temp table solution however does not the trick (the way I'm doing it).

    Add below code with my previous reply (not posting twice to save space)

    IF OBJECT_ID('TempDB..#AC_Temp','U') IS NOT NULL

    DROP TABLE #AC_Temp'

    Select C

    INTO #AC_Temp

    From #AC

    Group By C

    Having count(C) > 1

    Select * from #AC_Temp

    --C

    --C2

    --C6

    --C7

    All C's occuring more then once i.e. pointing at more then one A = correct result

    Following your instructions I applied an outer join (of course my way might be wrong). For simplicity I have not include the WHERE #AC_Temp.C IS NULL statement to show why I think it's wrong

    SELECT *

    FROM #AC left outer join #AC_Temp

    on #AC.C = #AC_Temp.C

    --A ASUM C CSUM C(#AC_Temp)

    --A1 2 C1 2 NULL

    --A1 2 C2 1 C2

    --A2 4 C2 1 C2

    --A3 7 C3 7 NULL

    --A4 6 C4 2 NULL

    --A4 6 C5 3 NULL

    --A2 4 C6 5 C6

    --A5 7 C6 5 C6

    --A6 8 C7 4 C7

    --A7 5 C7 4 C7

    With the left outer join we would with the IS NULL statement get A1, A3, A4.

    However this is incorrect as the wanted result for getting the "simple" relations would be only A3, A4.

    Reason being that even though car (C1) is only linked to one customer (A1), the same customer uses another car (C2) and is therefor not considered as "simple".

    As written before this might be the wrong approach (divide into "simple" and "complex"). Actually, looking into the code written in my initial post I found the error which you pointed out; that record holding A1 & C1 is "dropped" and it's similar to the issue above when using left outer join (but that we include it instead).

    Hope above satisfy your question. I apologize for not be to the point all the time but hope that I have been able to convey what i have and what I want to be done.

    How should I move forward?

    Best regards,

    Kaj Froelich

  • This should get you close to what you are looking for. Essentially it takes the intersection (A) of two sets (B) and (C) and adds the values to the sets so that set (B) = set(C). This requires a loop to handle indirect connections like those between A1 and A5 via A2 in your sample data.

    Once the set is complete, you can use a correlated subquery to pull out all A's in a set and all C's in a set.

    I didn't include the Sums, because there were duplicate values in your data, but it should be simple to include those.

    DECLARE @AC TABLE -- No primary key exist in table

    (

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    DECLARE @ACBundle TABLE

    -- No primary key exist in table

    (

    BID int NOT NULL,

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A1','2', 'C1','2')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A1','2', 'C2','1')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A2','4', 'C2','1')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A3','7', 'C3','7')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A4','6', 'C4','2')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A4','6', 'C5','3')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A2','4', 'C6','5')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A5','7', 'C6','5')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A6','8', 'C7','4')

    INSERT INTO @AC (A, ASum, C, CSum) VALUES ('A7','5', 'C7','4')

    WHILE @@ROWCOUNT > 0

    INSERT @AC(a,c)

    SELECT DISTINCT b.a, c.c

    FROM @AC AS a

    INNER JOIN @AC AS b

    ON a.c = b.c

    INNER JOIN @AC AS c

    ON a.a = c.a

    EXCEPT

    SELECT a, c

    FROM @AC

    SELECT DISTINCT a, c

    , Stuff(( SELECT DISTINCT ', ' + a FROM @AC AS sub WHERE sub.c = main.c ORDER BY ', ' + a FOR XML PATH('') ), 1, 2, '') AS [As]

    , Stuff(( SELECT DISTINCT ', ' + c FROM @AC AS sub WHERE sub.a = main.a ORDER BY ', ' + c FOR XML PATH('') ), 1, 2, '') AS Cs

    FROM @AC AS main

    ORDER BY [As], [Cs], a, c

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • kajf (10/21/2010)


    Following your instructions I applied an outer join (of course my way might be wrong). For simplicity I have not include the WHERE #AC_Temp.C IS NULL statement to show why I think it's wrong

    SELECT *

    FROM #AC left outer join #AC_Temp

    on #AC.C = #AC_Temp.C

    --A ASUM C CSUM C(#AC_Temp)

    --A1 2 C1 2 NULL

    --A1 2 C2 1 C2

    --A2 4 C2 1 C2

    --A3 7 C3 7 NULL

    --A4 6 C4 2 NULL

    --A4 6 C5 3 NULL

    --A2 4 C6 5 C6

    --A5 7 C6 5 C6

    --A6 8 C7 4 C7

    --A7 5 C7 4 C7

    It's not wrong. That's the point.

    SELECT *

    FROM #AC

    left outer join #AC_Temp

    on #AC.C = #AC_Temp.C

    WHERE #AC_Temp.C IS NULL

    This gives you all the singular values (simple) like you wanted. It doesn't do the complex part, which I haven't gotten to yet. Only the simple part where there is a distinct Car that doesn't repeat between customers (A values).

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I've slightly modified your code for sample data to remove NULLs and add a PK, just for personal preference really. Also I've only dealt with producing the bundles, not any summing of results as I was unclear what was wanted there.

    IF OBJECT_ID('tempdb..#AC') IS NOT NULL DROP TABLE #AC

    CREATE TABLE #AC (

    idx INT IDENTITY (1,1) PRIMARY KEY,

    A NVARCHAR(10) NOT NULL,

    C NVARCHAR(10) NOT NULL)

    INSERT #AC (A,C) VALUES ('A1','C1')

    INSERT #AC (A,C) VALUES ('A1','C2')

    INSERT #AC (A,C) VALUES ('A2','C2')

    INSERT #AC (A,C) VALUES ('A3','C3')

    INSERT #AC (A,C) VALUES ('A4','C4')

    INSERT #AC (A,C) VALUES ('A4','C5')

    INSERT #AC (A,C) VALUES ('A2','C6')

    INSERT #AC (A,C) VALUES ('A5','C6')

    INSERT #AC (A,C) VALUES ('A6','C7')

    INSERT #AC (A,C) VALUES ('A7','C7')

    IF OBJECT_ID('tempdb..#Bundles') IS NOT NULL DROP TABLE #Bundles

    --Create table with a column for the bundle ID

    --Obviously there is no need for C in the primary key but for what we are going to do with it later it is a cheap and cheerful way to set up a clustered index

    CREATE TABLE #Bundles (

    idx INT NOT NULL,

    A NVARCHAR(10) NOT NULL,

    C NVARCHAR(10) NOT NULL,

    BID INT NOT NULL,

    PRIMARY KEY (C, idx))

    --Populate the table. Initially we assign a unique bundle ID to each distinct value of A

    --Note we could equally have used RANK() instead of DENSE_RANK(), or the number part of the value in A if they really are all of the format 'A1' etc.

    INSERT #Bundles

    (idx, A, C, BID)

    SELECT idx, A, C, DENSE_RANK() OVER (ORDER BY A)

    FROM #AC

    --WHILE loop to check if there are any rows with the same C value in different bundles - if there are we will need to combine them into a single bundle

    --No need to check for rows with the same A value in different bundles - these were set up to have the same BID and will always remain part of the same bundle

    WHILE EXISTS

    (

    SELECT *

    FROM #Bundles

    GROUP BY C

    HAVING MAX(BID)<>MIN(BID)

    )

    BEGIN

    --Find cases where a C value is a member of two bundles

    --For each bundle, find the minimum bundle ID that has a C value in common

    --Update bundles to the minimum corresponding bundle ID

    WITH cteFindPairs AS (

    SELECT b2.BID AS BIDhigher, MIN(b1.BID) AS BIDlowestWithCommonC

    FROM #Bundles b1

    INNER JOIN #Bundles b2

    ON b1.C=b2.C

    AND b1.BID<b2.BID

    GROUP BY b2.BID )

    UPDATE #Bundles

    SET BID=BIDlowestWithCommonC

    FROM #Bundles b

    INNER JOIN cteFindPairs c

    ON b.BID=c.BIDhigher

    END

    --At this point we have identified all the bundles.

    --However our sequence of BIDs may now have gaps, and the sample output had no gaps

    --One final update to remove these gaps

    ; WITH cteDenseRankBIDs AS (

    SELECT DISTINCT BID, DENSE_RANK() OVER (ORDER BY BID) AS BIDdenseRank

    FROM #Bundles )

    UPDATE #Bundles

    SET BID=c.BIDdenseRank

    FROM #Bundles b

    INNER JOIN cteDenseRankBIDs c ON b.BID=c.BID

    SELECT BID, A, C

    FROM #Bundles

  • Hi,

    **********

    Edit: Clarification of how I want to summarize bundles

    **********

    First of all thank you all for helping me! Highly appreciated! ! !

    I think that I have been able to create a working solution using your inputs. Have tried different scenarios (read test data) and can't find any loopholes i.e. error in logic that it just happens to give me correct values.

    Using the baseline from Drew below code is what I ended up with.

    Approach:

    1. Copy whole original dataset into working temp table @ACCopy

    2. Create groups by finding indirect connections and insert them into the temporary dataset @ACTemp1

    (Thanks Drew!)

    3. Create unique BID's for bundles in @ACTemp1 and set in temp table #Temp

    4. Create final result set by adding the BID's to original data set (left outer join data from @ACTemp1 & #Temp)

    NOTE: Have added select statements along the way to check the data. I appoligize if you find them anoying.

    DECLARE @ACOrg TABLE -- No primary key exist in table

    (

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    DECLARE @ACCopy TABLE -- No primary key exist in table

    (

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    DECLARE @ACTemp1 TABLE -- No primary key exist in table

    (

    A [nvarchar](10) NULL,

    C [nvarchar](10) NULL,

    [As] [nvarchar](100) NULL,

    [Cs] [nvarchar](100) NULL

    )

    DECLARE @ACBundle TABLE -- No primary key exist in table

    (

    BID int NOT NULL,

    A [nvarchar](10) NULL,

    ASum int NULL,

    C [nvarchar](10) NULL,

    CSum int NULL

    )

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A1','2', 'C1','2')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A1','2', 'C2','1')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A2','4', 'C2','1')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A3','7', 'C3','7')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A4','6', 'C4','2')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A4','6', 'C5','3')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A2','4', 'C6','5')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A5','7', 'C6','5')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A6','8', 'C7','4')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A7','5', 'C7','4')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A5','7', 'C8','8')

    INSERT INTO @ACOrg (A, ASum, C, CSum) VALUES ('A8','8', 'C8','8')

    Select * from @ACOrg --Check data in table

    --Copy whole original dataset into working temp table @ACCopy

    INSERT INTO @ACCopy (A, ASum, C, CSum)

    Select * from @ACOrg

    --Create groups by finding indirect connections and insert them into the temporary dataset @ACTemp1

    --Code by Drew Allen's

    --Find indirect connections and each found occurence into temporary copy table

    WHILE @@ROWCOUNT > 0

    INSERT @ACCopy(a,c)

    SELECT DISTINCT b.a, c.c

    FROM @ACCopy AS a

    INNER JOIN @ACCopy AS b

    ON a.c = b.c

    INNER JOIN @ACCopy AS c

    ON a.a = c.a

    EXCEPT

    SELECT a, c

    FROM @ACCopy

    Select * from @ACCopy --Check data in table

    --Create groups (holding values for all A's and C's in bundle)

    INSERT INTO @ACTemp1 (A, C, [As],[Cs])

    SELECT a, c

    , Stuff(( SELECT DISTINCT ', ' + a FROM @ACCopy AS sub WHERE sub.c = main.c ORDER BY ', ' + a FOR XML PATH('') ), 1, 2, '') AS [As]

    , Stuff(( SELECT DISTINCT ', ' + c FROM @ACCopy AS sub WHERE sub.a = main.a ORDER BY ', ' + c FOR XML PATH('') ), 1, 2, '') AS Cs

    FROM @ACCopy AS main

    ORDER BY [As], [Cs], a, c

    Select * from @ACTemp1 --Check data in table

    --Check if temp table exist

    IF OBJECT_ID('TempDB..#Temp','U') IS NOT NULL

    DROP TABLE #Temp

    --Create unique BID's for bundles in @ACTemp1 and set in temp table #Temp

    SELECT identity(int,1,1) as BID ,T.[As]

    INTO #Temp

    FROM @ACTemp1 as T

    GROUP BY [As]

    SELECT * FROM #Temp --Check data in table

    --Create final result set by adding the BID's to original data set (left outer join data from @ACTemp1 & #Temp)

    INSERT INTO @ACBundle (BID,A ,ASum ,C ,CSum)

    SELECT DISTINCT E.BID,A.a,A.ASum,A.C,A.CSum

    FROM @ACOrg as A LEFT OUTER JOIN @ACTemp1 as D ON A.A = D.A

    LEFT OUTER JOIN #Temp as E ON D.[As] = E.[As]

    SELECT * FROM @ACBundle --Check final result

    As stated before I think it works but there is always things you can do nicer. For example just came to think that I dont need to create a working copy. Just run Drew's code on original table @ACOrg and in the final step write WHERE @ACOrg.A IS NULL to only get original data.

    Drew Allen

    I didn't include the Sums, because there were duplicate values in your data, but it should be simple to include those.

    Don't quite follow you on this. Table is M-M holding values for A and C from their original tables. In the M-M table each row holds both the ID (A and C) and their corresponding sum (ASum & CSum). What am I missing?

    What I'm finaly trying to accomplish with the sum is to sum all A's and C's per BID. Analyzing my new dataset with 12 rows I have 8 A's and 8 C's. Using either Drew's or (Paul's minus 2 rows) code will give us the following result.

    BID 1 (bundle: A1,A2,A5,A8) = 7 rows

    BID 2 (bundle: A3) = 1 row

    BID 3 (bundle: A4) = 2 rows

    BID 4 (bundle: A6,A7) = 2 rows

    Now I need to remove all "duplicate" A's and C's when counting the sum e.g.

    Taking for example BID 3 I would like to get

    Get all distinct A's and sum their value: Distinct = A4 ; Sum = 6 = 6

    Get all distinct C's and sum their value: Distinct = C4,C5 ; Sum = 2+3 = 5

    Taking for example BID 1 I would like to get

    Get all distinct A's and sum their value: Distinct = A1,A2,A5,A8 ; Sum = 2+4+7+8 = 21

    Get all distinct C's and sum their value: Distinct = C1,C2,C6,C8 ; Sum = 2+1+5+8 = 16

    BID ASum CSum As Cs

    1 21 16 A1,A2,A5,A8 C1,C2,C6,C8

    2 7 7 A3 C3

    3 6 5 A4 C4,C5

    4 13 4 A6,A7 C7

    I thank you all very much for all help!

    Best regards,

    Kaj Froelich

  • Error - double posted

  • Error - Double posted

  • Error - Double posted

  • kajf (10/22/2010)


    I thank you both very very much for all help and think this is it!

    However, if you find any error's or suggestions on improvement I'm all ears.

    For example the performance impact of the written queries in data set of 200,000+ records.

    You're very welcome. Glad we could help.

    As far as performance goes, 200,000+ records is all you're dealing with? Pfft. Walk in the park compared to the 11 million row queries I do on my small db. Compared to the VLDBs (terrabytes and billions of records) that other people do, even.

    Make sure your table has a clustered index on the column(s) of your choosing. Then, if you need additional nonclustered indexes, add nonclustered. But indexing is all about what columns you're joining on and what columns you're filtering off of (WHERE clause). So think carefully about your indexes. Don't use too many and don't use too few.

    That is your first step in performance handling.

    Secondly, look at the execution plan for your query. Look for scans and bookmark lookups. If you can change those into seeks, it will help you out. Test your query by rewriting it several different ways and seeing which performs the best.

    The biggest thing is try not to use SELECT *. List out your fields. On such a tiny recordset, you won't notice much (if any) of a performance difference, but once your results get bigger, you will. It's always best practice to avoid the *. SQL has to take a microscopic bit of time to retrieve the columns from each table / table variable each time you do it, something it doesn't have to do when you list everything out.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply