Parsing Data in Column, Returning Unique Values

  • Hello,

    I have this sample data below:


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100))
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups')

    What I'd like to do is return a single unique data, where it returns column [mk] after comparing each row like listed below:

    MK

    Red Cup
    Bright Red Cup
    Dark Red Cup
    Small Dark Red Cup
    Big Round Red Cup
    Dark Red Cups

  • I made use of the following technique described in stackoverflow.
    https://stackoverflow.com/questions/13873701/convert-comma-separated-column-value-to-rows


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct 
      Split.a.value('.', 'VARCHAR(100)') AS String 
    FROM (SELECT [address], 
       CAST ('<M>' + REPLACE([mk], ',', '</M><M>') + '</M>' AS XML) AS String 
      FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

  • Thank You George. When I ran it across actual table, I get error:

    Msg 9421, Level 16, State 1, Line 1
    XML parsing: line 1, character 30, illegal name character

    Any idea as to error? I also tried changing from varchar  to nvarchar, but same thing

    Also in you're query, the first four rows of data have extra blank space in it, like a hidden character, any way to fix?

    String
     Bright Red Cup
     Dark Red Cup
     Dark Red Cups
     Small Dark Red Cup
    Big Red Round Cup
    Bright Red Cup
    Red Cup

    Again thanks for your help....appreciate it.

  • Do you happen to have any xml unfriendly characters in the column [mk]. Such as &<>.

    These are likely to cause issues. So would need to do a encode and decode equivalent.

    I have done the replacement for the character '&'


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
     CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    As for the extra space not sure how you are getting it. This is what i get when i try using sqlfiddle.
    http://sqlfiddle.com/#!18/7124e/109

  • george-178499 - Wednesday, April 25, 2018 11:06 PM

    Do you happen to have any xml unfriendly characters in the column [mk]. Such as &<>.

    These are likely to cause issues. So would need to do a encode and decode equivalent.

    I have done the replacement for the character '&'


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
     CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    As for the extra space not sure how you are getting it. This is what i get when i try using sqlfiddle.
    http://sqlfiddle.com/#!18/7124e/109

    This worked perfectly. Thanks!

    If I want to add a where clause, to search like where [mk] like '%blue%'  and or order by asc, in you're query how would you do that?

  • Quick suggestion, have a look at these two articles, Tally OH! An Improved SQL 8K โ€œCSV Splitterโ€ Function and Reaping the benefits of the Window functions in T-SQL
    ๐Ÿ˜Ž

  • Using the DelimitedSplit8K function appears to whip the pants off of the other method.   I couldn't get any duration at all from even a datetime2(7) = SYSDATETIME() computation of DATEDIFF.   Here's the code I ran:IF OBJECT_ID(N'tempdb..#MyTable', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #MyTable;
        END;
    GO

    CREATE TABLE #MyTable (
        [Address] varchar(100),
        MD varchar(100),
        MK varchar(100)
    );
    INSERT INTO #MyTable ([Address], MD, MK)
        VALUES    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
                ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
                ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    DECLARE @END_DT AS datetime2(7);
    DECLARE @START_DT AS datetime2(7) = SYSDATETIME();

    SELECT DISTINCT
        REPLACE(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (
        SELECT [address],
            CAST ('<M>' + REPLACE(REPLACE(MK,'&',';amp'), ',', '</M><M>') + '</M>' AS XML) AS String
        FROM #MyTable
        ) AS A
        CROSS APPLY String.nodes ('/M') AS Split(a);

    SELECT @END_DT = SYSDATETIME();
    PRINT 'XML Method:'
    PRINT '===========';
    PRINT DATEDIFF(ns, @START_DT, @END_DT) / 1000000000.;

    SELECT @START_DT = SYSDATETIME();

    SELECT DISTINCT CASE WHEN LEFT(S.Item, 1) = ' ' THEN STUFF(S.Item, 1, 1, '') ELSE S.Item END AS MK
    FROM #MyTable AS MT
        CROSS APPLY ARIES.dbo.fnAC00DelimitedSplit8K(MT.MK, ',') AS S
    ORDER BY 1;

    SELECT @END_DT = SYSDATETIME();
    PRINT '';
    PRINT 'Delimited Split 8K:';
    PRINT '===================';
    PRINT DATEDIFF(ns, @START_DT, @END_DT) / 1000000000.;

    IF OBJECT_ID(N'tempdb..#MyTable', N'U') IS NOT NULL
        BEGIN
        DROP TABLE #MyTable;
        END;
    GO

    Might be cacheing, so you should probably test in your own environment, with perhaps a DBCC FREEPROCCACHE in between the two runs.  Just don't do that on a production box.

  • VegasL - Wednesday, April 25, 2018 11:24 PM

    george-178499 - Wednesday, April 25, 2018 11:06 PM

    Do you happen to have any xml unfriendly characters in the column [mk]. Such as &<>.

    These are likely to cause issues. So would need to do a encode and decode equivalent.

    I have done the replacement for the character '&'


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
     CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    As for the extra space not sure how you are getting it. This is what i get when i try using sqlfiddle.
    http://sqlfiddle.com/#!18/7124e/109

    This worked perfectly. Thanks!

    If I want to add a where clause, to search like where [mk] like '%blue%'  and or order by asc, in you're query how would you do that?

    Steve, your solution worked perfectly, however I'm having trouble adding a where clause, and you're latest post was way to complicated for me to begin to understand. can i not add a column and where clause like below?


    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String, [md] as 'name of column'
    where [md] like '%some value%'
    FROM (SELECT [address],
    CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

  • VegasL - Friday, April 27, 2018 11:35 AM

    VegasL - Wednesday, April 25, 2018 11:24 PM

    george-178499 - Wednesday, April 25, 2018 11:06 PM

    Do you happen to have any xml unfriendly characters in the column [mk]. Such as &<>.

    These are likely to cause issues. So would need to do a encode and decode equivalent.

    I have done the replacement for the character '&'


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
     CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    As for the extra space not sure how you are getting it. This is what i get when i try using sqlfiddle.
    http://sqlfiddle.com/#!18/7124e/109

    This worked perfectly. Thanks!

    If I want to add a where clause, to search like where [mk] like '%blue%'  and or order by asc, in you're query how would you do that?

    Steve, your solution worked perfectly, however I'm having trouble adding a where clause, and you're latest post was way to complicated for me to begin to understand. can i not add a column and where clause like below?


    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String, [md] as 'name of column'
    where [md] like '%some value%'
    FROM (SELECT [address],
    CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    If you add a column, you'll possibly mess up the use of DISTINCT.  Also, you threw a WHERE clause into the query right after the SELECT portion, when it HAS to go AFTER a FROM clause (in its entirety).  You could certainly exclude records with certain MD column values from consideration by the overall query, but I wouldn't waste time trying to figure out the XML-based query for WHERE clause purposes.   Here's a modification to my DelimitedSplit8K version:SELECT DISTINCT CASE WHEN LEFT(S.Item, 1) = ' ' THEN STUFF(S.Item, 1, 1, '') ELSE S.Item END AS MK
    FROM #MyTable AS MT
      CROSS APPLY ARIES.dbo.fnAC00DelimitedSplit8K(MT.MK, ',') AS S
    WHERE MT.MD LIKE '%some value%'
    ORDER BY 1;

  • VegasL - Wednesday, April 25, 2018 11:24 PM

    george-178499 - Wednesday, April 25, 2018 11:06 PM

    Do you happen to have any xml unfriendly characters in the column [mk]. Such as &<>.

    These are likely to cause issues. So would need to do a encode and decode equivalent.

    I have done the replacement for the character '&'


    CREATE TABLE MyTable ([Address] varchar(100), [MD] VARCHAR(100), [MK] VARCHAR(100));
    INSERT INTO MyTable ([Address],[MD], [MK])
    VALUES
    ('https://www.abc123.com/p/1/red-widget','This is a red widget','Red Cup, Bright&Red Cup, Dark Red Cup'),
    ('https://www.abc123.com/p/1/small-red-widget','This is a small red widget','Bright Red Cup,Red Cup&, Small Dark Red Cup'),
    ('https://www.abc123.com/p/1/big-red-widget','This is a big red widget','Big Red Round Cup, Bright Red Cup, Small Dark Red Cup, Dark Red Cups');

    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
     CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    As for the extra space not sure how you are getting it. This is what i get when i try using sqlfiddle.
    http://sqlfiddle.com/#!18/7124e/109

    This worked perfectly. Thanks!

    If I want to add a where clause, to search like where [mk] like '%blue%'  and or order by asc, in you're query how would you do that?

    You can add the WHERE clauses after the mytable as follows and the ORDER BY after the Split(a)


    SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String
    FROM (SELECT [address],
         CAST ('<M>' + REPLACE(replace([mk],'&',';amp'), ',', '</M><M>') + '</M>' as XML) AS String
       FROM mytable
       WHERE md like '%red%'
      ) AS A CROSS APPLY String.nodes ('/M') AS Split(a)
    ORDER BY 1 asc

  • Thanks George & Steve for answering both question regarding where clause and extra column. Very very helpful -- appreciate it!

    i'm assuming if i have data such as, below, separated by a pipe |, just need to modify script and replace the  commas with the pipe?

    This is a red widget | Red Cup, Bright Red Cup | Dark Red Cup 

    like below:


     SELECT distinct
    replace(Split.a.value('.', 'VARCHAR(100)'),';amp','&') AS String, [md] as 'name of column'
    where [md] like '%some value%'
    FROM (SELECT [address],
    CAST ('<M>' + REPLACE(replace([mk],'&'|';amp'), '|', '</M><M>') + '</M>' as XML) AS String
    FROM mytable) AS A CROSS APPLY String.nodes ('/M') AS Split(a);

    where the end result is:

    This is a red widget
    Red Cup, Bright Red Cup
    Dark Red Cup 

  • Does having pipe cause problems in the xml creation? I didnt get any errors

    Check, i have an entry such as Bright||&|Red Cup
    http://sqlfiddle.com/#!18/7124e/135

    Anyway, you would do find and replace & with | to get you what you need.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply