Updating text within a single column, but not the entire string, duplicates string being updated

  • rabisco (5/15/2012)


    Sean Lange (5/15/2012)


    The more I think about this the more ugly it becomes. Can this be done in managed code instead in the database? t-sql is just not that efficient at this type of string manipulation. Maybe even a CLR proc? It would still be kind of ugly but I think better than trying to do this in straight sql.

    Even better would be to have the people entering this to edit their data so it is correct. 😛

    Thanks for responding.

    Unfortunately, this is a project to attempt to correct Public user data. The requirement is to correct the incomplete data via a nightly database job.

    Still wearing my positive hat :-), while the PM is breathing down my neck 🙁 , it does look like we are close though, I just need to prevent the double updates. Thanks for your help.

    Can you do this with CLR? That would be much easier to code and the performance would not be nearly as awful.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/15/2012)


    rabisco (5/15/2012)


    Sean Lange (5/15/2012)


    The more I think about this the more ugly it becomes. Can this be done in managed code instead in the database? t-sql is just not that efficient at this type of string manipulation. Maybe even a CLR proc? It would still be kind of ugly but I think better than trying to do this in straight sql.

    Even better would be to have the people entering this to edit their data so it is correct. 😛

    Thanks for responding.

    Unfortunately, this is a project to attempt to correct Public user data. The requirement is to correct the incomplete data via a nightly database job.

    Still wearing my positive hat :-), while the PM is breathing down my neck 🙁 , it does look like we are close though, I just need to prevent the double updates. Thanks for your help.

    Can you do this with CLR? That would be much easier to code and the performance would not be nearly as awful.

    Hmm, I'm not familiar with CLR. Any tips?

  • CLR is a way of executing a stored proc whose code is actually .NET code. This type of string manipulation is tricky and probably a bit easier to code in .NET than t-sql.

    Here is an article that explains how to create your first CLR. http://qa.sqlservercentral.com/articles/SQL+Server+2008/87743/[/url]

    If you are not familiar with .NET this may be a somewhat steep learning curve.

    I am kinda behind on a project at work and I am out Thursday and Friday this week. If I can find some time I will see if I can spend a few cycles on this. If nobody else responds, make sure you ping me again early next week and I will see what I can do to help.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/15/2012)


    CLR is a way of executing a stored proc whose code is actually .NET code. This type of string manipulation is tricky and probably a bit easier to code in .NET than t-sql.

    Here is an article that explains how to create your first CLR. http://qa.sqlservercentral.com/articles/SQL+Server+2008/87743/[/url]

    If you are not familiar with .NET this may be a somewhat steep learning curve.

    I am kinda behind on a project at work and I am out Thursday and Friday this week. If I can find some time I will see if I can spend a few cycles on this. If nobody else responds, make sure you ping me again early next week and I will see what I can do to help.

    Thanks for the tip/advice. I'll be sure to ping you next week; really appreciate it.

  • rabisco (5/15/2012)


    Sean Lange (5/15/2012)


    CLR is a way of executing a stored proc whose code is actually .NET code. This type of string manipulation is tricky and probably a bit easier to code in .NET than t-sql.

    Here is an article that explains how to create your first CLR. http://qa.sqlservercentral.com/articles/SQL+Server+2008/87743/[/url]

    If you are not familiar with .NET this may be a somewhat steep learning curve.

    I am kinda behind on a project at work and I am out Thursday and Friday this week. If I can find some time I will see if I can spend a few cycles on this. If nobody else responds, make sure you ping me again early next week and I will see what I can do to help.

    Thanks for the tip/advice. I'll be sure to ping you next week; really appreciate it.

    No problem. Wish I had a bit more time before then. Sounds like the pressure from the PM is going to start coming soon. Hopefully we can figure out a solution before then. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • This is an interesting "problem".

    rabisco,

    i had an hour of free time, so I MAY have a CLR solution for you.

    what is the logic for the code1-to-code2 mapping?

    thanks.

    .j.

  • jim.l (5/16/2012)


    This is an interesting "problem".

    rabisco,

    i had an hour of free time, so I MAY have a CLR solution for you.

    what is the logic for the code1-to-code2 mapping?

    thanks.

    .j.

    Hi Jim.l, just seen your reply. Not sure if I have missed your one hour of free time

    code 1 and code 2 have a one to one mapping. The user should enter code1 and code 2 for a product. There could be one or more paragraphs in any row, who also mean that there could be one or more occurrences of code1 or code2 or tha code1+code2 pair.

    Essentially, anywhere within a paragraph where code is is found without the corresponding code2, look up the missing code 2 and insert it after the code1 and vice versa. In any paragraph where the pair is found, do not process them.

    Let me know if you need more clarification.

    Thanks.

  • So the lookup is based on the 1 existing code?

    e.g. " SELECT code2_sku FROM Linktable WHERE code1_sku = '123456' "

    Are there any duplicates (code1 or code2) in Linktable which could cause problems?

  • jim.l (5/16/2012)


    So the lookup is based on the 1 existing code?

    e.g. " SELECT code2_sku FROM Linktable WHERE code1_sku = '123456' "

    Are there any duplicates (code1 or code2) in Linktable which could cause problems?

    No there are no duplicates in the linktable.

  • ok, so with your clarification on code lookup, i made some changes. below you'll find some code for the CLR project which should work. It's not the prettiest code in the world, but in my testing it seems to do the job.

    Here's the code. Use the link Sean provided on how to generate and deploy the CLR into sql.

    ===== BEGIN .net C# Code =====

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using System.Xml;

    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions

    {

    [SqlFunction(DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read)]

    public static SqlString fnCleanBadHtml(SqlChars htmlToClean, string mappingTableName)

    {

    if (htmlToClean == null || string.IsNullOrEmpty(htmlToClean.ToSqlString().ToString()) || string.IsNullOrEmpty(mappingTableName))

    return string.Empty;

    DataSet dsCodeMapping = null;

    // populate code mapping lookup data (mappingTableName);

    if (dsCodeMapping == null)

    {

    dsCodeMapping = new DataSet();

    using (SqlConnection cxn = new SqlConnection("context connection=true"))

    {

    SqlCommand cmd = null;

    SqlDataAdapter sda = null;

    try

    {

    cmd = new SqlCommand();

    cmd.Connection = cxn;

    cmd.CommandText = "Select * from " + mappingTableName;

    cxn.Open();

    sda = new SqlDataAdapter(cmd);

    sda.Fill(dsCodeMapping);

    }

    finally

    {

    if (sda != null)

    sda.Dispose();

    if (cmd != null)

    cmd.Dispose();

    }

    }

    }

    // BEGIN fix code values

    // load html into xmldoc

    XmlDocument xDoc = new XmlDocument();

    xDoc.LoadXml(htmlToClean.ToSqlString().ToString());

    const string xPath = "//body/a";

    // get all <a> nodes

    XmlNodeList badNodes = xDoc.SelectNodes(xPath);

    if (badNodes != null && badNodes.Count > 0)

    {

    // found nodes, iterate

    foreach (XmlNode eaBadNode in badNodes)

    {

    XmlAttributeCollection attributes = eaBadNode.Attributes;

    // get existing code1 value from html, if exists

    XmlAttribute attribCode1 = (attributes["code1"] != null)

    ? attributes["code1"]

    : xDoc.CreateAttribute("code1");

    // get existing code2 value from html, if exists

    XmlAttribute attribCode2 = (attributes["code2"] != null)

    ? attributes["code2"]

    : xDoc.CreateAttribute("code1");

    string code1 = attribCode1.Value;

    string code2 = attribCode2.Value;

    // either code1 or code2 is empty; need to process

    if (string.IsNullOrEmpty(code1) || string.IsNullOrEmpty(code2))

    {

    string missingCodeName = string.Empty;

    string missingCodeValue = string.Empty;

    string missingCodeColumn = string.Empty;

    string lookupCodeValue = "";

    string lookupCodeColumn = "";

    // determine which code is missing

    if (string.IsNullOrEmpty(code1)) // code1 is missing

    {

    lookupCodeColumn = "code2_sku";

    lookupCodeValue = code2;

    missingCodeName = "code1";

    missingCodeColumn = "code1_sku";

    }

    else if (string.IsNullOrEmpty(code2)) // code2 is missing

    {

    lookupCodeColumn = "code1_sku";

    lookupCodeValue = code1;

    missingCodeName = "code2";

    missingCodeColumn = "code2_sku";

    }

    // set the filter condition

    string selectFilter = lookupCodeColumn + " = '" + lookupCodeValue.ToString() + "'";

    // filter the lookup table to find the correct record

    DataRow[] dRows = dsCodeMapping.Tables[0].Select(selectFilter);

    // get the corresponding code from lookup table, if exists

    if (dRows != null && dRows.Length > 0)

    missingCodeValue = (dRows[0][missingCodeColumn] != null) ? dRows[0][missingCodeColumn].ToString() : string.Empty;

    else

    missingCodeValue = "";

    // if new code value is found, update the XmlNode

    if (!string.IsNullOrEmpty(missingCodeValue))

    {

    XmlAttribute newCodeAttrib = xDoc.CreateAttribute(missingCodeName);

    newCodeAttrib.Value = missingCodeValue;

    eaBadNode.Attributes.Append(newCodeAttrib);

    }

    }

    }

    }

    // END fix code values

    return xDoc.OuterXml;

    }

    };

    ===== END .net C# Code =====

    once the CLR and UDF are deployed, test it by executing:

    select dbo.fnCleanBadHtml('<body><a name="productLink" class="productLink" code2="83K3074" href="">sku2</a></body>','dbo.Linktable')

    original html:

    <body><a name="productLink" class="productLink" code2="83K3074" href="">sku2</a></body>

    cleaned html:

    <body><a name="productLink" class="productLink" code2="83K3074" href="" code1="1595474">sku2</a></body>

    I'm out for the day, but can help with any questions you have tomorrow.

  • jim.l (5/16/2012)


    ok, so with your clarification on code lookup, i made some changes. below you'll find some code for the CLR project which should work. It's not the prettiest code in the world, but in my testing it seems to do the job. ...

    You can make it prettier just by adding the code tags. 😀

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • jim.l (5/16/2012)


    ok, so with your clarification on code lookup, i made some changes. below you'll find some code for the CLR project which should work. It's not the prettiest code in the world, but in my testing it seems to do the job.

    Here's the code. Use the link Sean provided on how to generate and deploy the CLR into sql.

    ===== BEGIN .net C# Code =====

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using System.Xml;

    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions

    {

    [SqlFunction(DataAccess = DataAccessKind.Read, SystemDataAccess = SystemDataAccessKind.Read)]

    public static SqlString fnCleanBadHtml(SqlChars htmlToClean, string mappingTableName)

    {

    if (htmlToClean == null || string.IsNullOrEmpty(htmlToClean.ToSqlString().ToString()) || string.IsNullOrEmpty(mappingTableName))

    return string.Empty;

    DataSet dsCodeMapping = null;

    // populate code mapping lookup data (mappingTableName);

    if (dsCodeMapping == null)

    {

    dsCodeMapping = new DataSet();

    using (SqlConnection cxn = new SqlConnection("context connection=true"))

    {

    SqlCommand cmd = null;

    SqlDataAdapter sda = null;

    try

    {

    cmd = new SqlCommand();

    cmd.Connection = cxn;

    cmd.CommandText = "Select * from " + mappingTableName;

    cxn.Open();

    sda = new SqlDataAdapter(cmd);

    sda.Fill(dsCodeMapping);

    }

    finally

    {

    if (sda != null)

    sda.Dispose();

    if (cmd != null)

    cmd.Dispose();

    }

    }

    }

    // BEGIN fix code values

    // load html into xmldoc

    XmlDocument xDoc = new XmlDocument();

    xDoc.LoadXml(htmlToClean.ToSqlString().ToString());

    const string xPath = "//body/a";

    // get all <a> nodes

    XmlNodeList badNodes = xDoc.SelectNodes(xPath);

    if (badNodes != null && badNodes.Count > 0)

    {

    // found nodes, iterate

    foreach (XmlNode eaBadNode in badNodes)

    {

    XmlAttributeCollection attributes = eaBadNode.Attributes;

    // get existing code1 value from html, if exists

    XmlAttribute attribCode1 = (attributes["code1"] != null)

    ? attributes["code1"]

    : xDoc.CreateAttribute("code1");

    // get existing code2 value from html, if exists

    XmlAttribute attribCode2 = (attributes["code2"] != null)

    ? attributes["code2"]

    : xDoc.CreateAttribute("code1");

    string code1 = attribCode1.Value;

    string code2 = attribCode2.Value;

    // either code1 or code2 is empty; need to process

    if (string.IsNullOrEmpty(code1) || string.IsNullOrEmpty(code2))

    {

    string missingCodeName = string.Empty;

    string missingCodeValue = string.Empty;

    string missingCodeColumn = string.Empty;

    string lookupCodeValue = "";

    string lookupCodeColumn = "";

    // determine which code is missing

    if (string.IsNullOrEmpty(code1)) // code1 is missing

    {

    lookupCodeColumn = "code2_sku";

    lookupCodeValue = code2;

    missingCodeName = "code1";

    missingCodeColumn = "code1_sku";

    }

    else if (string.IsNullOrEmpty(code2)) // code2 is missing

    {

    lookupCodeColumn = "code1_sku";

    lookupCodeValue = code1;

    missingCodeName = "code2";

    missingCodeColumn = "code2_sku";

    }

    // set the filter condition

    string selectFilter = lookupCodeColumn + " = '" + lookupCodeValue.ToString() + "'";

    // filter the lookup table to find the correct record

    DataRow[] dRows = dsCodeMapping.Tables[0].Select(selectFilter);

    // get the corresponding code from lookup table, if exists

    if (dRows != null && dRows.Length > 0)

    missingCodeValue = (dRows[0][missingCodeColumn] != null) ? dRows[0][missingCodeColumn].ToString() : string.Empty;

    else

    missingCodeValue = "";

    // if new code value is found, update the XmlNode

    if (!string.IsNullOrEmpty(missingCodeValue))

    {

    XmlAttribute newCodeAttrib = xDoc.CreateAttribute(missingCodeName);

    newCodeAttrib.Value = missingCodeValue;

    eaBadNode.Attributes.Append(newCodeAttrib);

    }

    }

    }

    }

    // END fix code values

    return xDoc.OuterXml;

    }

    };

    ===== END .net C# Code =====

    once the CLR and UDF are deployed, test it by executing:

    select dbo.fnCleanBadHtml('<body><a name="productLink" class="productLink" code2="83K3074" href="">sku2</a></body>','dbo.Linktable')

    original html:

    <body><a name="productLink" class="productLink" code2="83K3074" href="">sku2</a></body>

    cleaned html:

    <body><a name="productLink" class="productLink" code2="83K3074" href="" code1="1595474">sku2</a></body>

    I'm out for the day, but can help with any questions you have tomorrow.

    Hi Jim.l,

    Thanks for this. I'll spend some time getting my head round your code this morning and how it would work in the our environment.

    It's essentially a java based enterprise social business with a sql sever back end. All data is persisted to DB tables.

    I'll be reading in the body of the documents processing it to make the updates.

    I'll ready up on CLR and ask some questions shortly.

    Thanks.

    Thanks

  • Hi,

    I made this using a split with jeff modden style . It migth be useful if you are restricted to not use CLR.

    DECLARE @Sql AS VARCHAR(MAX)

    SET @Sql = ''

    ;WITH BodySplit AS

    (

    SELECT BodyId,

    ItemNumber = ROW_NUMBER() OVER(ORDER BY n.Number),

    ItemValue = SUBSTRING(d.BodyText,

    n.Number,

    isnull(nullif( patindex('%<a%',

    substring(d.BodyText,n.Number+ 2,datalength(d.BodyText)- n.Number - 2 +1)

    )- 1

    ,-1

    ),

    datalength(d.BodyText)- n.Number +1

    )

    ),

    d.BodyText

    FROM DocumentBody d

    INNER JOIN Numbers n

    ON n.Number BETWEEN 0 AND DATALENGTH(d.BodyText) AND

    SUBSTRING(d.BodyText,n.Number,2)='<a' OR n.Number = 1

    WHERE BodyTExt is not null

    )

    ,ItemCodes

    AS

    (

    SELECT BodyId,ItemNumber,ItemValue,

    Code1 = CASE WHEN PATINDEX('%code1%',ItemValue) > 0

    THEN SUBSTRING(ItemValue,PATINDEX('%Code1="[0-9][0-9]_[0-9][0-9][0-9][0-9]%',ItemValue)+5+2,7)

    ELSE ''

    END,

    Code2 = CASE WHEN PATINDEX('%code2%',ItemValue) > 0

    THEN SUBSTRING(ItemValue,PATINDEX('%Code2="[0-9][0-9]_[0-9][0-9][0-9][0-9]%',ItemValue)+5+2,7)

    ELSE ''

    END

    FROM BodySplit bs

    ),

    NewItems

    AS

    (

    select BodyId,ItemNumber,ItemValue ,

    NewItem = CASE WHEN Code1 = '' AND lc.Code1_sku IS NOT NULL

    THEN replace(ItemValue,'Code2="'+ic.Code2+'"','Code1="'+lC.Code1_sku+'" Code2="'+ic.Code2+'"')

    WHEN Code2 = '' AND lc.Code2_sku IS NOT NULL

    THEN replace(ItemValue,'Code1="'+ic.Code1+'"','Code1="'+ic.Code1+'" Code2="'+lc.Code2_sku+'"')

    ELSE ItemValue

    END

    from itemcodes ic

    LEFT JOIN LinkTable lc

    ON (ic.Code1 ='' AND ltrim(rtrim(ic.Code2)) = lc.Code2_Sku) OR

    (ic.Code2 ='' AND ltrim(rtrim(ic.Code1))= lc.Code1_Sku)

    )

    SELECT @Sql = @Sql + ' UPDATE DocumentBody SET BodyText = REPLACE(BodyText,'''+ItemValue+''','''+NewItem+''' ) WHERE BodyId = '+ convert(VARCHAR(20),BODYID)+ char(13)+char(10)

    FROM NewItems WHERE ItemValue <> NewItem

    ORDER BY BodyId,ItemNumber

    PRINT @Sql

    EXEC (@Sql)

  • adrian.facio (5/18/2012)


    Hi,

    I made this using a split with jeff modden style . It migth be useful if you are restricted to not use CLR.

    DECLARE @Sql AS VARCHAR(MAX)

    SET @Sql = ''

    ;WITH BodySplit AS

    (

    SELECT BodyId,

    ItemNumber = ROW_NUMBER() OVER(ORDER BY n.Number),

    ItemValue = SUBSTRING(d.BodyText,

    n.Number,

    isnull(nullif( patindex('%<a%',

    substring(d.BodyText,n.Number+ 2,datalength(d.BodyText)- n.Number - 2 +1)

    )- 1

    ,-1

    ),

    datalength(d.BodyText)- n.Number +1

    )

    ),

    d.BodyText

    FROM DocumentBody d

    INNER JOIN Numbers n

    ON n.Number BETWEEN 0 AND DATALENGTH(d.BodyText) AND

    SUBSTRING(d.BodyText,n.Number,2)='<a' OR n.Number = 1

    WHERE BodyTExt is not null

    )

    ,ItemCodes

    AS

    (

    SELECT BodyId,ItemNumber,ItemValue,

    Code1 = CASE WHEN PATINDEX('%code1%',ItemValue) > 0

    THEN SUBSTRING(ItemValue,PATINDEX('%Code1="[0-9][0-9]_[0-9][0-9][0-9][0-9]%',ItemValue)+5+2,7)

    ELSE ''

    END,

    Code2 = CASE WHEN PATINDEX('%code2%',ItemValue) > 0

    THEN SUBSTRING(ItemValue,PATINDEX('%Code2="[0-9][0-9]_[0-9][0-9][0-9][0-9]%',ItemValue)+5+2,7)

    ELSE ''

    END

    FROM BodySplit bs

    ),

    NewItems

    AS

    (

    select BodyId,ItemNumber,ItemValue ,

    NewItem = CASE WHEN Code1 = '' AND lc.Code1_sku IS NOT NULL

    THEN replace(ItemValue,'Code2="'+ic.Code2+'"','Code1="'+lC.Code1_sku+'" Code2="'+ic.Code2+'"')

    WHEN Code2 = '' AND lc.Code2_sku IS NOT NULL

    THEN replace(ItemValue,'Code1="'+ic.Code1+'"','Code1="'+ic.Code1+'" Code2="'+lc.Code2_sku+'"')

    ELSE ItemValue

    END

    from itemcodes ic

    LEFT JOIN LinkTable lc

    ON (ic.Code1 ='' AND ltrim(rtrim(ic.Code2)) = lc.Code2_Sku) OR

    (ic.Code2 ='' AND ltrim(rtrim(ic.Code1))= lc.Code1_Sku)

    )

    SELECT @Sql = @Sql + ' UPDATE DocumentBody SET BodyText = REPLACE(BodyText,'''+ItemValue+''','''+NewItem+''' ) WHERE BodyId = '+ convert(VARCHAR(20),BODYID)+ char(13)+char(10)

    FROM NewItems WHERE ItemValue <> NewItem

    ORDER BY BodyId,ItemNumber

    PRINT @Sql

    EXEC (@Sql)

    Thanks SSC-Enthusiastic.

  • Hey buddy,

    What sql server version are you using? do you have xml variables?

Viewing 15 posts - 16 through 30 (of 61 total)

You must be logged in to reply to this topic. Login to reply