Removing or Replacing Data

  • Hello Everyone

    I am working on some Health Care Data, I know, you don't have to rub it in.

    Some of the data that is being pumped in from HL7(who ever came up with this standard, needs to have their @$$ removed). Some of the data has what looks like XML tags all thru it

    example: <OBX.5><OBX.5.1>Blah Blah Data</OBX.5.1></OBX.5>

    And of course with the ending tags as well.

    I need to remove all of the tags. Anyone has an idea of how I can remove all the tags from within a column? So that the data comes out to be only the

    Blah Blah Data

    There may be many of the tags, with all different numbers inside.

    Thank You in advance

    Andrew SQLDBA

  • Have you tried the REPLACE() function?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • The long and nasty way is to loop through the characters in the data.

    Use loops and find first and last '<', '>'.

    if < & > are not in the actual data it is fairly simple to remove first and last for each <obx.5> which works for <tag> and </mytag>

    This could take awhile depending upon how many columns. I would prefer to clean the chaff out of the target table and then put a stripping step in the data import.

    good luck

  • Yes, I have thought of that. But what I am not getting is all of the number within the tags.

    So my data looks like this if there are more than 2 digits within the tags originally.

    <.5.1>

    How can I ensure that everything within the tags, and including the tags get replaced?

    Andrew SQLDBA

  • i have this saved in my snippets for stripping out HTML tags. note this assumes that well formed html is in palce...so % lt ; should be there for < symbols that exist in the text data portion.

    declare @htmlText varchar(max)

    --===== Replace all HTML tags with nothing

    WHILE CHARINDEX('<',@HTMLText) > 0

    SELECT @HTMLText = STUFF(@HTMLText,

    CHARINDEX('<',@HTMLText),

    CHARINDEX('>',@HTMLText)-CHARINDEX('<',@HTMLText)+1,

    '')

    SELECT @HTMLText

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • While looping through the characters in the row.

    Find the first open bracket, do not include any data until you have went past the closing bracket.

    Note: if the input is consistent it can be even simpler.

    <Tag1><tag1.1>yada</tag1.1><Tag1> Consistent meaning the Yada is the only data you are looking for. Loop until you get to a character after '>' that is not '<' use that character string until you get to the next '<' done.

  • There are hundreds of tags throughout the column, and the tags are not in the same location in each row.

    Any advice?

    Thanks

    Andrew SQLDBA

  • If you had a couple of different sanitized examples it would give a better picture. thx

  • Why would a combination of REPLACE(), LIKE() and PATINDEX() not work here?

    EDIT: Like in Lowell's example? It's set based, no looping. Looping bad if you don't need to do it.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I forgot to mention this, but the column is a text data type, and replace is not working with Text data type.

    Andrew SQLDBA

  • AndrewSQLDBA (7/20/2011)


    I forgot to mention this, but the column is a text data type, and replace is not working with Text data type.

    Can you update it to VARCHAR(MAX) without breaking anything?

    Or add a VARCHAR(MAX) column to the table, copy over the text data into the new column, then use the REPLACE()?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Cast your text type as a varchar(max)

  • ok here's a tested working example...that i reworked quickly because you changed the datatype tyo TEXT.

    i'm assuming this'll be a one time update, so it might be slow on a MillionBillion rows of data:

    --DROP TABLE MySampleData

    CREATE TABLE MySampleData (ID INT,HTMLText TEXT)

    INSERT INTO MySampleData(ID,HTMLText)

    SELECT 1,'NoHTML Inside' UNION ALL

    SELECT 2, '<Tag1><tag1.1>yada</tag1.1><Tag1>' UNION ALL

    SELECT 3 ,'<OBX.5><OBX.5.1>Blah Blah Data</OBX.5.1></OBX.5>Other Stuff outside of the tags'

    DECLARE @TagsExist INT

    SET @TagsExist = 1

    WHILE @TagsExist > 0

    BEGIN

    IF NOT EXISTS(SELECT 1 FROM MySampleData WHERE CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)) > 0)

    BEGIN

    SET @TagsExist = 0

    END

    UPDATE MySampleData

    SET HTMLText = STUFF(CONVERT(VARCHAR(MAX),HTMLText),

    CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)),

    CHARINDEX('>',CONVERT(VARCHAR(MAX),HTMLText)) - CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText))+ 1,

    '')

    WHERE CHARINDEX('<',CONVERT(VARCHAR(MAX),HTMLText)) > 0

    END --while loop

    SELECT * FROM MySampleData

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Nice snippet Lowell.

    That looks like a nice piece of code to place in the tool chest!

  • What happens if the text outside the tags containt < or > ?

    I never fully tackled this problem, but I know it's far from simple.

    Moreover I think that RBARing this might be a good option to try. Right now you're hitting and updating the same row a ton of times. I'd preffer using a itvf or even a simple function, loop in it and then update the base row only once (read / write once max).

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply