Parsing out a string with multiple entries per record idea

Question

Post reply

Parsing out a string with multiple entries per record idea

Max Mutant

SSC Veteran

Points: 285
More actions
December 18, 2006 at 5:06 pm

#176058

All,

here's the problem.

I have a text/blob field that has multiple entries that are pseudo date/time stamped.

I have been trying to use this as an delimiter and in line with a split parser, but am not having much luck.

I need to break out each text entry per record id and migrate to a new table.

Any ideas on how to attack?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

SDM SSCommitted Points: 1580 More actions · Answer 1

SDM

SSCommitted

Points: 1580

December 19, 2006 at 9:45 pm

#679041

If you could post a (smallish, representative) sample of your text data and roughly what you might want from it, you'd probably be inundated with tips. It's a bit difficult working blind.

Max Mutant SSC Veteran Points: 285 More actions · Answer 2

The datafield is similar to this:

14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed

Part of the problem is that the application is doing rtf formating and then storing that formating with the text in the field. The other rub is that the person doing the notation is not completely standardized. So the general theme is near the same, but not exact, so there's lots of extra characters, carriage returns, escape characters, etc.

As a result simple parsing using the time date stamp as aa delimiter produces varied results.

After looking at the data, its going to have to be scrubbed and then standardized before its parsed out.

The request for this was made without awareness of the state of the data and the expectation that it was a very simple operation.

However, if you have any examples of string parsing with options to parse out any non standard characters that would be awesome.

kkprasad SSC Enthusiast Points: 171 More actions · Answer 3

Here is some sample code that I could think of ... this is just to get you an idea. you need to put this code in WHILE loop inorder to process the complete string. I hope "(pacific)" is ths string that would appear after each date.

DECLARE

@val varchar(4000)

DECLARE

@pos1 int

DECLARE

@pos2 int

SET

@val = '14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed '

SET

@pos1 = CHARINDEX('(Pacific)', @val) + 19

SET

@pos2 = CHARINDEX('(Pacific)', @val, @pos1)

PRINT

@pos1

PRINT

@pos2

SELECT

substring(@val, @pos1, (@pos2 - @pos1))

in your WHILE loop @pos2 becomes @pos1 this way you can parse it easily.

Sergei Zarembo Hall of Fame Points: 3887 More actions · Answer 4

so u want to split the record:

14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed

into two records under the same ID:

14:57:00 12/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed

12:57:00 11/01/06 (Pacific) Person making note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed

is that the idea? the reason I'm asking is that u never clearly specified what to include in the splits.

i'm assuming u use SQL 2005.

Max Mutant SSC Veteran Points: 285 More actions · Answer 5

Max Mutant

SSC Veteran

Points: 285

December 20, 2006 at 3:25 pm

#679255

yes, thats the idea

kkprasad SSC Enthusiast Points: 171 More actions · Answer 6

kkprasad

SSC Enthusiast

Points: 171

December 20, 2006 at 3:34 pm

#679257

Yes, all that my code do is it splits the data based on keyword. I just gave an idea on to achieve the splitting of data.

I'm using SQL2005, but I presume the same code will work in SQL2000 since the string functions are almost similar in both versions.

Sergei Zarembo Hall of Fame Points: 3887 More actions · Answer 7

Sergei Zarembo

Hall of Fame

Points: 3887

December 20, 2006 at 4:52 pm

#679264

yes, but the problem is Max does not know what keyword will be there up front: his data could vary. What he needs is to split on a pattern

[date] [some_text]

, which justifies using regular expressions in this case.

Max did u ever use regular expressions in SQL 2005? That's the way to handle the task.

Sergei Zarembo Hall of Fame Points: 3887 More actions · Answer 8

Sergei Zarembo

Hall of Fame

Points: 3887

December 20, 2006 at 4:54 pm

#679265

kkprasad,

just tried your code, it returns

'king note Subject: blah, blah, blah Complaint: duh duh duh Object: dee dee deed 12:57:00 11/01/06'

is that what u expected, or i'm doing smth wrong here?

SDM SSCommitted Points: 1580 More actions · Answer 9

SDM

SSCommitted

Points: 1580

December 20, 2006 at 4:55 pm

#679266

You don't need to use regular expressions, PATINDEX would do the trick (and also allow for other time zones' dates in the data). Starting position can be determined by :

select @pos1 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', @val)

SDM SSCommitted Points: 1580 More actions · Answer 10

The pity is, of course, that PATINDEX only takes 2 arguments (so no "starting from" displacement) so that to get both @pos1 and @pos2 you have to do something ugly like:

select @pos1 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', @val)

,@pos2 = patindex('%[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9]/[0-9][0-9]/[0-9][0-9] (%', substring(@val, 2, len(@val)-1))

And i'm sure there are neater ways of doing this, but yuou get the idea.

Max Mutant SSC Veteran Points: 285 More actions · Answer 11

Max Mutant

SSC Veteran

Points: 285

December 20, 2006 at 5:02 pm

#679269

a hidden issue that complicates things, is that the application is using rtf so there might be hidden characters, formatting that impacts the delimiter

The data field needs to be scrubbed down and reformatted then parsed