return function

  • I have a table with a single column of varchar(8000) named raw_data. In each row of the data are a number of ~ (tildes) that are used as column delimiters. So to discover how many "columns" are in each data row I was using:

    select distinct

    len(raw_data) - len(replace(raw_data,'~',''))

    from dbo.SR

    order by

    len(raw_data) - len(replace(raw_data,'~',''))

    The data has 7,10 or 13 columns which should give me column counts of 6,9 and 12. I am getting:

    6

    9

    10

    12

    15

    22

    23

    24

    When I go in and look at rows with column counts other than 6,9 or 12 I am ONLY seeing 6,9 or 12 tildes. Any ideas on where the extra counts are coming from? This has me baffled..........................

    Thanks,

    Joe B

  • Well, the SQL statement is syntactically correct. I would still point a finger at the source data having those several different 'formats'. Two things to help troubleshoot, and one silly question:

    1. Add a COUNT() and GROUP BY to see how often each format occurs.

    2. Can you add an Identity column to your raw data table before inserting the data? This way you could get the exact 'row number' of data with an offending count of tildes using the LEN function in the WHERE clause.

    Q: Are you sure that there aren't any tildes in the data itself?

    Regards,

    --SJT--

  • Can you post a sample of the row data, especially the data where it's giving you the wrong count? (Best, if possible, would be a copy-and-paste from the actual data. If not possible, see if you can create a test set that gives the wrong answer.)

    For some reason, I'm thinking trailing spaces, but that might just be because I'm not awake yet.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ack. Trailing spaces. Right.

    --SJT--

  • SJT,

    Thanks for the response. I have added the group by and count. My real file is close to 3 million records so I made a smaller file of about 6800 records. The 23 count has 45 records which I've looked at and they all contain 12 tildes. At this point I'm wondering if there is some kind of COALLATION issue because the source for the files is either a mainframe or some other archaic machine. So I may be dealing with some EDI/EBCDIC translator issue putting (or not removing) some garbage characters from the files. Looks like I got a couple of other responses so I'll be checking them.

    Again, thanks,

    Joe B

  • Hey all,

    Also have wrapped raw_data with TRIM so there aren't any space issues. I'm still leaning towards the COALLATION issue because I just got off the phone with one of the client's people and he thinks the data IS coming thru a couple of translators/processes before I see it.

    Thanks,

    Joe B

  • JBANKO (8/14/2008)


    Also have wrapped raw_data with TRIM so there aren't any space issues. I'm still leaning towards the COALLATION issue because I just got off the phone with one of the client's people and he thinks the data IS coming thru a couple of translators/processes before I see it.

    Actually, TRIM won't fix all of your problems with spaces. Since you are using the LEN() function for both the before and after calculations, the original trailing spaces should be irrelevant. The problem may come from the trailing spaces that are left behind after] you remove the tildes. Example:

    Before: '123~456~ ~ ' LEN=12 (Datalength=15)

    After: '123456 ' LEN=6 (Datalength=12)

    Note that an RTRIM will only change the Datalength, not the LEN. What I think that you should try is to use the DATALENGTH() function instead of LEN() and skip the TRIMs.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • BINGO! I actually wasn't removing any spaces from the data, just creating a predicate to stick the data into 3 tables. But you're correct about LEN vs DATALENGTH. Switching to DATALENGTH gave me the 3 counts that I needed; 6,9,12.

    Thanks for the help!!!!

    Joe B

  • Glad I could help.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply