Empty Strings From ActiveX Transform

  • I've just realized a problem, or maybe more of an annoyance, in my data warehouse database. The database has about 30 tables and something like 800 fields, and sits on about 17 GB of disk. I have two large and complex DTS packages built to refresh it every night.

    The problem is that many of my text fields (char and varchar) contain LOTS of empty strings. It's easy to see where the problem originates: the source text files used to populate the db come from a fixed-width world, so if a field is empty in the source, it's all spaces in the text file. In my DTS transform I use the Trim function in an ActiveX script to remove leading/trailing spaces from the data, and of course if the field is ALL spaces the result is an empty string, which is what gets written to the db.

    Here are the concerns I have:

    1. A combination of NULLs and empty strings can cause unexpected query results unless everyone concerned knows to write "WHERE <fieldname> = '' OR <fieldname> IS NULL" (see the sketch after this list). At the very least this is a pain in the neck.

    2. Empty strings take more space than NULL.

    3. Empty strings take more time to write to a table than NULL.

    4. Empty strings in an indexed field will make the index less efficient.
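
    To illustrate concern 1, here is a minimal T-SQL sketch; the table and column names (Customer, MiddleName) are made up for the example and aren't from my warehouse:

      -- Hypothetical names, used only to show the query problem.
      -- The "obvious" test finds the empty strings but misses the NULLs:
      SELECT CustomerID FROM Customer WHERE MiddleName = ''

      -- To catch every "no value" row, both conditions are needed:
      SELECT CustomerID FROM Customer WHERE MiddleName = '' OR MiddleName IS NULL

      -- If the blanks were all NULL, a single test would do:
      SELECT CustomerID FROM Customer WHERE MiddleName IS NULL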

    Are all of these concerns valid? Have I left anything out? Considering my scenario, would you make converting all the empty strings in the db to NULL a priority? I do have plenty of free disk currently, but the number of remaining gigs IS finite. The window in which I have to do the refresh at night still has some room, but again, the more buffer the better.

    I'm thinking what I need to do first is a one-time clean-up operation with heavy use of the NULLIF function in T-SQL to convert all the current empty strings to NULL. (I'd love to see an existing SP that would spin through all fields in all tables to do that!) The long-term solution is to make sure that, in my DTS packages, when I encounter a source value that is all spaces I write a NULL to the destination field. Since I'm working with VBScript in the transform, I'm not sure how to do that without writing a custom function and slowing the process down quite a bit.
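
    On the clean-up side, something along these lines is what I have in mind (just a sketch, untested): use INFORMATION_SCHEMA to generate one UPDATE per nullable char/varchar column, then review and run the generated statements:

      -- Just a sketch: generate one UPDATE per nullable char/varchar column,
      -- setting empty strings to NULL. Review the generated statements
      -- before executing anything.
      SELECT 'UPDATE [' + c.TABLE_SCHEMA + '].[' + c.TABLE_NAME + '] ' +
             'SET [' + c.COLUMN_NAME + '] = NULL ' +
             'WHERE [' + c.COLUMN_NAME + '] = '''''
      FROM   INFORMATION_SCHEMA.COLUMNS c
             JOIN INFORMATION_SCHEMA.TABLES t
               ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
              AND t.TABLE_NAME   = c.TABLE_NAME
      WHERE  t.TABLE_TYPE  = 'BASE TABLE'
        AND  c.DATA_TYPE   IN ('char', 'varchar')
        AND  c.IS_NULLABLE = 'YES'
      ORDER BY c.TABLE_NAME, c.COLUMN_NAME

    (Since SQL Server ignores trailing spaces in an equality comparison, WHERE <fieldname> = '' should also catch char columns that are padded out with blanks.)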

    Any "refresh-a-warehouse-from-text" scenarios out there that have dealt with this problem before? If so, I'd value any suggestions.  Thanks!

  • Firstly, to tidy up your existing data, see the following article:

    http://qa.sqlservercentral.com/columnists/lPeysakhovich/exoticsqlnullcolumnanalysis.asp

    It loops through all the columns in a db looking for NULL values; you can customise it to find and replace empty strings instead.

    Going forward, I would recommend doing the transform in the ActiveX script. It shouldn't slow things down too much as far as I can see. Test one day's data and compare the two methods.
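
    The per-column logic in the transform only needs a couple of extra lines. Roughly something like this (a sketch only; the source and destination column names are placeholders and will differ in your package):

      ' Sketch of one column's transform: trim the source value and write
      ' NULL when nothing is left, otherwise write the trimmed value.
      ' "Col001" and "MyTextColumn" are placeholder names.
      Function Main()
          Dim sValue
          sValue = DTSSource("Col001")

          If IsNull(sValue) Or Len(Trim(sValue)) = 0 Then
              DTSDestination("MyTextColumn") = Null
          Else
              DTSDestination("MyTextColumn") = Trim(sValue)
          End If

          Main = DTSTransformStat_OK
      End Function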

    The only alternative would be to rerun the script you will be using to tidy up the existing data, targeting only the new records if you can identify them. It is impossible to say which is best; testing the two will tell you one way or the other.

    An alternative, and cleaner, way is to take the data into a temporary staging table, clean it there, and then bring it into your live db. It will take longer but is more controlled.
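
    A rough sketch of that staging step, with made-up table and column names (stage_Customer mirroring the live Customer table):

      -- Made-up names: stage_Customer is loaded as-is from the text file
      -- and mirrors the live Customer table.
      -- Trim the raw fixed-width values and turn blanks into NULL in one pass.
      UPDATE stage_Customer
      SET    MiddleName = NULLIF(RTRIM(LTRIM(MiddleName)), '')

      -- Then move the cleaned rows into the live table.
      INSERT INTO Customer (CustomerID, LastName, MiddleName)
      SELECT CustomerID, LastName, MiddleName
      FROM   stage_Customer

    A nice side effect is that the trimming and NULL handling both become set-based T-SQL instead of per-row VBScript.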


    ------------------------------
    The Users are always right - when I'm not wrong!
