Tally Table vs. While Loop

  • jarid.lawson (5/31/2011)


    I need to go through this list and update all date fields into one format,...

    I strongly recommend against that. First, DATE datatype "fields" cannot be formatted. Second, if you format dates as a VARCHAR, then you've just made doing calculations with dates much more difficult and the front-end code also has to juggle things instead of allowing regional settings to handle the formatting for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Nadrek (6/1/2011)


    May I suggest as an optimization fewer but more complex UPDATE statements, to reduce the amount of I/O?

    As a very rough example, the first two statement could collapse to:

    Update dbo.Table1

    Set AdmitDate= CASE WHEN IsDate(AdmitDate)=1 THEN Convert(Varchar,Convert(DateTime,AdmitDate),112)

    ELSE ''

    END

    Then you can collapse the next set, and then, since there isn't a WHERE clause anymore, you can update more than one field in the same UPDATE and get a bunch of work done each pass through the table.

    Oh, be careful now. IsDate isn't what you think it is. For example, both 2011 and '2011' will return a "1" for IsDate. Are either a date or a mistake?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I don't have a choice regarding the date data type being set to Varchar. This was set up by our application developers, and they require all fields be set to Varchar. Personally I would much rather set it to the correct data type, but I don't get a voice in that decision.

    As for the IsDate() statement, what would you recommend? I have that in the script to address the occasional '-None-' value or some other variation. Is there a better option to address this without needing to add in layers of code?

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • jarid.lawson (6/2/2011)


    I don't have a choice regarding the date data type being set to Varchar. This was set up by our application developers, and they require all fields be set to Varchar. Personally I would much rather set it to the correct data type, but I don't get a voice in that decision.

    As for the IsDate() statement, what would you recommend? I have that in the script to address the occasional '-None-' value or some other variation. Is there a better option to address this without needing to add in layers of code?

    Gosh... they just don't know what they've done. Oh, well... understood on your side. When I run into such a roadblock, I create a view for them to read from and have them write to the table itself so SQL Server can do it's stuff with a conversion and date check. I also have them insert a NULL for "None" and handle the display of that in the view. The advantages not only include the checking that SQL Server does, but it also allows be to easily change the date format or the display of such things as "None" if the need arises. I have to say it again, though, formatting things such as dates in T-SQL for display at a GUI is absolutely one of the worst practices there is for the reasons I previously stated and more.

    For the IsDate thing... IsDate() should certainly be a part of the check but, as you saw in my short examples, isn't enough alone. You need to check for, at a minimum, the presence of two slashes or two dashes or, if in certain places in non-US countries/territories, two periods. Again, they should be doing all of this checking on the GUI side.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I think I understand your general idea here. If you don't mind I'd like to check what I'm getting from you. Instead of using...

    Where IsDate(AdmitDate)=1

    ...you recommend...

    Where IsDate(AdmitDate)=1

    And AdmitDate Like '%/%/%'

    Am I getting the idea or am I missing something. Sorry to ask such a mundane question, but I've been bit recently again and again by missing some little something because I didn't confirm.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • Yes, at the very least. That's correct for when the "numeric" date format is unknown. If you have dates like '1 Jan 2011' or 'January 1st, 2001' (etc), then there are other things you'd need to add to the mix to allow those valid date representations.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I'd change that "LIKE" statement to something a little more REGEX-ish.

    CREATE TABLE #T (

    DT CHAR(10))

    INSERT INTO #T (DT)

    VALUES ('1/1/11'),('01/01/2011'),('1.1.11'),('1-1-11'),('Bob');

    SELECT *

    FROM #T

    WHERE DT LIKE '%[/.[-]]%[/.[-]]%';

    The brackets in the Like statement allow for sets of characters. This one will check for slashes, periods, and dashes. The inner brackets around the dash character are escapes to indicate a literal dash. In Like statements, you can use the dash character to indicate ranges of acceptable values. For example, "LIKE '[0-9]'" will check for any single numeric character.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Wow, I started this for a completely different reason, and I've learned a ton just the same. To put it all together to make sure I've got some of this down:

    First, research SSIS to find the best means of completing ETL (I have 2 very basic SSIS processes set up since I started this post...thank you for that, it's cool if nothing else).

    Second, here is a code segment for updates to my date fields:

    /* -------------------- Update Date Fields -------------------- */

    Declare @LoopCount Int,

    @MaxCount Int,

    @FieldName Varchar(128),

    @SQL Varchar(4000)

    ----------

    Set @LoopCount=1

    Set @MaxCount=

    (Select Max(ID_Field)

    From dbo.TempFieldNameHolder)

    ----------

    While @LoopCount<=@MaxCount

    Begin

    Set @FieldName=''

    ----------

    Set @FieldName=

    (Select FieldName

    From dbo.TempFieldNameHolder

    Where ID_Field=@LoopCount

    And FieldType='Date')

    --------------------

    If @FieldName<>''

    Begin

    Set @SQL=

    'Update dbo.WorkingTable

    Set ' + @FieldName + '=Convert(Varchar,Convert(DateTime,' + @FieldName + '),112)

    Where IsDate(' + @FieldName + ')=1

    And ' + @FieldName + ' Like ''%[/.[-]]%[/.[-]]%''' /* Used to make sure only valid dates are attempted during update */

    Exec(@SQL)

    -----

    Set @SQL=

    'Update dbo.WorkingTable

    Set ' + @FieldName + '=''''

    Where IsDate(' + @FieldName + ')=0

    And ' + @FieldName + ' Like ''%[/.[-]%[/.[-]]%''

    And ' + @FieldName + ' Not Like ''%[JFMASOND]%''' /* Added to ignore formats like '1 Jan 2010', etc. Will add a validation check to catch these at a later date */

    Exec(@SQL)

    End

    ----------

    Set @LoopCount=@LoopCount + 1

    End

    Go

    To sum it all up I am still using the loop logic to step through the different fields (i.e. - AdmitDate, DischargeDate, RemittanceDate_PCL, RemittanceDate_PSR) instead of a tally table to apply the update statements. I have beefed up my formatting logic (since I still have to use Varchar(50) for all date fields).

    Am I missing anything that will amp this up any more?

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • I have a similar problem - though with not so many different headers! - which I solve using SSIS. The standard Flat File data task is very inflexible and cannot solve your problem. The answer is to write your own in VB.NET or C#.NET using a script component to source the data.

    What I do is first read in the header row and check each header against a list of known ones. Stop with error message if you don't recognise them all.

    If they're all good, split and load each row into your data flow. You'll need to translate each incoming field into the correct output field; I find a Hashtable useful for this, but you could probably do a lookup using a normal array if you wanted to.

    Do you know how to read a text file and split on a delimiter? Here's a code fragment for that:

    Dim File As String = Me.Variables.INfile

    Shortfile = IO.Path.GetFileName(File)

    RecordCount = 0

    Const ExpectedFields As String = _

    "Order_Number|Customer_Number|Order_Date|Order_Source|Order_Status|Net_Value|etc"

    ' Save the expected field names in a hash table for later lookup

    For Each S As String In Split(ExpectedFields, "|")

    Expected(S) = 1

    Next S

    Dim InStream As New IO.StreamReader(File) ' Open the input file

    ' Get the header line, check that all fields are as expected, and build a list of field codes

    Line = InStream.ReadLine

    Dim i As Long = 0

    Dim Headers As String() = Split(Line, vbTab)

    For Each S As String In Headers

    If Not Expected.ContainsKey(S) Then

    RaiseEvent Whatever ' String.Format("Unrecognised header '{0}' in file {1}, not processing it further", S, Shortfile)

    Exit Sub

    End If

    FieldNumber(S) = i

    i += 1

    Next S

    ' Then go on to work out which field is which - can't help much there because I don't know what your data looks like

  • GSquared (6/3/2011)


    For example, "LIKE '[0-9]'" will check for any single numeric character.

    As an option if you're going to work with a LIKE in the first place,

    LIKE [0-1][0-9][/-][0-3][0-9]

    and so on and so forth will let you do the first, primitive "It is even remotely possible for these numbers to be of the date format I expect" pass. Anything that passes that first check can be put into more strict checking.

  • Thanks for the code David. I used to work in VB 6, but I've been out of it for a number of years. I've got several books going right now: 3 on SQL, and 1 on VB.NET, so I'm slowly teaching myself the newer code techniques. Alas it is taking longer than I wished due to my job duties and the carnival going on in my personal life. In any case...

    My eventual goal is to have a VB.NET front end that will load the data in, split on delimiter, and use a drop down list of our standard field names. I plan on having the suggestion name listed already (similar to your code), and an other option to allow for manual entries (i.e. - "Some Random Field Name" becomes "SomeRandomFieldName"). From there I'm either going to load and run my standardization scripts via stored procedures, or run the parse logic via VB.NET before loading the final table into SQL Server.

    I created something like this at a previous job (in VB 6 of course), so I know the general layout already. The main issue I'm running into is the new VB.NET programming techniques. To be honest I never worked in OOP logic when I worked in VB 6, so I'm trying to learn it the right way this time around.

    I'll keep this noted to refer to later once I get to the point I can work in VB again in a production environment.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • GSquared (6/3/2011)


    I'd change that "LIKE" statement to something a little more REGEX-ish.

    CREATE TABLE #T (

    DT CHAR(10))

    INSERT INTO #T (DT)

    VALUES ('1/1/11'),('01/01/2011'),('1.1.11'),('1-1-11'),('Bob');

    SELECT *

    FROM #T

    WHERE DT LIKE '%[/.[-]]%[/.[-]]%';

    Wouldn't that allow for things that may not be dates? Like 01.02/03?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (6/4/2011)


    GSquared (6/3/2011)


    I'd change that "LIKE" statement to something a little more REGEX-ish.

    CREATE TABLE #T (

    DT CHAR(10))

    INSERT INTO #T (DT)

    VALUES ('1/1/11'),('01/01/2011'),('1.1.11'),('1-1-11'),('Bob');

    SELECT *

    FROM #T

    WHERE DT LIKE '%[/.[-]]%[/.[-]]%';

    Wouldn't that allow for things that may not be dates? Like 01.02/03?

    Yes, as will IsDate. Anything other than actually storing the data in the correct datatype in the first place will allow for a certain level of error. After all, 1/1/11 isn't necessarily a date either.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Wisecrack withdrawn.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply