Validate field in STaging before loading

  • I have a STaging table where I have defined all fields as Varchar(255), now I'm trying to load the Prod table, and one of my fields(order_open_qty) is defined as Decimal(18,10), but it fails as data from STaging fails converting Varchar to Dec. Is there any

    pre-validation I can run to change those bad records to a predefined value to be able to complete load.

    Thanks.

  • Check out the ISNUMERIC() function. That will tell you what you want to know.

    Strong advice: all tables, all databases, all the time, use appropriate data types. It really matters.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Run a query to see what is in those rows. Perhaps it is empty spaces that you can update to 0, or 'null'.

    You could also use the case statement when selecting the rows for the insert

  • Grant Fritchey wrote:

    Check out the ISNUMERIC() function. That will tell you what you want to know.

    No... ISNUMERIC() can help but it must not be considered as any type of "Is All Digits" function because it doesn't work for that.  Please see the following article on that and what you can use in its place.

    https://qa.sqlservercentral.com/articles/why-doesn%e2%80%99t-isnumeric-work-correctly-sql-spackle

    Grant Fritchey wrote:

    Strong advice: all tables, all databases, all the time, use appropriate data types. It really matters.

    I totally agree.  DECIMAL(18,10) means 8 digits to the left of the decimal point and 10 to the right.  That seems like a very strange datatype for an "order quantity' column.  It would be really interesting to know what the requirement was for that very odd design for such a column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Grant Fritchey wrote:

    Strong advice: all tables, all databases, all the time, use appropriate data types. It really matters.

    Obviously I agree with that principle, as a general rule. Would you say I am way off beam when using temp tables for actually doing the tidy up of crap data?

    A few times I have had to do a hurried, short-notice import of data from various text files, my approach is usually to make temp tables using overly wide NVARCHAR() for all columns, then I can use SQL to marshall it into some semblance of order before inserting/updating the properly typed "real" tables.

    I've hit so many stumbling blocks (or lack of knowledge in other tools) trying to sanitise the data first, I default to my comfort blanket of just sanitising it using SQL!

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

  • >> Obviously I agree with that principle, as a general rule. Would you say I am way off beam when using temp tables for actually doing the tidy up of crap data? <<

    We never meant SQL to be used for this purpose . When we were approving the standards, we intended an SQL database to be receiving clean data. Remember the idea of a tiered architecture?

    >> A few times I have had to do a hurried, short-notice import of data from various text files, my approach is usually to make temp tables using overly wide NVARCHAR() for all columns, then I can use SQL to marshall it into some semblance of order before inserting/updating the properly typed "real" tables. <<

    Damn that real world! ;(

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • david.edwards 76768 wrote:

    Grant Fritchey wrote:

    Strong advice: all tables, all databases, all the time, use appropriate data types. It really matters.

    Obviously I agree with that principle, as a general rule. Would you say I am way off beam when using temp tables for actually doing the tidy up of crap data?

    A few times I have had to do a hurried, short-notice import of data from various text files, my approach is usually to make temp tables using overly wide NVARCHAR() for all columns, then I can use SQL to marshall it into some semblance of order before inserting/updating the properly typed "real" tables.

    I've hit so many stumbling blocks (or lack of knowledge in other tools) trying to sanitise the data first, I default to my comfort blanket of just sanitising it using SQL!

    No.  You're not off beam on that at all.  When I do imports, it's ALWAYS to a staging table of sorts for exactly the reason you state.  I'll either use a Temp Table or a semi-permanent working table to do a validate, clean, and glean process.

    I do sometimes make the staging table look exactly like the target table in production and setup BULK INSERT or BCP or whatever to report on, log, and save any bad rows for rework or to send back to the provider as feedback.  A lot of times, they don't know they have a problem.  If they do know and won't do anything about it, then I send them one email per bad row and make sure that it has a wide distribution in the company, if you know what I mean. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • david.edwards 76768 wrote:

    Grant Fritchey wrote:

    Strong advice: all tables, all databases, all the time, use appropriate data types. It really matters.

    Obviously I agree with that principle, as a general rule. Would you say I am way off beam when using temp tables for actually doing the tidy up of crap data?

    A few times I have had to do a hurried, short-notice import of data from various text files, my approach is usually to make temp tables using overly wide NVARCHAR() for all columns, then I can use SQL to marshall it into some semblance of order before inserting/updating the properly typed "real" tables.

    I've hit so many stumbling blocks (or lack of knowledge in other tools) trying to sanitise the data first, I default to my comfort blanket of just sanitising it using SQL!

    There are any number of compromises we all make. It's non-stop. The goal is to make as few as possible.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • "DECIMAL(18,10) means 8 digits to the left of the decimal point and 10 to the right.  That seems like a very strange datatype for an "order quantity' column.  It would be really interesting to know what the requirement was for that very odd design for such a column."

    This datatype was inherited, and not sure the reason behind it.  Trying to find a solution to keep the process working, before digging

    into a permanent solution.

    Thanks..

     

     

  • Bruin wrote:

    "DECIMAL(18,10) means 8 digits to the left of the decimal point and 10 to the right.  That seems like a very strange datatype for an "order quantity' column.  It would be really interesting to know what the requirement was for that very odd design for such a column."

    This datatype was inherited, and not sure the reason behind it.  Trying to find a solution to keep the process working, before digging

    into a permanent solution.

    Thanks..

    Got to love inheriting stuff (or having black boxes installed from vendors).

    We use a very well known document management system, the Document Number, used throughout the database, is a FLOAT datatype. I've never understood why when the actual data is an INT (BIGINT I guess when you get a lot of documents).

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

  • Jeff Moden wrote:

    No.  You're not off beam on that at all.  When I do imports, it's ALWAYS to a staging table of sorts for exactly the reason you state.  I'll either use a Temp Table or a semi-permanent working table to do a validate, clean, and glean process.

    I do sometimes make the staging table look exactly like the target table in production and setup BULK INSERT or BCP or whatever to report on, log, and save any bad rows for rework or to send back to the provider as feedback.  A lot of times, they don't know they have a problem.  If they do know and won't do anything about it, then I send them one email per bad row and make sure that it has a wide distribution in the company, if you know what I mean. 😀

    Typically I use tempdb for single table low volume staging. Sometimes I use permanent tables in the same database. Nowadays I tend to create a unique staging database. The advantage of permanent staging tables is that you have pre allocated dedicated space, which you can put on different drives if required, without relying on acquiring or putting pressure on tempdb space.

    As Jeff is fond of saying 'it depends'.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • My guess would be that a document number should be an identifier. Identifiers are always measured on a nominal scale. Therefore, you document number should have been a string of digits. Maybe put in a check digit or some regular expression in a CHECK() to validate it. Integers, floats and other numeric values are used for computations.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    My guess would be that a document number should be an identifier. Identifiers are always measured on a nominal scale. Therefore, you document number should have been a string of digits. Maybe put in a check digit or some regular expression in a CHECK() to validate it. Integers, floats and other numeric values are used for computations.

    Yes, it is an identifier, readily consumed in the front end,  also forms part of the composite PK along with the Version (INT) column.

    To me, FLOAT always seemed a particularly odd choice for something that will only ever be a whole number. Unless in the early days, before we started using it (the product has undergone many changes of ownership, branding and technical changes) the version number was included in the  DocumentNumber column as a decimal part maybe.

     

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

  • jcelko212 32090 wrote:

    My guess would be that a document number should be an identifier. Identifiers are always measured on a nominal scale. Therefore, you document number should have been a string of digits. Maybe put in a check digit or some regular expression in a CHECK() to validate it. Integers, floats and other numeric values are used for computations.

    Lordy, Joe.  You can have "nominal scale" identifiers that are integer based and don't have the multiple penalties that character based identifiers suffer.  Your criteria that identifiers are not used in calculations (which you've not stated here but have before) is incorrect.  While  people may not use them directly, SQL Server can use them better as integer values rather than converting integers to character based stuff.  Don't forget that the Clustered Index key(s) are added to every Non Clustered Index and so the more narrow the key is, the more memory, disk space, etc, etc, will be saved and the more rows can be stored per page which usually means queries will run faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • David Burrows wrote:

    Nowadays I tend to create a unique staging database.

    I do  that same thing a lot, myself.  The cool part about that is is that it can be set to the SIMPLE recovery model (like TempDB is) to take serious advantage of the performance gains of "Minimal Logging" due to much less logging in the Transaction File and also save on backup space and duration.  It also helps with restores... Because there should be nothing in a staging that can't be regenerated, there's no sense in having them in a database that might have to someday be restored.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply