Is Data Integrity Really Important or Not?

  • Why is data integrity in a database system so important?

    Do you really use CHECK constraints and other mechanisms, like triggers, to enforce data integrity in your projects?

  • _ms65g_ (7/13/2010)


    Why is data integrity in a database system so important?

    Because users tend to get very upset when applications return garbage information.

    Do you really use CHECK constraints and other mechanisms, like triggers, to enforce data integrity in your projects?

    Yes, and foreign key constraints.
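
    For illustration, here is a minimal sketch of what CHECK and foreign key constraints look like in T-SQL. All table and column names here are hypothetical, not from any poster's actual schema:

    ```sql
    -- Hypothetical schema: a CHECK constraint and a foreign key
    -- declared at table-creation time.
    CREATE TABLE dbo.Customers
    (
        CustomerID int          NOT NULL PRIMARY KEY,
        Name       varchar(100) NOT NULL
    );

    CREATE TABLE dbo.Orders
    (
        OrderID    int NOT NULL PRIMARY KEY,
        CustomerID int NOT NULL,
        Quantity   int NOT NULL,

        -- CHECK constraint: the engine rejects impossible values
        CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0),

        -- Foreign key: every order must reference an existing customer
        CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
            REFERENCES dbo.Customers (CustomerID)
    );
    ```

    An INSERT with Quantity = 0 or an unknown CustomerID now fails no matter which application or ad hoc tool issued it.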

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Could you recommend an article about the value of data integrity and how to enforce and implement it?

    And must we implement it using DDL, or can we implement it in the front-end application?

  • _ms65g_ (7/13/2010)


    Why is data integrity in a database system so important?

    As a DBA you are responsible for database integrity, recoverability, and performance; data integrity is part of your job description 🙂

    To get better acquainted with the matter, I would suggest researching referential integrity as well as data normalization; Third Normal Form, a.k.a. 3NF, would do.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • _ms65g_ (7/13/2010)


    Could you recommend an article about the value of data integrity and how to enforce and implement it?

    And must we implement it using DDL, or can we implement it in the front-end application?

    As per Codd's Rule #10, it has to be on the back end. Having said that, I recognize that some really high-end systems require creative approaches.

    _____________________________________
    Pablo (Paul) Berzukov

  • You can implement it at the front end or the back end. However, if you do it at the front end, then every time you make a call that touches data, it needs to check for RI.

    If you can enter a customer name from three or four places in the front end, every one of them needs to check that the customer name maintains RI with the other tables it might link to.

    If someone builds a web service, or another application, or connects from Access/Excel, you have to ensure they do not break RI.

    Typically this is why it is implemented on the back end, often with constraints, less often with triggers.

    A few links:

    http://qa.sqlservercentral.com/articles/67327/

    http://qa.sqlservercentral.com/articles/1670/

  • Steve Jones - Editor (7/13/2010)


    Typically this is why it is implemented on the back end, often with constraints, less often with triggers.

    Agreed.

    PKs/FKs when enforcing RI in the -local- database.

    Triggers when in need of enforcing RI against another database.
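
    As a rough sketch of that pattern: SQL Server foreign keys cannot reference a table in another database, so a trigger is the usual substitute. The database, table, and column names below are made up for illustration:

    ```sql
    -- Hypothetical: enforce that OrderLines.ProductID exists in a table
    -- living in a different database (OtherDB), where an FK cannot reach.
    CREATE TRIGGER dbo.trg_OrderLines_CheckProduct
    ON dbo.OrderLines
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        IF EXISTS (SELECT 1
                   FROM inserted AS i
                   WHERE NOT EXISTS (SELECT 1
                                     FROM OtherDB.dbo.Products AS p
                                     WHERE p.ProductID = i.ProductID))
        BEGIN
            RAISERROR('ProductID not found in OtherDB.dbo.Products.', 16, 1);
            ROLLBACK TRANSACTION;
        END
    END;
    ```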

    _____________________________________
    Pablo (Paul) Berzukov

  • Because users won't follow the rules unless you force them to. They may even mean well. You may train all your users really, really well, but eventually they'll bring in new employees, and those employees will be trained by other employees, and then they might even train other new employees, until eventually your original plan is all but forgotten.

    Because users aren't perfect. They forget things, they rush things, they overlook things, and they produce typos.

    One of the worst culprits is almost always NULL fields. If users don't think a field is relevant, they'll just leave it blank. But then what does NULL mean? Can you trust its value, or is it being used subjectively? Maybe they forgot to put something in. Forcing them to enter a value means you don't have to ask, "Does NULL mean 0, or did someone forget to put something in there?" 0 always means 0. NULL means a lot of things.
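
    One way to remove that ambiguity is to forbid NULL and make the "no value" case explicit. A hypothetical example column:

    ```sql
    -- Hypothetical: disallow NULL and make "no discount" an explicit 0,
    -- so NULL can never be mistaken for zero or for "forgot to fill it in".
    ALTER TABLE dbo.Invoices
        ADD DiscountPercent decimal(5, 2) NOT NULL
            CONSTRAINT DF_Invoices_DiscountPercent DEFAULT (0)
            CONSTRAINT CK_Invoices_DiscountPercent
                CHECK (DiscountPercent BETWEEN 0 AND 100);
    ```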

    We have two ERPs at the moment, and one of them does not enforce any sort of referential integrity on ship states. Not only does that mean we could potentially get a shipment sent back to us, it makes it really hard to prepare a sales report by state. A great deal of time was spent mapping invalid state values to correct ones, and nothing is there to stop it from happening again. If you just give users a free-form field, they might put in "FL", "Fla", "Florida", "Flordia", or even NULL. In our case, we ended up with a lot of zip codes in the state field. It's a mess.
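
    The ship-state mess described here is exactly what a lookup table plus a foreign key prevents. Names below are hypothetical:

    ```sql
    -- Hypothetical lookup table: the only legal spellings live here.
    CREATE TABLE dbo.States
    (
        StateCode char(2) NOT NULL PRIMARY KEY  -- 'FL', never 'Fla' or 'Flordia'
    );

    ALTER TABLE dbo.Shipments
        ADD CONSTRAINT FK_Shipments_States
            FOREIGN KEY (ShipState) REFERENCES dbo.States (StateCode);

    -- Inserting 'Flordia' or a zip code into ShipState now fails outright
    -- instead of silently polluting the sales-by-state report.
    ```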

    Foreign key constraints can be mimicked at the application level with drop-down lists, but not all data comes from the application. What if your SQL developer starts pushing data into your database from some other system? Hopefully he's doing it against the test database first, but his job becomes a lot easier if the system doesn't allow him to insert invalid data. The customer ID should be a value in the customer table, right? Why not enforce it?

    More than anything, referential integrity means peace of mind for the DBA. It means the data you pull from the system is consistent and can be reasonably trusted. It also makes report building much easier when you don't have to worry about what NULL means, or why you have a state called "11001".

  • Excellent points, jvanderberg. Often RI is there to prevent accidental mistakes, not malicious ones.

  • I like to have very tight control of my data, with many constraints, triggers, etc.

    But, having said that, I have built systems without any of the usual constraints

    because:

    - it simplified things

    - it increased performance levels

    - the probability of a constraint failing was microscopic

    and I never had reason to regret it.

    If your design and permissions are tight, there is little point in expending precious CPU and I/O performing millions of:

    if (true) then do nothing

    unless you like spending money on big servers.

  • I'm afraid I have to disagree wholeheartedly. It doesn't take that many cycles to check a constraint, and those cycles don't really cost that much in the grand scheme of things. Inconsistent, inaccurate data costs very real money, and sometimes a lot of it. It also costs a lot less to enforce data integrity early than to try to add it after the fact. An application that doesn't check constraints simply doesn't scale well, and almost everything I build ends up scaling further than I ever imagined. A system meant for 5 advanced users could easily end up being used by 50 low-level users within just a few years. I wouldn't design a business application that wasn't meant to scale, because that's what businesses ultimately want to do: scale.

  • Ensuring the right results from your data is more important to most business users than either simplicity or performance, so if you sacrifice accuracy in the name of simplicity or performance, you are doing those business stakeholders a disservice. There is an "if" here, of course: it may be that you have found a more efficient way to implement integrity rules (putting them into applications, for example), but as a general rule, implementing those rules in the database is the most cost-effective and most performant way of ensuring data integrity.

    I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic". Presumably that's because you can test your application(s) very well and manage the deployments of those applications in a highly controlled way? If you had dozens or hundreds of applications with many of them changing frequently then ensuring each one implements exactly the same rules at the same time could become an enormous and perhaps impossible task.

  • I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic"

    plenty of validation at the application level

    plenty more at the stored procedure level

    inserts are highly unlikely to be invalid

    and updates/deletes are only possible through tightly defined application interfaces and stored procedures

    plenty of healthy paranoia and defensive coding

    no point in:

    set [field] = 'STATUS'

    also performing:

    check ([field] in ('STATUS', 'BLAH1', 'BLAH2', 'BLAH3'))

    -- or --

    insert into

    ([username]) values ('bob')

    also checking:

    references dbo.Users([username])

    when 'bob' was selected from dbo.Users in the same transaction

    and users are only archived inside a tablockx

    there is 0% probability of 'bob' not existing in the Users table ...

    (unless server has run out of virtual memory then anything can happen)

  • doobya (7/24/2010)


    I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic"

    plenty of validation at the application level

    plenty more at the stored procedure level

    Why would checking in the app or a proc be more efficient than checking in a constraint? In certain cases with CHECK using a small number of possible values it might be I suppose. It seems unlikely you could check UNIQUE or FOREIGN KEY constraints more efficiently in procedural code than with a declarative constraint though because you'll need extra queries or an extra round trip to do it. Even if you found a way to do it more efficiently, removing those constraints would probably be a bad idea because those same constraints can be used to produce more efficient execution plans for queries.

    I would agree that in some special cases with CHECK constraints, where the quantity of data access code is quite small and tightly controlled, you might want to save a few processor cycles by doing some checks in the app or in a procedure. I'm not sure how that qualifies as a "simpler" method, however. Declarative CHECK constraints are about as simple as you can get and generally involve less code than any procedural equivalent.
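
    To make the comparison concrete, here is a hypothetical status column done both ways. The declarative version is one statement enforced on every write path, while the procedural version has to be repeated in every code path that writes the table:

    ```sql
    -- Declarative: one statement, enforced on every write path.
    ALTER TABLE dbo.Orders
        ADD CONSTRAINT CK_Orders_Status
            CHECK (Status IN ('OPEN', 'SHIPPED', 'CANCELLED'));

    -- Procedural equivalent: must appear in every procedure that
    -- writes to dbo.Orders, and is silently bypassed by any path
    -- that forgets it.
    IF @Status NOT IN ('OPEN', 'SHIPPED', 'CANCELLED')
    BEGIN
        RAISERROR('Invalid status value.', 16, 1);
        RETURN;
    END;
    ```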

  • Because the proc already had to select the user account to check the balance, the foreign key constraint was unnecessary.

    A single check constraint is quick, but that isn't a common scenario.

    Checking many CHECK and foreign key constraints which will never return false is a waste of resources. If you are running servers at 80%, that extra load could push you over into scaling up, and in that case you are looking at five-figure sums coming out of my pocket!

    set @var = 1

    if @var <> 1 raiserror('no point in this line existing however cheap it is', 16, 1)
