Unicode -VS- Non-Unicode

  • At our shop I'm trying to put together some standards for SQL Server.  One of these standard (5.3 to be exact) specify that "NVARCHAR, NCHAR, and NTEXT" should not be used.  The justification is we are not a International company, and today we don't need to store any characters that can't be represented within a character set that only supports 256 characters.  I'm saying using unicode characters when it is not needed only wastes disk space (cheap now days), and causes extra I/O to read all the data.  Therefore using unicode is ineffectient in our environment.   Now on programmer says we should be using unicode characters for everything for the following reasons:

    Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.

    Now I'm trying to keep an open mind here.  So I'm appealing to the masses.  Should unicode be allowed in an enviroment even if it is not needed?  Or should we restrict its use. 

    Gregory A. Larsen, MVP

  • Unless you're planning to go international, I don't see why.

    However the programmer has a good point. If at some stage, you need to store japanese text, you'll have to do a big program rewrite. Which one costs the most compared to a bigger server??

    I'll be anxiously waiting other opinions on this one.

  • Here's something from one of my favourite SQL books from the chapter on database design...Robert Vieira ("SQL Server 2000 Programming") - (Normalization & other basic design issues):

    "Choose what you need, but ONLY what you need. eg. if you're trying to store months (as the number 1-12) - those can be done in a single byte by using a tinyint. Why then do I regularly come across databases where a field that's only going to store a month is declared as an int (which is 4 bytes) ? Don't use nchar or nvarchar if you're never going to do anything that required Unicode - these datatypes take up 2 bytes for every one as compared to their non-Unicode cousins.

    There is a tendency to think about this as a space issue - "Ah, disk space is cheap these days!" Well, beyond the notion that a name brand SCSI drive still costs more than I care to throw away on laziness, there's also a network bandwidth issue. If you're passing an extra 100 bytes down the wire for every row, and you pass a 100 record result, then that's about 10K worth of extra data you just clogged your network with - if you have 100 users performing 50 transactions per hour - that's over 50MB of wasted network bandwidth per hour."







    **ASCII stupid question, get a stupid ANSI !!!**

  • Thanks for the feedback.  I'm guessing that since no one else responded that everyone agrees with the idea of not to use unicode data types unless there is a real business need.  Thank you for the feedback.

    Gregory A. Larsen, MVP

  • On 32-bit processors, working with 32-bit integers is typically *faster* than working with bytes.

    To the original poster: "today we don't need to store any characters that can't be represented within a character set that only supports 256 characters". Well pointing out that 'tomorrow' is not 'today' is cheap, so I won't

    And can you really store everything in 256 characters - with one codepage? If you store people's names in your data, there are a LOT of 'weird' diacriticals that come up in names, and they aren't necessarily all found on the same codepage.

    Think about this: which will be easier at a later date, adding more storage / bandwidth, or changing data types throughout the whole system?

     

  • I would only Unicode the fields that need it - and there has to be some business case to say they do - the doubling of data space is an issue - not just at the database but also in returning that data to the front end. But of course if you know or reasonably expect a fiedl to need to hold weird names etc Unicode is the way but to be useful it requires a lot more than changing the field type as the fonrt end, searches, reports etc etc all need to expect unicode from the field too for it to be any use - changing a field to Unicode after the event is pretty easy actually changing applications etc is the difficult part

  • I am a big proponent of flexible policies. I would recommend keeping the non-unicode requirement but putting a provision where if the business case warrants it (and is approved by someone who understands the issues) a unicode field can be used. This will satisfy the programmers but forces them to actually think about what they are doing.

    -Nancy

  • It all depends on your business and the type of data you will be collecting. In our business, we have many instances where Unicode is necessary (nvarchar, nchar, and ntext) since we do international names, addresses and comments.

    However, in our design meetings and db model reviews, this discussion is a formal step in our process to question whether or not unicode is really necessary. Some of the things we talk about are the regions/countries where the application will be used, the languages involved in the interface, how the data will be reported and to whom, and similar topics.

    SJ

  • Thanks for all the comments.  If programmers do need to stored unicode, for the reasons you guys mentioned, we don't restrict them.  Although programmers need to have a business reason to use it.  The business reason can't be because maybe some day we might need it.  

    Gregory A. Larsen, MVP

  • Interested in an example?

    In my company we have a internal system (tracking the performance of employees). Everything was in unicode, department codes an so on. The system has an performance problem. Especially ad hoc queries (by management) returning many rows. Programmers dont want to change the unicode fields, because of the hours it wil cost. These problems would have been smaller if they had thought in advance of when to use unicode.

    Bye Gimli


    Kindest Regards,

    Gimli

    If you are a spineless creature, they can't break you

  • One could argue that if they had coded the app correctly in the first place (returning as little rows as possible), unicode or not, the problem wouldn't have been there.

  • In MY opinion unless the unicode fields involved are very large the performance impact should not be that big to the point that the system UNDER PERFORMS the specs. Usually DB Modeling and App desing has much more impact than Unicode. I believe that the use of it is very inline with how much global business does it is involved (or expected probability of being involved) with

    Just my $0.02

     


    * Noel

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply