General Questions - Globalization and Unicode

James Beaty-208368

SSC Rookie

Points: 49
July 25, 2005 at 6:34 pm

#165432

I'd like to ask some of my colleagues what critical areas should be considered when the question of "Unicode/Globalization" comes up. For those of you who have successfully converted your systems to Unicode, I salute you!

I have a few questions for those of you who have been through this. Any help would, naturally, be appreciated since we DBAs are very busy.

Are there any "gotchas" when we convert our char, varchar, etc. columns to nchar, nvarchar, and so on? What general methods have some of you used to convert these column types? Via EM (Enterprise Manager), changing the column attribute? Or will this come back and "bite me" in some way? (For example, when I change from char to nchar, will SQL Server alter the contents of the field in any way that I should be aware of?) Or should a different method be used?

Ray M SSC-Insane Points: 21093 · Answer 1

July 28, 2005 at 2:32 pm

#578013

There are no gotchas in that respect.

The only gotcha is if you have used the DATALENGTH function on any strings to test their length: it returns an integer number of bytes, so the result doubles once the column is nchar or nvarchar. Otherwise you should be okay; I haven't seen any weirdness.
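
To make the byte-versus-character point concrete, here is a minimal sketch in Python rather than T-SQL. It does not call SQL Server; `len()` stands in for a character count, and encoding to UTF-16LE stands in for the two-bytes-per-character storage of nchar/nvarchar. The function names and the `cp1252` default are assumptions chosen for illustration.

```python
def char_count(text: str) -> int:
    """Number of code points in the string (a character-style length)."""
    return len(text)


def single_byte_size(text: str, codepage: str = "cp1252") -> int:
    """Bytes used when each character is stored in a single-byte code page."""
    return len(text.encode(codepage))


def two_byte_size(text: str) -> int:
    """Bytes used when the string is stored as UTF-16LE (2 bytes per BMP character)."""
    return len(text.encode("utf-16-le"))


name = "Beaty"
print(char_count(name), single_byte_size(name), two_byte_size(name))  # 5 5 10
```

Any existing check that compares a byte count against a character limit is worth reviewing before the column types change.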

James Beaty-208368 SSC Rookie Points: 49 · Answer 2

July 28, 2005 at 4:57 pm

#578038

Thanks! That helps clear up a few questions I had. Thanks for your post! I do appreciate you taking time from your schedule to jot down a couple of notes! It has been a fast week, as I'm sure all will agree.

Perry-300990 Default port Points: 1401 · Answer 3

(Post-scriptum caveat: I'm far from an expert. I'm just reporting some simple things of which I myself was not aware some years ago, in case they are of any help to others.)

Things I'd suggest.

* Be aware of the limitations of UCS-2LE

My guess is this only affects GB18030 Chinese.

* Realize that string length is not the same as byte length.

Look up "combining characters" (sometimes called composing characters) in Unicode if you're not familiar with them.

* Realize that display text width is much more complex than character length (due to non-spacing characters).

* Realize that string equality requires considering how you want to handle combining characters.

* Read up on the subject of UTF-8 encoding and minimal encoding and the security implications.

* Realize that there are a few corner cases that prevent uppercasing and lowercasing from round-tripping cleanly (e.g., Greek final sigma, Turkish dotted and dotless i, German ß).

* Review caret and cursor behavior for right-to-left languages, and for the case of mixing, say, Arabic and French (i.e., mixing RTL and LTR text).
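
A few of the points above can be checked with a short sketch. This is Python, not SQL Server, and only illustrates the Unicode behavior itself; the helper names are made up for the example, and `base_char_count` is only a rough stand-in for display length.

```python
import unicodedata

# "é" written two ways: one precomposed code point, or "e" plus a combining accent.
composed = "\u00e9"
decomposed = "e\u0301"


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def base_char_count(text: str) -> int:
    """Count code points that are not combining marks (a rough display length)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def utf16_code_units(text: str) -> int:
    """Number of 16-bit units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def case_round_trips(text: str) -> bool:
    """True if uppercasing and then lowercasing gives back the original text."""
    return text.upper().lower() == text


print(composed == decomposed)            # raw comparison of code points
print(nfc_equal(composed, decomposed))   # comparison after normalization
print(len(decomposed), base_char_count(decomposed))
print(utf16_code_units("\U00020000"))    # a CJK character outside the 16-bit range
print(case_round_trips("stra\u00dfe"), case_round_trips("\u0131"))
```

The character U+20000 needs two 16-bit units, which is the kind of case a strict UCS-2 view of the data does not account for.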

Gift Peddie SSC Guru Points: 73570 · Answer 4

In .NET, for string equality, implement the IComparer interface in the application layer; unlike the IComparable interface, it lets different types be compared. UCS-2LE uses less space, and UTF-8 is almost ASCII under the covers. Try the thread below, where I explained Chinese-specific Unicode in SQL Server on another forum. Hope this helps.

http://forums.asp.net/1067798/ShowPost.aspx
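
For anyone weighing the space remark above, a minimal sketch (Python, with an illustrative helper name that is not from any library) shows that which encoding is smaller depends on the text: ASCII-range text is smaller in UTF-8, while typical Chinese text is smaller in a 2-byte encoding.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and in UTF-16LE."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("hello"))         # {'utf-8': 5, 'utf-16-le': 10}
print(encoded_sizes("\u4e2d\u6587"))  # {'utf-8': 6, 'utf-16-le': 4}

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
print("hello".encode("utf-8") == "hello".encode("ascii"))  # True
```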

Kind regards,
Gift Peddie

Perry-300990 Default port Points: 1401 · Answer 5

But, on that thread you said "Dictionary order" for Chinese.

Chinese dictionaries use several different orders, in my experience:

* First by radical, and then by stroke count (I think this is the most common order?). But I suppose this must be subdivided into two variants, based on whether traditional or simplified stroke counts are used.

* By stroke count. Again, I guess there are two variants of this.

* Phonetic, by pinyin Latin alphabetization

* Phonetic, by bopomofo order

When you say "dictionary order", do you mean "by radical then by stroke, using simplified stroke counts"?

Gift Peddie SSC Guru Points: 73570 · Answer 6

Perry,

Those codes are straight out of SQL Server 2000 BOL (Books Online). I was working with someone who is Chinese, and they helped resolve his problems. I cannot go into detail with you now because in my current project my RDBMS (relational database management system) is 64-bit Oracle 10g. I am still in the design stage, but it will be deployed in at least 33 countries.

Kind regards,
Gift Peddie

Perry-300990 Default port Points: 1401 · Answer 7

If that is a subtle way of saying that you don't know, that's fine -- I'm only an amateur myself, in either Chinese language or g11n, and I'm just posting because no one else seems to (plus, I had a real question about corruption, and I like to try to answer some other people's questions when I ask one of my own, out of a sense of fairness).

Gift Peddie SSC Guru Points: 73570 · Answer 8

Perry,

No, that is not my way of telling you I don't know. I know that sorting requires applying the equality operator to types, and SQL Server 2000 BOL (Books Online) says the best is binary, but SQL Server must be case-sensitive. I am not an academic; I only know what works. There are six different Chinese sort orders in SQL Server. You are running SQL Server and I am not, so you can run some tests. Richard, the person I was helping, is of Chinese descent, and he said it is based on pronunciation, so I would assume dictionary order is based on the 2000-plus Chinese alphabet. The point is that he said thanks and did not come back, which means his problem was solved.

Kind regards,
Gift Peddie

Perry-300990 Default port Points: 1401 · Answer 9

Sorry, I get it now -- you were just copying and pasting text from Books Online, and don't necessarily understand what you pasted -- so I needn't be asking you to explain what you posted.

I don't see that text "Dictionary Order" you quoted in my Books Online, but I do see some collations labelled stroke order, so it might be a poor way (by Microsoft, not you) to say that somewhere. Stroke order is not phonetic, but you've already explained that you don't use SQL Server, so it won't matter to you, so this is a nice fun pointless explanation, isn't it?

In case anyone ever does read this -- although I hope not -- I'll summarize:

My Books Online says:

202 Chinese_Taiwan_Stroke_CS_AS

so apparently 202 is collated on stroke order.

Gift Peddie SSC Guru Points: 73570 · Answer 10

I am both MCSE and MCDBA certified, and I am a SQL Server expert, so your comment about me not using SQL Server is not relevant. Richard is of Chinese descent, and I will take what he tells me about Chinese in SQL Server over what you say. For people who will read just this page: the Windows code page is different, and I have posted the SQL Server info in the link I provided. Here is that info.

196 Chinese_Taiwan_Stroke_BIN
197 Chinese_Taiwan_Stroke_CI_AS
198 Chinese_PRC_BIN
199 Chinese_PRC_CI_AS
200 Japanese_CS_AS
201 Korean_Wansung_CS_AS
202 Chinese_Taiwan_Stroke_CS_AS
203 Chinese_PRC_CS_AS

196 Binary order, for use with the 950 (Traditional Chinese) character set.
197 Dictionary order, case-insensitive, for use with the 950 (Traditional Chinese) character set.
198 Binary order, for use with the 936 (Simplified Chinese) character set.
199 Dictionary order, case-insensitive, for use with the 936 (Simplified Chinese) character set.
200 Dictionary order, case-sensitive, for use with the 932 (Japanese) character set.
201 Dictionary order, case-sensitive, for use with the 949 (Korean) character set.
202 Dictionary order, case-sensitive, for use with the 950 (Traditional Chinese) character set.
203 Dictionary order, case-sensitive, for use with the 936 (Simplified Chinese) character set.
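
As a small illustration of why the character set number matters for non-Unicode columns, the sketch below encodes one Chinese character in code pages 950 and 936 using Python's built-in codecs. It says nothing about how SQL Server sorts; the helper names are assumptions for the example only.

```python
# Code pages taken from the descriptions above; labels are for display only.
CODE_PAGES = {
    950: "Traditional Chinese",
    936: "Simplified Chinese",
}


def encode_in_code_page(text: str, code_page: int) -> bytes:
    """Encode text using Python's codec for the given Windows code page."""
    return text.encode(f"cp{code_page}")


def decode_from_code_page(data: bytes, code_page: int) -> str:
    """Decode bytes that were stored using the given Windows code page."""
    return data.decode(f"cp{code_page}")


sample = "\u4e2d"  # the character "middle", present in both code pages
for code_page, label in CODE_PAGES.items():
    raw = encode_in_code_page(sample, code_page)
    print(code_page, label, raw.hex())
```

The same character comes out as different bytes in each code page, so bytes written under one code page and read under the other will not come back as the original text. Unicode column types avoid that dependency.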

Kind regards,
Gift Peddie