function/code ??

  • How do I create a function/code within SQL to convert non-English Unicode (Chinese/Korean...) characters to double-byte ANSI characters?

  • Just use the COLLATE clause.

    http://msdn.microsoft.com/en-us/library/ms184391(v=sql.105).aspx
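
    For example, a sketch along these lines (assuming a Chinese collation's code page is the target; the literal is arbitrary, not from the OP's data):

    -- Sketch: converting Unicode to VARCHAR uses the code page of the source
    -- expression's collation, so forcing a Chinese collation yields the
    -- double-byte (code page 936) encoding. Unmappable characters become '?'.
    SELECT CAST(N'中文' COLLATE Chinese_PRC_CI_AS AS VARCHAR(10)) AS dbcs_value;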

    [font="Verdana"]Markus Bohse[/font]

  • What is the data encoded in to start with? If they are truly multi-byte characters, i.e. Unicode on something other than plane 0, then you cannot store those in SQL Server unless you are OK with giving away the usefulness of things like proper results from the LEN() function. SQL 2012 offers better support with supplementary character (SC) collations.
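
    A quick sketch of the problem (the surrogate pair below is just an illustration):

    -- A character outside plane 0 occupies two UCS-2 code units, so under a
    -- non-SC collation LEN() counts it as 2 characters instead of 1.
    -- 55357/56832 are the surrogate halves 0xD83D/0xDE00 of U+1F600.
    SELECT LEN(NCHAR(55357) + NCHAR(56832))        AS len_says,    -- 2
           DATALENGTH(NCHAR(55357) + NCHAR(56832)) AS bytes_used;  -- 4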

    What are you trying to do? Is your data already in SQL Server in an NVARCHAR or NCHAR column?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    ....

    What are you trying to do? Is your data already in SQL Server in an NVARCHAR or NCHAR column?

    I have a column which has non-English letters (NVARCHAR). My understanding is that all of these non-English Unicode letters have an ANSI counterpart that can be represented in double bytes. I just don't know how to achieve that.

  • May I ask why you need to do this?

  • MisLead (4/24/2012)


    ....

    I have a column which has non-English letters (NVARCHAR). My understanding is that all of these non-English Unicode letters have an ANSI counterpart that can be represented in double bytes. I just don't know how to achieve that.

    Every VARCHAR character consists of 1 byte with a value between 0 and 255 (i.e. 256 possible values).

    Every NVARCHAR character consists of 2 bytes with a value between 0 and 65535 (i.e. 65536 = 256*256 possible values).
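
    A quick way to see this (a sketch; the literals are arbitrary):

    -- DATALENGTH() reports the bytes actually stored for each type
    SELECT DATALENGTH('A')  AS varchar_bytes,   -- 1
           DATALENGTH(N'A') AS nvarchar_bytes;  -- 2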

    How do you want the output? Two separate bytes?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    ....

    How do you want the output? Two separate bytes?

    Yes, a combination of two single-byte VARCHAR characters.

    Please see the attachment showing how the Japanese characters are converted into double bytes (ANSI within 255) in columns 7 and 8. That is my expected output.

  • MisLead (4/24/2012)


    ....

    Yes, a combination of two single-byte VARCHAR characters.

    Please see the attachment showing how the Japanese characters are converted into double bytes (ANSI within 255) in columns 7 and 8. That is my expected output.

    OK...I think I understand now. Let me know if this is a correct example:

    1. Say I have a table with an NCHAR(1) column and that I store a row with a two-byte Unicode character.

    2. When I deliver the data to the caller, it needs to be delivered as a two-byte VARCHAR, like the last column in this:

    CREATE TABLE dbo.fake_table
    (
        fake_character NCHAR(1)
    );

    -- NCHAR(10003) is U+2713; SQL Server stores it as the byte pair 0x13 0x27
    INSERT INTO dbo.fake_table
        (fake_character)
    VALUES (NCHAR(10003));

    SELECT fake_character,
        CAST(fake_character AS BINARY(2)) AS nchar_as_binary,
        CAST(0x13 AS CHAR(1)) AS nchar_byte_1_as_ascii,
        CAST(0x27 AS CHAR(1)) AS nchar_byte_2_as_ascii,
        CAST(0x13 AS CHAR(1)) + CAST(0x27 AS CHAR(1)) AS nchar_byte_to_legacy_caller
    FROM dbo.fake_table;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    ....

    Thank you, but this code doesn't really help. It gives the same output for different character sets. I want to be able to input Chinese/Japanese letters and get the equivalent ANSI double byte as shown in the attachment.

  • MisLead (4/24/2012)


    ....

    Thank you, but this code doesn't really help. It gives the same output for different character sets. I want to be able to input Chinese/Japanese letters and get the equivalent ANSI double byte as shown in the attachment.

    That's because it's an example and the hex values are hardcoded into the SELECT column list...still trying to nail you down on what your requirements are...
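
    If what you need is the real bytes rather than hardcoded hex, a sketch like this (reusing dbo.fake_table from the earlier example) would pull them from the stored value:

    SELECT fake_character,
        -- first and second byte of the little-endian UCS-2 value
        CAST(SUBSTRING(CAST(fake_character AS VARBINARY(2)), 1, 1) AS CHAR(1)) AS byte_1,
        CAST(SUBSTRING(CAST(fake_character AS VARBINARY(2)), 2, 1) AS CHAR(1)) AS byte_2
    FROM dbo.fake_table;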

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    ....

    That's because it's an example and the hex values are hardcoded into the SELECT column list...still trying to nail you down on what your requirements are...

    Oh, OK. I don't think that serves the purpose then. Let me try to explain the requirement better, because I think you know this area well.

    Input

    ???(JAPANESE)--------------------SQL FUNCTION/CODE---------------- Ÿ§œ (double byte ANSI)

    ????(THAI)---------------------------SQL FUNCTION/CODE------------------ À¿Ð ( double byte ANSI)

    and so on. Of course, this is just an example and not necessarily an exact conversion, but it should paint a clear picture of what my requirement is. The input could be Japanese/Thai/Chinese...and so on.

  • MisLead (4/24/2012)


    ....

    Oh, OK. I don't think that serves the purpose then. Let me try to explain the requirement better, because I think you know this area well.

    Input

    ???(JAPANESE)--------------------SQL FUNCTION/CODE---------------- Ÿ§œ (double byte ANSI)

    ????(THAI)---------------------------SQL FUNCTION/CODE------------------ À¿Ð ( double byte ANSI)

    and so on. Of course, this is just an example and not necessarily an exact conversion, but it should paint a clear picture of what my requirement is. The input could be Japanese/Thai/Chinese...and so on.

    I wish I could say that cleared things up...but it did not. You continue to use the phrase "double byte ANSI" but are not explaining what that means. I have Googled it, but it does not seem to be a documented technical concept. You are showing the same number of "double byte ANSI" characters as multi-byte Japanese characters but as we know ASCII must fit into 1 byte, while your Japanese character will require 2 for proper representation. I keep coming back to how could ? possibly equal Ÿ when the first comes from a set of ~65K characters and the second from a set of 256? I'll step aside for now...this one must be beyond me. Sorry.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Please refer to this. It's exactly what I mean, if you still have some time:

    http://msdn.microsoft.com/en-us/library/cc194789.aspx

  • MisLead (4/24/2012)


    Please refer to this. It's exactly what I mean, if you still have some time:

    http://msdn.microsoft.com/en-us/library/cc194789.aspx

    Yes, Windows uses a double-byte character set internally, with encoding UCS-2...which is the same encoding SQL Server uses underneath its NVARCHAR and NCHAR data types. There is a synergy there when it comes to performance and ease of use...which is by no means an accident.

    You still have not explained how ? could possibly equal Ÿ in any type of conversion scenario, i.e. in your mind how could one logically get back and forth from one to the other when one character must be represented using 2 bytes and the other only 1 byte? That is where you are losing me.

    I could see if you wanted to tear apart ? into two separate bytes so some legacy program could process it without losing its mind...but I need to know what that legacy program needs to see, and the mapping example you provided is not making sense to me.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    ....

    Yes, Windows uses a double-byte character set internally, with encoding UCS-2...which is the same encoding SQL Server uses underneath its NVARCHAR and NCHAR data types. There is a synergy there when it comes to performance and ease of use...which is by no means an accident.

    You still have not explained how ? could possibly equal Ÿ in any type of conversion scenario, i.e. in your mind how could one logically get back and forth from one to the other when one character must be represented using 2 bytes and the other only 1 byte? That is where you are losing me.

    I could see if you wanted to tear apart ? into two separate bytes so some legacy program could process it without losing its mind...but I need to know what that legacy program needs to see, and the mapping example you provided is not making sense to me.

    My apologies. As mentioned in the post, it was just an example and not an exact conversion. To my knowledge, "Ÿ" is part of the extended ASCII character set of 8 bits (1 byte). Now "?" is Unicode and cannot be contained within 1 byte; it requires two bytes, hence it would need one more extended ASCII character, let's say "Œ".

    So now "?" equals "ŸŒ", which is two bytes. Like I said, this is not necessarily an exact conversion, but each of these characters would map to two bytes, hence double bytes. I just need a conversion function. Please.
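
    If "double byte" really just means the two raw UCS-2 bytes of each character, a minimal sketch of such a function might look like this (the function name and parameter are hypothetical, not from this thread):

    -- Sketch: split each UCS-2 character of an NVARCHAR input into its two
    -- raw bytes and emit them as single-byte CHAR values.
    CREATE FUNCTION dbo.fn_unicode_to_double_byte (@input NVARCHAR(4000))
    RETURNS VARCHAR(8000)
    AS
    BEGIN
        DECLARE @output VARCHAR(8000) = '';
        DECLARE @i INT = 1;

        WHILE @i <= LEN(@input)
        BEGIN
            -- the two bytes of the character, in little-endian storage order
            DECLARE @two_bytes BINARY(2) = CAST(SUBSTRING(@input, @i, 1) AS BINARY(2));

            SET @output = @output
                + CAST(SUBSTRING(@two_bytes, 1, 1) AS CHAR(1))
                + CAST(SUBSTRING(@two_bytes, 2, 1) AS CHAR(1));

            SET @i = @i + 1;
        END;

        RETURN @output;
    END;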
