REPLACE using CHAR vs NCHAR

  • I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:

    SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),

    [ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),

    [UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),

    [ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),

    [UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),

    [ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')

    And receive the following results:

    UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed

    ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc

    Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.

    [font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

  • Matthew Lehn (10/3/2011)


    I have an interesting issue that I am hoping someone could explain or otherwise help me diagnose. Suppose I have the following:

    SELECT [UnicodeLower] = REPLACE(N'aerostar Inc',NCHAR(230),N'ae(lower)'),

    [ASCIILower] = REPLACE('aerostar Inc',CHAR(230),'ae(lower)'),

    [UnicodeUpper] = REPLACE(N'AEROSTAR INC',NCHAR(230),N'AE(upper)'),

    [ASCIIUpper] = REPLACE('AEROSTAR INC',CHAR(230),'AE(upper)'),

    [UnicodeMixed] = REPLACE(N'Aerostar Inc',NCHAR(230),N'Ae(mixed)'),

    [ASCIIMixed] = REPLACE('Aerostar Inc',CHAR(230),'Ae(mixed)')

    And receive the following results:

    UnicodeLower,ASCIILower,UnicodeUpper,ASCIIUpper,UnicodeMixed,ASCIIMixed

    ae(lower)rostar Inc,aerostar Inc,AE(upper)ROSTAR INC,AEROSTAR INC,Ae(mixed)rostar Inc,Aerostar Inc

    Why am I getting different results, or I should say, why is the Unicode version replacing the lower case "ae" when that is NOT what nchar(230) is? The ASCII version works as expected. I should add that whether it is NCHAR/CHAR(230) [lowercase æ] or NCHAR/CHAR(198) [upper case Æ], I get similar results, even if I add a case-sensitive collation in the REPLACE function.

    I could be off here, but I am pretty sure that the database collation has to be turned on to case sensitive for you to see it in the replace, otherwise sql server is case insensitive.

  • You can actually put a collation into the replace function, i.e.

    SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')

    which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.

    The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.

    [font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

  • Matthew Lehn (10/3/2011)


    You can actually put a collation into the replace function, i.e.

    SELECT test = REPLACE('Lock' COLLATE SQL_Latin1_General_CP1_CS_AS,'l','kn')

    which in this example will return "Lock" since I set it to case-sensitive. This actually overwrites any collation set in the database itself.

    The problem is not with collation per se, but rather between the differences in running the REPLACE function on a Unicode (N[VAR]CHAR) compared to an ASCII/ANSI ([VAR]CHAR) string.

    It appears to me to be perhaps a bug in sql server. I tried this test:

    declare @varchar varchar(100)

    set @varchar = 'aerostar Inc '+char(230)

    select Charindex(char(230),@varchar)

    charindex returns postion 14 as expected

    declare @nvarchar nvarchar(100)

    set @nvarchar = 'aerostar Inc '+nchar(230)

    select Charindex(char(230),@nvarchar)

    charindex returns position 1, which is not expected.

    Seems like a bug to me.

  • Sure does seem like it's a bug. "æ" is NOT the same as "ae" nor is "Æ" the same as "AE". Why SQL Server thinks so when using Unicode is a mystery.

    [font="Tahoma"]Zycon - The Search Engine for Manufacturers[/url]. Helping Engineers and Technical Buyers Locate Aluminum Extrusions, Metal Stampers, Plastic Fabricators, Valves, and More.[/font]

  • I cannot tell you what is going on here but I ran into the same problem and was able to fix it using this collation: Danish_Norwegian_CS_AS. As soon as I adjusted the column to that collation and the column was set to nvarchar(50), SQL Server was differentiating between ascii 230 and "ae". Hope it helps.

  • Something else you need to be aware of that some of us found out while testing code in one of my articles... certain collations will make your server run horribly slow compared to other collations. IIRC, there's a CONNECT item on the problem. I've not actually looked for that CONNECT item because I use the current default collation that SQL Server ships with and that collation doesn't have the problem. To the best of my knowledge, there isn't a publicly available list of collations that cause the performance problem.

    Bottom line is, setup at least one machine with the default collation in place so you can test any machine that you change the system-wide collation on against.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply