SQL Hex Character conversion?!

  • Not really sure if thats the correct thread title, but basically, if the following surname is entered via our CRM

    Lukač

    When I select it in SQL I get

    Lukač

    What can I wrap around my surname field to get sql to return 'č' rather than '#269' ?!

    I looked at the undocumented function - sys.fn_varbintohexstr but I couldn't get it to behave!

  • I'm not sure I'm following you. Can you mock up an example that shows us what you are seeing? What datatype is your value being stored in ? What query is pulling the value back with the funny format?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I thought it might be confusing looking at it...

    See the 1.jpg - that is the exact data being entered, and the exact form it's entered in.

    If I basically select contact_firstname, contact_surname in SQL for this contact, I get the results in 2.jpg

    OK, so pictures might seem a strange way to post this - but you can see EXACTLY what I am doing!

  • What is your DB coalation set to? What about the regional settings for language for Windows on your client's machines?

    Just curious, what happens if you run this code:

    create table #tmpTable (Col1 varchar(10))

    insert into #tmpTable

    select 'Lukac'

    select * from #tmpTable

    drop table #tmpTable

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John, I get the following result:

    Lukac

    I.e. missing the accent on the c but not having the & # business!

    The collation of the db is set to:

    SQL_Latin1_General_CP1_CI_AS

    The client machine in this case is just my machine, currently set to English, United Kingdom

  • It looks like a code page problem with your application. I'm not an expert in this so I don't know where to go from here. You may have to broaden your search and look around for others who are developing applications using language settings that support the caron symbol.

    I'm not sure if this helps, but here's at least something similar:

    http://www.webdeveloper.com/forum/showthread.php?t=171045

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks John Ill get our developers to look in to it and see if they can fathom anything out!!

  • peitech,

    from the pictures you posted I suspect that the hash could be generated outside SQL... what type is the form you are using to input data? Is that PHP? Check whether this form couldn't be altered in some way regarding the codepage it uses.

    However, even when you enter the data using a script, it is incorrect - accent is missing. I have no experience with the Latin collation, so I'm not sure whether this is the correct one. Wouldn't Croatian or Slovenian be more appropriate? And you said you have English system on the PC you used for testing... that also could cause some problems.

    Another thing, if you're on SQLS 2005, you'll be having problems with SQL collation sooner or later and should consider changing it to Windows collation (e.g. Latin1_General_CI_AS instead of SQL_Latin1_General_CP1_CI_AS).

    Collations with names beginning with "SQL_" cause table scans instead of index seek whenever there is a condition or ORDER BY on a character column... that means on large tables some queries need like 30 seconds instead of under 1 second to execute, not mentioning the huge load it generates on the server. SQL collations are practically useless on SQL2005.

  • Vladan,

    The information about the SQL collation is very interesting, Ill mention this to someone else and see what they make of it.

    The input data is in asp, soon to be rewritten in aspx I believe (it won't be me who's doing it, or who did it in the first place)

    The majority of entries in this table come up no problems and have no characters outside the 'standard' A-Z a-z ranges, its just for a small handful of records.

    It's looking more like I need to choose a different collation in SQL to get the accents etc to show up properly - I'll have to experiment with different ones at some point.

    How do I specify / convert the collation from its native collation to a new one when I am selecting the data out?

    i.e. if I wanted to get contact_surname in a different collation in the below:

    SELECT

    contact_id,

    contact_surname

    FROM tblcontact

  • I don't think it would make any sense to SELECT something as a certain collation... collation says, how the data is stored in the database with respect to ordering and case sensitivity.

    Find this in SQL Server 2005 Books Online : Collation and Code Page Architecture

    It describes how collation works and lists codepages supported by SQLS 2005. It is possible that you are using some old obscure codepage that is not supported anymore (historically, we had 3 different CP for Czech alphabet.. now we only use 1).

  • Vladan, I have realised that now - its simply the ASP file that 'renders' the '& # 256' type stuff in to characters, they are actually stored in sql in the '& # ... ' format, so changing the collation will not help.

    Longer term I think we need to move to a Windows collation, shorter term I need a different solution! - If only there was some kind of function that could interpret the '& #' business in SQL and replace them with characters I'd be laughing! - However, a test of a couple of functions and a Google search on the matter yielded no progress 🙁

  • Yes, that's what I supposed - that the form distorts characters before they get to SQL... it doesn't have anything to do with collation. You need to change the script so that the data is sent to SQL server in proper format.

    As to the entries that are already in the table with hashes, if it's doable, retype it manually. If there are lots of wrong entries, you could create a translation table (with 2 columns: code, letter), fill it with all possible character groups (&#number; and its conversion to character with accent) and write a replace query. However, be sure to do that on a test database first, since there could be some combinations that will not translate properly... well, maybe not in this case, but anyway.

    The replace query can be based on REPLACE function (I used "number" instead of 269, because 269 is automatically displayed as "c" by forum engine):

    SELECT REPLACE(column_name, '&#number;', 'c')

    FROM yourtable

    ... and I think there already are some posts about this on the forums. Sorry, I have no more time now, hope you'll find a way to solve the problem. Good luck!

  • Thanks for the code, there isn't too many entries which are a problem so I think a manual review of them will sort it out.

    As for preventing it... ill speak to our developer and see what he reckons on it.

  • Thats kind of the direction that I was taking here. This is outside of SQL and has something to do with the application, codepage, etc (which is also outside of my knowledge).

    Thanks for jumping in Vladan!!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • This looks like an HTML or XML data conversion. Are you using either to post your queries to SQL?



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply