Multiple languages in same column - how to sort, select

  • Hi all,

    We have a database where the nvarchar columns currently holding English only data. It keeps the training information. Currently the default collation is Latin Case insensitive accent insensitive.

    Now we are planning to allow multiple language support. When we go for it, we will upload the data from different languages. Based on user preferences, he/she should be able to query the data (One language at a time)

    I am worried on how the existing queries will work if I load all language data in the same database.

    For example, some characters in English are used in Norwegian too. But they have different sort order. ALso LIKE conditions too may fail.

    So far I thought of few solutions: Add the Collation information along with select , order by clauses. It means we need to add more procedures (one set per collation) The other option is to create new database for each language. Each will have its own collation. The dowside is we need more databases which may lead to more servers and more maintenance work.

    Please give your ideas

    Thanks in advance

    Cheers,
    Prithiviraj Kulasingham

    Plan to Test your Plan!

  • or add another column which will have code for language like 1 for english...

  • But then, how will you add sorting to the select query?

    Cheers,
    Prithiviraj Kulasingham

    Plan to Test your Plan!

  • when you load the data or when application loads the data you would know what language correct.. Use that for the new col and when querying they can use that col as well...

  • One option is to create a temp table with the proper collation and use that to sort. Another option is to cast the column as the appropriate collation. Do a search for collate on the following page:

    http://msdn.microsoft.com/en-us/library/ms187928.aspx?ppud=4

  • Hi all,

    Thank you for your response.

    I can identify the data with the collation. But my coding is going to be complex.

    PLease check the code

    Use tempdb;

    Create Table CollateTest

    (

    LineiD int identity not null primary key clustered,

    CollateText nvarchar(10)

    )

    insert into CollateTest Values (N'aæbc')

    insert into CollateTest Values (N'æabc')

    insert into CollateTest Values (N'baæc')

    insert into CollateTest Values (N'bæac')

    select * from CollateTest Order By CollateText COLLATE FRENCH_CI_AI

    select * from CollateTest Order By CollateText COLLATE Danish_Norwegian_CI_AI

    The results will be different. My problem is we access all data through stored procedures, we are trying to support many languages, it is impossible to write different procedures - one set per collation. collation cannot be set through a parameter.

    I understand based on accent or case the sort order should change. Is there any way to save the collation with the row? so that it can "automatically" sort in the proper order?

    Just another question: Is french æ and norwegian æ share the same unicode value?

    Cheers,
    Prithiviraj Kulasingham

    Plan to Test your Plan!

  • Did you consider using one column per language with the proper collation at the column level?

    Or a concept similar to sys.messages, where a language_id is stored to identify the related language? You could then use dynamic SQL to add the collation required.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Coloumn per language is too costly. Each time a language is adopted, we need to change the schema of all tables

    I am thinking on moving the sorting to application

    Other option is to create new database and set the collation for each language we adopt.

    This may be another thing Denali should consider: Passing the collation as a parameter (not only as literal)

    Cheers,
    Prithiviraj Kulasingham

    Plan to Test your Plan!

  • Dynamic query is a banned concept in my company.

    1. Monitoring performance is hard

    2. Security concerns

    3. Debugging is difficult.

    Cheers,
    Prithiviraj Kulasingham

    Plan to Test your Plan!

  • Preethiviraj Kulasingham (8/3/2011)


    Other option is to create new database and set the collation for each language we adopt.

    In general I'm not a fan of this idea but depending on how the application works it may work.

    Preethiviraj Kulasingham (7/30/2011)


    The results will be different. My problem is we access all data through stored procedures, we are trying to support many languages, it is impossible to write different procedures - one set per collation. collation cannot be set through a parameter.

    You can set up a series of if statements inside the stored procedure and take the language in as a parameter. Create a temp table inside the if statement based on the passed in language and then pull from the tempt able for the results.

  • There is a simple solution to do it correctly 🙂

    A collation encodes the rules governing the proper use of characters according to specific language. Using one specific collate in order to sort multiple languages data will leads to bad order for all other languages!

    In the following link you can see an article which showed how we can deal sorting multiple languages data using multiple collations (cultures) in the same query. In the article I used three simple solutions based on "SELF UNION" or "SELF JOIN" operations ('Self' mean that we JOIN or UNION a table to itself), or direct approach using ROW_NUMBER function.

    T-SQL: Sort data by multiple languages

    http://social.technet.microsoft.com/wiki/contents/articles/31194.t-sql-sort-data-by-multiple-languages.aspx

    I hope this is useful 🙂

    Senior consultant and architect, data platform and application development, Microsoft MVP.

  • Preethiviraj Kulasingham (8/3/2011)


    Dynamic query is a banned concept in my company.

    1. Monitoring performance is hard

    2. Security concerns

    3. Debugging is difficult.

    This sounds like a knee jerk reaction by management types who have no real idea regarding SQL Server. Banning a tool that has obvious value simply hamstrings those who are developing and supporting an application.

    1. Yes, monitoring can be difficult but not impossible.

    2. Yes, security is a concern but if done correctly it can be mitigated.

    3. Yes, it can be more difficult, but once working properly it doesn't need a lot more work.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply