Using BCP across different collations

  • Hi,

    I have to use BCP for copying data from a source database having collation SQL_Latin_General_CP1_CI_AS to target database having collation SQL_Latin_General_CP1_CI_AS.

    I am using BCP out with -n option & "-C raw" option

    BCP -in is done using using -n option & -C <code page> option.

    But after reading some articles on the net, I got confused whether I should really use the -n option or the -N option, as the BCP is between 2 databases of different collations.

    I was under the impression the "-C raw" & -C 950 (the code page for taiwanese collation) would do the trick.

    Is collation change the same as DBCS change ? I didnt find any difference in the database size before & after the collation change. If its changing single char to 2 bytes per char, it should increase in size, right ? Moreover, the fields in the table are using the data type varchar & not nvarchar.

    Can anyone please help me clear my doubts.

    Thanks a lot.

  • 1) The code page option allows you to specify the format of the output character type when you export from SQL Server, and input format when import to SQL. One example when you have to do this is if you're working with different platforms that use different native code pages, and Unicode is not supported either.

    Otherwise, stay simple and avoid the headaches, using option -w or -N for Unicode.

    2) Database size includes reserved free space for data growth. Unless the new added data exceeds the reserved space, which prompts SQL Server to extend the database file, you'd not see any size changes.

  • Hi,

    The data types used in the tables is varchar. So I understand that its not unicode supported. So I was thinking that I should be using -n option. Do you think I should be using the -N option instead ?

    Does the way in which the data is stored changed when the collation is changed or does it affect only the way of displaying it ?

    About the size difference... I did the collation change after shrinking the database to the maximum so that I could see any size difference. But The size didnt change which made me doubt about the way in which data is stored inside...

    Any idea why it is so ???

     

    Thanks in Advance.

  • I have no experience in dealing with taiwanese (or similar) collations, but when I need to move data from one server to another (eg ANSI to OEM), I just bcp out using the -c option, and then in again with the appropriate -C option (note difference in case).

    The -C<option> to use depends on how the source file looks (in ASCII) and to what collation that should be converted to.

    It is also possible to specify -C<option> when bcp:ing out if that will help getting it right in the end (when inserted into the destination server)

    Collation conversions can be tricky business, and also depending on source and destination collations, there may be a possibility that a 'pure' conversion cannot be done without loss of accuracy. In that case it may be better to do an intermediate step over unicode.

    /Kenneth

     

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply