Should a Text column be converted to varchar(8000)?

  • I am after a quick fix here, as the system in question is soon to be replaced, but for now:

    Our CRM system has a customer history log which, as well as storing automated event data, can also be used by our employees to store free text. This free text has been chosen to track some of our employee activity, whereby they enter a code at the beginning of the string, e.g. "T:1234 blah blah blah"; the "T:1234" identifies the type of activity, and the user's name is appended automatically to the end of the string.

    In their wisdom, the original designer of this system decided to implement the "HistoryText" column as data type "Text"... even though, after 5 years in production, the longest entry has been 5,500 characters - easily within the limit of varchar. The average data length of the column is in the region of 100 characters.
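
    (For anyone checking their own data, these are the sorts of figures I mean - DATALENGTH does work against Text columns; the table name here is assumed:)

        SELECT MAX(DATALENGTH(HistoryText)) AS MaxBytes,  -- longest entry
               AVG(DATALENGTH(HistoryText)) AS AvgBytes   -- typical entry
        FROM dbo.CustomerHistory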

    I now need to report based on the manually entered "T:...." strings, and it is painfully slow in its present state. After a brief trial with FTS I have abandoned that idea, as the issue of noise words rules out any sensible tracking-code scheme I may invent.
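
    For context, the report boils down to a query of roughly this shape (names assumed), and against a Text column the leading-prefix match has to scan every row:

        SELECT HistoryID, HistoryText
        FROM dbo.CustomerHistory
        WHERE HistoryText LIKE 'T:1234%'   -- one activity code per report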

    After reading much about the various ways of handling the Text data type, I am more in favour of changing the data type of the column to varchar(8000) and adding an index on this column on the reporting server...

    Any thoughts spring to mind as to why I should or should not follow this course of action?

    Info: the table in question has 12 million+ rows, and the production db is replicated to the reporting server and log-shipped to a DR server (as such, large transaction log activity is not ideal!).

    Many thanks in advance for any input!

  • Obviously any change to this table will log a lot of activity, as well as possibly require restarting replication with a new snapshot. You are correct that this table would work better if it had originally been set up as varchar(8000), or at least with the text-in-row option turned on, but redesigning it now will be painful.

    If you want to report on the text fields without a massive table rebuild, you might consider copying them to a new table with a varchar(8000) column. If you're just reporting on the T:xxxx codes at the beginning of each field, then you only need a char(6) column. Copying to another table would avoid any impact on replication, and if you put it in a separate database it wouldn't be log-shipped either.
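
    A rough sketch of that idea, with all names assumed (SUBSTRING is one of the few string functions that works directly against a Text column):

        -- Extract just the activity code into a small, indexable table
        CREATE TABLE dbo.ActivityCodes
        (
            HistoryID    int     NOT NULL PRIMARY KEY,
            ActivityCode char(6) NOT NULL
        )

        INSERT INTO dbo.ActivityCodes (HistoryID, ActivityCode)
        SELECT HistoryID, SUBSTRING(HistoryText, 1, 6)
        FROM dbo.CustomerHistory
        WHERE HistoryText LIKE 'T:%'

        CREATE INDEX IX_ActivityCodes_Code ON dbo.ActivityCodes (ActivityCode)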

  • I sympathize with you over the slow performance of Text - we use a third-party database that grinds exceedingly slowly when you select the text column.

    All that advice looks spot on, particularly indexing a small key column.

    Gotchas to watch for would be the two-bytes-per-character load if you use[d] nvarchar instead of varchar, and the maximum bytes per row if you do add 5,500-byte entries to existing rows.

  • Not much to add to what has already been said. In my opinion, if a Text column is not truly needed, or it is determined that you can live without it, don't use it. Convert it to a varchar or nvarchar column ASAP.

    You won't be able to alter the column. You'll have to add the column, update the data, and then drop the old column.
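
    In outline, something like this (names assumed; sp_rename puts the new column back under the old name):

        -- Stage the data in a new varchar column, then swap it in
        ALTER TABLE dbo.CustomerHistory ADD HistoryText_New varchar(8000) NULL

        UPDATE dbo.CustomerHistory
        SET HistoryText_New = CONVERT(varchar(8000), HistoryText)

        ALTER TABLE dbo.CustomerHistory DROP COLUMN HistoryText

        EXEC sp_rename 'dbo.CustomerHistory.HistoryText_New', 'HistoryText', 'COLUMN'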



  • Many thanks for the reassuring replies - I am about to convert the text column to a varchar - it could be a long night!

  • The table would be a better design if you put the "T code" and user name in separate columns, ideally as integer foreign keys to lookup tables for T code and user. Indexing these would be a fairly trivial issue.
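
    Something along these lines, with all names illustrative:

        -- Lookup tables for the activity code and the user
        CREATE TABLE dbo.ActivityType
        (
            ActivityTypeID int          NOT NULL PRIMARY KEY,  -- e.g. 1234 for "T:1234"
            Description    varchar(100) NOT NULL
        )

        CREATE TABLE dbo.AppUser
        (
            UserID   int         IDENTITY NOT NULL PRIMARY KEY,
            UserName varchar(50) NOT NULL
        )

        -- The history table then carries keys instead of burying them in free text
        ALTER TABLE dbo.CustomerHistory ADD
            ActivityTypeID int NULL REFERENCES dbo.ActivityType (ActivityTypeID),
            UserID         int NULL REFERENCES dbo.AppUser (UserID)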

    I would also convert the text column to varchar, as long as I knew that the application was not using any of the special commands for TEXT, like UPDATETEXT. If it is, you should discover that fairly quickly if you do this to a live table.

  • An update for all those who were so kind as to contribute....

    I converted said column to varchar(8000) on Sunday night and it was fairly pain-free, although 2am is a bit late to be hitting the sack when you have 2 kids and an early start planned... As the table is fairly large (12 million rows and growing), I thought a few points might be of interest.

    I was expecting large log activity, so I disabled:

    Log shipping

    Transactional replication

    Log backups

    Enterprise Manager's "Design Table" dialogue provides a useful "Save Change Script" feature: make a modification to the column definition and then click "Save Change Script" to preview the script which EM is about to execute; cut and paste this into QA (I never trust the MMC interference factor with EM) and run the script from there.

    EM actually produces a script similar to the method described in a previous post, except that instead of adding a new column and then dropping/renaming, the script generates a tmp_... table, copies all the data there, and then drops the original. Is there a more efficient way?
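
    For anyone who hasn't seen one, the generated script follows roughly this pattern (heavily simplified - the real script also recreates constraints, defaults and permissions, and I have assumed the column names):

        BEGIN TRANSACTION

        CREATE TABLE dbo.Tmp_CustomerHistory
        (
            HistoryID   int NOT NULL,
            HistoryText varchar(8000) NULL
            -- ...remaining columns as per the original table...
        )

        INSERT INTO dbo.Tmp_CustomerHistory (HistoryID, HistoryText)
        SELECT HistoryID, CONVERT(varchar(8000), HistoryText)
        FROM dbo.CustomerHistory

        DROP TABLE dbo.CustomerHistory

        EXEC sp_rename 'dbo.Tmp_CustomerHistory', 'CustomerHistory', 'OBJECT'

        COMMIT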

    A few figures:

    The column type modification took 1hr 5min to complete (the server is a dual Xeon 2.4GHz, 2GB RAM, running SQL 2000 SP3a + IIS) and resulted in a log file of 6GB (the db itself was only 7GB at this point).

    The data file size swelled from 7GB to 10.5GB.

    Once the process was complete, the db and log were backed up and DBCC SHRINKFILE was used to return the files to respectable sizes.
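
    For the record, that step looks something like this (the logical file names here are assumptions - check yours with sp_helpfile first):

        BACKUP LOG CRM TO DISK = 'D:\Backup\CRM_log.bak'
        DBCC SHRINKFILE (CRM_Log, 500)     -- target size in MB
        DBCC SHRINKFILE (CRM_Data, 8000)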

    Replication snapshot created and synchronisation started (Log shipping was dealt with the following morning!).

    All in all a worthwhile exercise. Even without approaching the question of how to index my varchar(8000) column in a useful way, the time taken to query the varchar column is considerably less than its text equivalent (the varchar query takes < 1 minute; text was taking 7min+ for the same query)...

    ...in retrospect:

    would it have been better to change the recovery model to simple before changing the type?
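
    i.e. something like this, bracketing the conversion (SQL 2000 syntax; the database name is assumed, and switching to simple breaks the log backup chain, hence the full backup afterwards):

        ALTER DATABASE CRM SET RECOVERY SIMPLE
        -- ...run the column conversion here...
        ALTER DATABASE CRM SET RECOVERY FULL
        BACKUP DATABASE CRM TO DISK = 'D:\Backup\CRM_full.bak'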

    Sorry if all this is old news to everyone else; it's just that I see so many threads end abruptly without a conclusion - so annoying!

    😉

  • Switching to simple recovery may have made the process run faster, probably with less log file growth.  You had already stopped log shipping and replication, and planned a full backup afterwards.  One thing you may have overlooked is that the log backup maintenance plan will fail unless you disable it or remove the database from the plan while it is in simple recovery mode.

  • >> EM actually produces a script similar to the method described in a previous post, except that instead of adding a new column and then dropping/renaming, the script generates a tmp_... table, copies all the data there, and then drops the original. Is there a more efficient way?

    Yes, adding the new column, updating, and dropping the old column would have been much more efficient; however, it doesn't give you any control over where the column is placed (ordinal position).
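
    On a 12-million-row table, the single UPDATE is the part that hurts the log. A sketch of the classic way to batch it (column names as assumed earlier; the log backup per batch only applies in full recovery):

        SET ROWCOUNT 50000                 -- process 50,000 rows per transaction

        WHILE 1 = 1
        BEGIN
            UPDATE dbo.CustomerHistory
            SET HistoryText_New = CONVERT(varchar(8000), HistoryText)
            WHERE HistoryText_New IS NULL
              AND HistoryText IS NOT NULL

            IF @@ROWCOUNT = 0 BREAK        -- nothing left to convert

            BACKUP LOG CRM TO DISK = 'D:\Backup\CRM_log.bak'
        END

        SET ROWCOUNT 0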

     

    >> even without approaching the question of how to index my varchar(8000) column in a useful way

    An index key has a maximum length of 900 bytes. You can't index this column.



  • Thanks Robert,

    I've created another table on the reporting server, populated with the ID column and the first 20 characters of the varchar(8000) column. I'll run a scheduled transfer from the main table to this and create an index on the varchar(20) column... the reports now run in an instant!
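
    In case it helps anyone else, the shape of it is roughly this (names assumed):

        CREATE TABLE dbo.HistoryCodes
        (
            HistoryID  int         NOT NULL PRIMARY KEY,
            CodePrefix varchar(20) NOT NULL
        )

        INSERT INTO dbo.HistoryCodes (HistoryID, CodePrefix)
        SELECT HistoryID, LEFT(HistoryText, 20)
        FROM dbo.CustomerHistory
        WHERE HistoryText LIKE 'T:%'

        CREATE INDEX IX_HistoryCodes_Prefix ON dbo.HistoryCodes (CodePrefix)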

    Many thanks to all.

  • I agree that some threads vanish just where you want to know "Yes - and then what...?".  At the risk of sounding like a Project Manager, this one has been a great example of helpful contributions, a well-planned implementation, a successful outcome, and even Lessons Learned.  The SQLServerCentral community at its best.

  • Couldn't agree more...

    One more lesson learnt, though: I chose the size of the varchar column arbitrarily at 8000, which actually tips the potential row size over the 8,060-byte limit, at 8,071 bytes. I should have taken the other handful of columns in the table into consideration and limited this column's size appropriately.
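
    A quick sanity check I should have run beforehand - summing the declared column lengths against the 8,060-byte row limit (SQL 2000 system tables; this ignores the few bytes of per-row overhead, and the table name is assumed):

        SELECT SUM(length) AS MaxDeclaredBytes
        FROM syscolumns
        WHERE id = OBJECT_ID('dbo.CustomerHistory')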

    Just waiting for the day when someone wants to write their dissertation in customerhistory!

    😉
