Overhead when using datatype varchar ?

  • I have been browsing Books Online without finding the answer yet, so hoping someone can help or identify where I will find my answer.

    How much storage overhead is there for a varchar column?  I am guessing that SQL Server needs to at least store a length attribute somewhere that would take up some storage space.  If I should have been able to find this in BOL somewhere, I would appreciate the tip on what area I missed looking for this.  Thanks!

  • I think varchar has a 1 to 2 byte overhead for storing its variable length information.  There is also a very small computational increase in processing varchars as opposed to chars.

  • Took a while to find it again, but in BOL, under contents, look under:

    Creating and Maintaining Databases

      Databases

        Database Design Considerations

          Estimating the Size of a Database

    ...that and the three entries beneath it are the keys to the castle.

       Philip

     

  • Philip,

    Many thanks for the BOL directions.  I do see the addition of 2 in the formula listed with #4 of Estimating the size of a Table.  It is not explained there, but that is the type of information I was seeking.  Appreciate your help!

    Mike

  • Unlike other RDBS, SQL handle row structure and storage for you. It looks at the datatype makeup that you've requested and then physically stores the rows in an efficient manner. Specifically, all fixed character data is stored in the row first, because it knows how big it is. Then it puts a small buffer of 4 bytes (I believe) for "null pointer blocks", and pointers to the end of the varchar data. Then it puts the varchar data at the end. The pointers to the end of the varchar blocks allow it to avoid allocating space for unused bytes of the varchar, which is why we choose the datatype in the first place. So here's what the row looks like:

    |-2 bytes -|- 2 bytes -|- Fixed width data -| -2 bytes null pointer -|-2 bytes varchar pointer -|- varchar data -|

    So the varchar data is ALWAYS stored at the end of the row - regardless of how you order the rows in the table. Since you always get the null pointers, the overhead for varchar data is 2 bytes.

    I'd say it is a fair trade off for the versatility.

    Cheers.

  • mlipham (sorry do not see your name),

    Thanks for the additional information.  Although from what I looked at, there is also another 2 bytes per varchar column, probably for the individual column length. 

    The client I am at now wants to use varchar for everything because they claim it makes their ASP coding much easier?  They even have varchar columns defined with a length of 1, which makes no sense to me.  Thus my research in BOL and here at the forums.  Appreciate you sharing!

    Mike

  • I agree, the varchar(1) is useless, because it has provisions to handle nulls. Besides it doesn't sound like they are too concerned about the SQL server side of things.

    If you can stop by a Borders or something. There is a book by Ken England about the performance of SQL server (maroonish cover ~ 200 ppg). In there he talks about the row structures. If I remember correctly (My copy is loaned out) it uses a bitmap (2 bytes) only to track all varchar columns, which may be one reason shy there is a 1,024 limit on the number of columns.

    I'm not convinced that it would be any easier for ASP. It sounds liek they are sacrificing something for the sake of convenience. In general, the trade off shouldn't be noticeable unless we're talking about terabytes of data, which I suspect we aren't.

    Cheers.

  • I will make a note to look for that book, if you remember the title send me a message with it, thanks!

    Just to clarify, my client wants to use varchar because they said through experimentation, they discovered that their asp code was too slow when they had to add trim functions to database access (select?) statements.  

    (More details...)  Their forms/pages/(whatever they are called in asp) would be set up for the max size column.  The comparison of what was on the form to what is in the database would not work if the db column was char they said because of padding in the database.  When they changed the column datatype to varchar, they did not need to have a function in their code to trim, and their performance was improved.

    I am just trying to verify the details to share with them the comparison of using different methods to accomplish what they want to do.  They are a small business, so their volumes are low enough that they can afford to waste some space, I was just trying to understand how much waste really was happening.

    Thanks for the information.

  • I generally opt for the varchar approach as I find it makes the code more reliable.  When using char you always have to be conscious of the columns lengths as ‘ABC’ will not equal ‘ABC  ‘ (char(5)) and one such error can lurk around unnoticed in the code for years.

     

    Except for very unusual situations I don’t think the very minor performance hit is of any concern and as far a disk space goes, the overall use of varchar probably saves more space than it may waste in small length columns.

     

    Bottom line, I always go for reliability over speed and space issues.

     

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply