Computed column storage size

  • We have an ID that is 8 characters long with leading zeros if the ID is less than 8 characters long. Currently, it is stored as a VARCHAR(8), with the developer having to specifically add the leading zeros during insertion. I would like to change that MemberId field to an INT field, since letters are never used for this field, only numbers. The only issue is we still need to display the leading zeros for reports, websites, applications, etc.

    My solution was to use a computed column on the INT field, using for the presentation the function FORMAT(MemberId, '00000000'). While this does work, I noticed an anomaly with the computed column's data type; it returns an NVARCHAR(4000), which is the max size for NVARCHAR.

    I also added another computed column, this time using REPLICATE and LEN to get the desired result. This returned a data type of VARCHAR(8000), which is the max size for VARCHAR.

    Here's the code I used to set up this quick example:

    CREATE TABLE test.MemberTrial

    (

    MemberId INT,

    ReplicatedMemberId AS REPLICATE('0', 8 - LEN(MemberID)) + CAST(MemberID AS VARCHAR(8)),

    FormattedMemberId AS FORMAT(MemberId, '00000000'),

    CastMemberId AS CAST(FORMAT(MemberId, '00000000') AS CHAR(8))

    )

    INSERT INTO test.MemberTrial (MemberId)

    VALUES

    (123),

    (1123),

    (11235),

    (112358),

    (11235813),

    (112358132),

    (9999999),

    (321321),

    (3),

    (73)

    SELECT

    MemberId,

    ReplicatedMemberId,

    FormattedMemberId,

    CastMemberId

    FROM

    test.MemberTrial

    My questions are:

    • Since they are non-persisted computed columns, does it matter if FORMAT() or REPLICATE() returns a max NVARCHAR/VARCHAR data type?
    • Is there a way to force FORMAT() to a smaller data type without casting?
    • If I wrap FORMAT() in a CAST(), will that cause a little performance hit when selecting the values?
  • A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.

    If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.

    Both of these options will result in a column of type varchar(8)

    ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

    ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

  • Exactly what DesNorton mentioned. I just wanted to add something about the performance.

    The FORMAT function has been proven to be very slow. It's up to 40 times slower that previous formatting methods. What seems to be the most effective option is to add the zeros before casting.

    CREATE TABLE MemberTrial

    (

    MemberId INT,

    RightMemberId AS RIGHT(MemberId + 100000000, 8)

    );

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • DesNorton (10/19/2016)


    A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.

    If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.

    Both of these options will result in a column of type varchar(8)

    ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

    ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

    I realize that if you persist them they will go to disk. But wouldn't that also increase storage? If you're already storing the INT value (4 bytes), then persist the VARCHAR(8) value (10 bytes, correct? Eight for the characters, 2 for overhead?), you're basically storing the value twice, taking up 14 bytes as opposed to 4. Is that correct?

    And wouldn't it be best just to store and index the original MemberId INT column and use the computed column for display only? Then you're indexing an INT instead of a VARCHAR. Isn't that better for speed and performance, too?

  • CferMN (10/19/2016)


    DesNorton (10/19/2016)


    A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.

    If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.

    Both of these options will result in a column of type varchar(8)

    ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

    ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

    I realize that if you persist them they will go to disk. But wouldn't that also increase storage? If you're already storing the INT value (4 bytes), then persist the VARCHAR(8) value (10 bytes, correct? Eight for the characters, 2 for overhead?), you're basically storing the value twice, taking up 14 bytes as opposed to 4. Is that correct?

    And wouldn't it be best just to store and index the original MemberId INT column and use the computed column for display only? Then you're indexing an INT instead of a VARCHAR. Isn't that better for speed and performance, too?

    If you ever need to search for MemberId like '000012%' then you will have to do a full table scan for each search. In this case, you are better off adding the persisted column and indexing it.

    If that never happens, then there are 2 options

    1 - Store the extra 10Bytes, and use drive space - With a max of 10mil rows, that transaltes to a max 100MB (or 95.3 if 1KB = 1024B).

    2 - Take the CPU perf hit of having to do the calc on every read.

    You need to make the call of which is the lesser of 2 evils for your situation.

  • Persisted computed columns only make sense when the computation is expensive or when an index is needed. If that's not the case, persisting the computed column makes little sense.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply