Optimize memory with varchar(max) datatype

  • I am building a table change log that will track each attribute update and include the original and new values.

    [BatchYearMonthKey] [int] NULL,

    [BatchYearMonthDayKey] [int] NULL,

    [AccountID] [varchar](200) NULL,

    [Attribute] [varchar](200) NULL,

    [Old_ValueAtrDefault] [varchar](200) NULL,

    [New_ValueAtrDefault] [varchar](200) NULL,

    [Old_ValueAtrLong] [varchar](max) NULL,

    [New_ValueAtrLong] [varchar](max) NULL

    The challenge is the spectrum of varchar lengths across the table. I have one attribute that requires varchar(max), while all of the other attributes (about 40) are varchar(200).

    I am trying to accomplish the following:

    AccountID    Status

    1            Enabled

    Now changed to:

    AccountID    Status

    1            Disabled

    My log table will look like the following:

    [BatchYearMonthKey] [BatchYearMonthDayKey] [AccountID] [Attribute] [Old_ValueAtrDefault] [New_ValueAtrDefault] [Old_ValueAtrLong] [New_ValueAtrLong]

    201504 20150409 1 Status Enabled Disabled NULL NULL

    My question:

    I created two fields (Old_ValueAtrLong and New_ValueAtrLong) dedicated to the one attribute that is varchar(max). I was trying to avoid storing a varchar(200) value like [Status] in a field that is varchar(max). Is this the right approach? Or are there other recommendations for storing the data in the most efficient manner?

    Here is the challenge I see with your design. You are creating an EAV-style logging table. This is a real pain to deal with in the long run. Consider how challenging it is going to be when management says "what did row X look like on SomeDate?". You will end up running a separate query for each column in the table to determine what its value was. Unless we are talking about millions of updates where most of them only touch a single column, I would suggest just auditing the whole row.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hello Sean, are you suggesting I have just two columns; "OldValue" and "NewValue" that are varchar(max)?

    Not unless most of your updates are only to a single column. I would make a copy of the table to hold the "old" values. If you update 7 out of 10 columns, you aren't wasting much space per row by just copying the whole thing, since with the EAV style you would have 7 rows anyway. It just doesn't gain much imho.
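
    A minimal sketch of that whole-row approach, using a hypothetical Accounts table (the table, column, and trigger names here are illustrative, not from this thread):

    ```sql
    -- Hypothetical base table
    CREATE TABLE dbo.Accounts
    (
        AccountID varchar(200) NOT NULL PRIMARY KEY,
        [Status]  varchar(200) NULL,
        Notes     varchar(max) NULL
    );

    -- Audit copy: same columns plus audit metadata
    CREATE TABLE dbo.Accounts_Audit
    (
        AccountID varchar(200) NOT NULL,
        [Status]  varchar(200) NULL,
        Notes     varchar(max) NULL,
        AuditDate datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    -- On UPDATE, copy the pre-change row into the audit table
    CREATE TRIGGER dbo.trg_Accounts_Audit
    ON dbo.Accounts
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.Accounts_Audit (AccountID, [Status], Notes)
        SELECT AccountID, [Status], Notes
        FROM deleted;   -- "deleted" holds the old values
    END;
    ```

    One audit row per update, regardless of how many columns changed.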


    Ok, got it. I elected for EAV for ease of identifying which attributes are getting updated, consolidating them into a single column vs. having to scan across the table attribute by attribute. Or maybe I'm looking at the analysis of the data incorrectly?

  • I strongly recommend NOT using VARCHAR of any type for Old/New values in such a table. Consider using SQL_VARIANT, instead. You don't really want to audit blobs anyway.
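
    A sketch of Jeff's suggestion, swapping the varchar value columns for sql_variant (column names assumed from the original DDL, shortened for illustration):

    ```sql
    -- sql_variant preserves the source column's datatype alongside the value.
    -- Note it cannot hold varchar(max)/nvarchar(max)/text/image values, so the
    -- one blob attribute would need separate handling anyway.
    CREATE TABLE dbo.ChangeLog
    (
        BatchYearMonthKey    int          NULL,
        BatchYearMonthDayKey int          NULL,
        AccountID            varchar(200) NULL,
        Attribute            varchar(200) NULL,
        Old_Value            sql_variant  NULL,
        New_Value            sql_variant  NULL
    );

    -- SQL_VARIANT_PROPERTY reveals the original type on the way out
    SELECT Old_Value,
           SQL_VARIANT_PROPERTY(Old_Value, 'BaseType') AS BaseType
    FROM dbo.ChangeLog;
    ```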

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • J0shu@ (4/9/2015)


    Ok, got it. I elected for EAV for ease of identifying which attributes are getting updated, consolidating them into a single column vs. having to scan across the table attribute by attribute. Or maybe I'm looking at the analysis of the data incorrectly?

    I am certainly not saying you are looking at the analysis incorrectly. I understand the appeal of using EAV for auditing; it makes development super simple. The challenge I have come across in the real world is how hard it is to recreate what the entire table looked like on a given date when each row is split into x number of rows in an EAV audit table.

    For example, let's say this table is holding customer opt-in information with several different types of options. Management wants to know how many type 1, type 2, and type 3 options active subscribers had on June 1, 2014. Using an EAV style of auditing, this is painful. You have to first rebuild what every row looked like on that date, and in order to rebuild a row you have to get the value for each column. It takes RBAR to the next level. It is more like CBAC (column by agonizing column) * RBAR (row by agonizing row).

    A real-world example of this was a health care system tracking health care electives. Because of the volume of updates, we decided not to track every single column and went with EAV-style auditing. When we had to reproduce what the table looked like on a given day, we spent a considerable amount of time figuring out how to get the data out like that. Once we did, and the query was horribly slow, we redesigned the auditing because using the audited data was just too painful.
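
    To illustrate the pain being described, here is a rough sketch of rebuilding just one row from an EAV log as of a date, one correlated lookup per attribute (the ChangeLog table follows the OP's DDL; the attribute names are made up):

    ```sql
    -- For each attribute of account '1', find the most recent logged value
    -- on or before the date in question -- and this only covers one row.
    SELECT
        a.Attribute,
        x.New_ValueAtrDefault AS ValueAsOf
    FROM (VALUES ('Status'), ('OptIn1'), ('OptIn2')) AS a (Attribute)
    OUTER APPLY
    (
        SELECT TOP (1) cl.New_ValueAtrDefault
        FROM dbo.ChangeLog AS cl
        WHERE cl.AccountID = '1'
          AND cl.Attribute = a.Attribute
          AND cl.BatchYearMonthDayKey <= 20140601
        ORDER BY cl.BatchYearMonthDayKey DESC
    ) AS x;
    ```

    Multiply that by every account and every one of ~40 attributes and the CBAC * RBAR cost becomes clear.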


  • Thank you so much Sean for the great example.

    A quick question/concern (probably due to my lack of knowledge):

    If I opt for essentially going horizontal vs. vertical to track the changes, then in your example, when asked "what was changed for customer x?",

    did you have to scan column by column to see where the new updates were located? How did you manage monitoring all of the attributes across a single record?

  • J0shu@ (4/10/2015)


    Thank you so much Sean for the great example.

    A quick question/concern (probably due to my lack of knowledge):

    If I opt for essentially going horizontal vs. vertical to track the changes, then in your example, when asked "what was changed for customer x?",

    did you have to scan column by column to see where the new updates were located? How did you manage monitoring all of the attributes across a single record?

    Here is how we dealt with that particular case. The audit table was basically a copy of the base table, with a few extra columns for auditing purposes, but the primary columns were the same. We simply stored the existing values when an update was made. It takes more disk space this way, but we had millions of rows in the table. The main table was around 10 GB, and we cleared out the audit table at the end of the year. We saved the data in a warehouse for archive purposes, but at the end of the election year the data was no longer needed.

    The advantage we gained from this was that all we had to do to find what the entire table looked like on a given date was pull all the data from the base table and the audit table and get the most recent row prior to the date in question.

    By no means am I telling you that you need to do auditing this way. What I am saying is that in my experience the benefits of full auditing far outweigh the space savings from tracking only the changed values. It somewhat depends on the shape of the data and the usage and requirements of the historical information. I know some other people who use vertical auditing in an EAV and it works great for them.
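
    A sketch of that as-of query under the full-row design, assuming hypothetical Accounts/Accounts_Audit tables where each version row carries the date it became effective (EffectiveDate on the audit rows, ModifiedDate on the current row; all names are illustrative):

    ```sql
    DECLARE @AsOf datetime2 = '2014-06-01';

    -- All versions of every account: audited snapshots plus the current row
    WITH AllVersions AS
    (
        SELECT AccountID, [Status], EffectiveDate
        FROM dbo.Accounts_Audit
        UNION ALL
        SELECT AccountID, [Status], ModifiedDate
        FROM dbo.Accounts
    )
    SELECT AccountID, [Status]
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AccountID
                                  ORDER BY EffectiveDate DESC) AS rn
        FROM AllVersions
        WHERE EffectiveDate <= @AsOf   -- keep only versions in effect by @AsOf
    ) AS v
    WHERE rn = 1;                      -- newest version per account wins
    ```

    One pass over two tables reconstructs the whole table as of the date, versus one lookup per column under EAV.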


    Thank you so much Sean, this is very helpful. I also tested an implementation of CDC, and the process looks to work quite well across all modifications to the table. Again, I'm just concerned with identifying which attribute was changed for a particular primary key. Do you have any experience with CDC that you can comment on?

  • J0shu@ (4/10/2015)


    Thank you so much Sean, this is very helpful. I also tested an implementation of CDC, and the process looks to work quite well across all modifications to the table. Again, I'm just concerned with identifying which attribute was changed for a particular primary key. Do you have any experience with CDC that you can comment on?

    I have not used CDC in a real-world scenario. I have played with it enough to understand it a little bit. It does pretty much what I was saying about horizontal auditing. The one challenge is that it doesn't capture "whodunnit", or at least it isn't easy to do. Maybe somebody has figured out how to capture that piece of information, but otherwise it seems pretty solid.
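
    For the "which attribute changed" concern, CDC's update mask can answer it directly. A sketch, assuming a hypothetical dbo.Accounts table in a database where CDC is already enabled (capture instance name dbo_Accounts is the default):

    ```sql
    -- Enable CDC on the table (the database must already be CDC-enabled)
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Accounts',
         @role_name     = NULL;
    GO

    -- Later: list changes and test whether the Status column was updated
    DECLARE @from binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Accounts'),
            @to   binary(10) = sys.fn_cdc_get_max_lsn();

    SELECT c.AccountID,
           c.__$operation,   -- 1=delete, 2=insert, 3=update(before), 4=update(after)
           sys.fn_cdc_is_bit_set(
               sys.fn_cdc_get_column_ordinal(N'dbo_Accounts', N'Status'),
               c.__$update_mask) AS StatusChanged
    FROM cdc.fn_cdc_get_all_changes_dbo_Accounts(@from, @to, N'all update old') AS c;
    ```

    Checking one bit per column of interest in __$update_mask avoids comparing old and new values by hand.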


    Ok, great, thank you Sean. Fortunately, "the who" is not a requirement, nor will it be in the future, so that gap is not in play. Thanks for the assistance and insight.

    You can also try nvarchar(max), which helps you use only the storage needed, depending on the data, and may perform differently compared to the varchar(max) data type 🙂
