Mitigatng Circumstances for a Heap ?

  • Under what circumstances could you justify/allow a table to be a heap ? a temporary dumping ground that was written to far more than it was read from for example ? Or is there no reason at all why a table shouldn't have at least one index ?

    thanks ~simon

  • IMHO every physical table should have a clustered index. It is the absence of a clustered index that makes a table a heap. If there is no suitable (i.e., unique, seldom changing, small size) column then I add an Integer or Big Integer column with the Identity property enabled and use that column as the clustered index.

    Note, the clustered index does not have to be the primary key but also the Natural Primary key does not have to be used as the actual Primary Key of the table. You can specify that the added column is the actual primary key and then add a Unique constraint for the Natural Key. This is useful when the natural key is an unwieldy combination of columns, may change frequently, is not naturally incrementing, etc.

    Specifying a Unique index on the Natural Key means you can still define Foreign Keys that reference the column(s) and you can join on them as well. Using the Int/BigInt as the Primary Key and Clustered Index saves space in other indexes, saves space in the base table (it is used more efficiently), and probably improves performance on inserts.

    YMMV 🙂

  • Hi Simon

    It entirely depends upon the table purpose 😀

  • hiyah

    so what purpose would you say yes its acceptable to be a heap ? 🙂

  • Simon_L (11/5/2008)


    a temporary dumping ground that was written to far more than it was read from for example ?

    Something that's written to frequently is a bad candidate for a heap because of the forwarding pointers that a heap can have. forwarding pointer occurs when a row is updated and doesn't fit on the page any longer. The row is moved and a pointer is left in its place showing where the row has moved to.

    The nice thing is that it means the nonclustered indexes don't have to be updated when the row moves, the downside is that it means to read that row requires 2 random IOs, which, on a large table with lots of forwarding pointers is horrendously expensive in terms of IO.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Simon_L (11/5/2008)


    Under what circumstances could you justify/allow a table to be a heap ? a temporary dumping ground that was written to far more than it was read from for example ? Or is there no reason at all why a table shouldn't have at least one index ?

    thanks ~simon

    I'm a great believer that most tables must have a PK and a clustered index on something temporal... but the real answer is "It Depends". If it's just a staging table for imported data, and depending on how you're using it, it's sometimes better to have it with no clustered index or a PK...

    I also have HUGE "nearly" online archive tables... all are heaps... none have indexes... they are there just to have the data available for emergencies and getting the data normally isn't an "instant" requirement of archived data.

    Ok... why do I do that? Like I said, these tables are bloody huge... keeping indexes on them for every query that someone may want to use someday (maybe, once a year) is just too expensive disk-size-wise. I saved nearly 100GB just by dropping the original indexes from the archive tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Archive tables with little or no frequent access afterwards are probably the only time I'd avoid an index. Should I have to retrieve data from it in some manner, I could create a temporary clustered index based on the query I'm going to run, do my retrieval, and drop the index immediately (don't forget shrinkdatabase if space is getting tight).

    Also, especially with archive tables that developers have forgetten about but everyone is scared to drop, even though space is getting tight, you may have to forego any index to reduce space usage.

    Gaby
    ________________________________________________________________
    "In theory, theory and practice are the same. In practice, they are not."
    - Albert Einstein

  • Gaby A. (11/7/2008)


    Archive tables with little or no frequent access afterwards are probably the only time I'd avoid an index. Should I have to retrieve data from it in some manner, I could create a temporary clustered index based on the query I'm going to run, do my retrieval, and drop the index immediately (don't forget shrinkdatabase if space is getting tight).

    Also, especially with archive tables that developers have forgetten about but everyone is scared to drop, even though space is getting tight, you may have to forego any index to reduce space usage.

    even then a clustered index doesn't hurt because it doesn't take up extra space.

    ~BOT

  • SQLBOT (11/7/2008)


    even then a clustered index doesn't hurt because it doesn't take up extra space.

    ~BOT

    Any bet's on that? 😉

    USE TempDB

    GO

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1

    CROSS JOIN Master.dbo.SysColumns t2

    --===== Measure the table size before adding a clustered index

    DBCC UPDATEUSAGE (0,'dbo.JBMTest')

    EXEC dbo.sp_SpaceUsed 'dbo.JBMTest'

    --===== Measure the table size after adding a clustered index

    CREATE CLUSTERED INDEX IXC_JBMTest_RowNum ON dbo.JBMTest (RowNum)

    DBCC UPDATEUSAGE (0,'dbo.JBMTest')

    EXEC dbo.sp_SpaceUsed 'dbo.JBMTest'

    DROP TABLE dbo.JBMTest

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ok, Jeff proved that the table is larger with an index than without, but why would that be true of a clustered index? Doesn't a clustered index primarily order the data already existing by the key and not create new data to be stored?

    And to answer the original question, I often leave small tables in a heap that are created as temporary storage places for imported data. That data then gets beaten into shape and validated before being put into the (indexed and partially normalized) primary tables. Then the import tables are dropped.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Because, contrary to popular belief, Clustered Indexes are NOT totally contained in the data at the Leaf level... there is an Intermediate Level containing index rows AND a Root Level, also containing index rows. The extra bytes come from them. Lookup "Clustered Indexes, Architecture" in Books Online to see more.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • That makes complete sense. Thanks.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • You bet... thanks for the feedback, Timothy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • LOL,

    yes, yes... touche

    In those cases when I'm strapped for extra 208k of disk, I usually drop the clustered index.

    ~BOT

  • http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/09/25/387.aspx

    The first and second articles offer the bulk of the argument. Good reading regardless of whether you agree.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply