Hash Partitioning

  • The "original challenge" is not accessible. May be temporarily.

    But anyway, both articles focus on splitting the load on the SINGLE index (heap in "Boosting INSERT" case).

    In "Resolving PAGELATCH Contention" it's just replacing single sequence of numbers with 4 ones populated in parallel.

    Actually, PAGELATCH does not go anywhere, the approach just eases the contention by distributing the load.

    Such a RAID-0 on a table.
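
    For reference, the technique in "Resolving PAGELATCH Contention" boils down to something like the sketch below. This is a minimal illustration of the hash-partitioning idea, not the article's exact code; the table, column and partition-object names are made up:

    CREATE PARTITION FUNCTION pf_hash4 (tinyint)
        AS RANGE LEFT FOR VALUES (0, 1, 2);            -- 4 buckets: 0, 1, 2, 3
    CREATE PARTITION SCHEME ps_hash4
        AS PARTITION pf_hash4 ALL TO ([PRIMARY]);

    CREATE TABLE dbo.EventLog (
        ID      bigint IDENTITY(1,1) NOT NULL,
        -- persisted hash of the sequential key; ISNULL makes the column provably NOT NULL
        Bucket  AS ISNULL(CONVERT(tinyint, ID % 4), 0) PERSISTED,
        Payload char(100) NOT NULL DEFAULT 'x',
        CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (Bucket, ID)
    ) ON ps_hash4 (Bucket);

    Each concurrent INSERT then lands on one of four "hot" pages instead of a single one, which is what spreads the PAGELATCH_EX waits around.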

    Surprisingly, it gave only a 15% performance gain; I'd expect much more.

    And again, I'm pretty sure adding another index (or two) to the table will throw that gain out of the window.

    _____________
    Code for TallyGenerator

  • Sergiy (5/21/2012)


    The "original challenge" is not accessible. May be temporarily.

    But anyway, both articles focus on splitting the load on a SINGLE index (a heap in the "Boosting INSERT" case). In "Resolving PAGELATCH Contention" it's just replacing a single sequence of numbers with 4 sequences populated in parallel.

    The link works fine now. The articles focus on reducing latch contention, yes. The 'boosting insert' case does not use a heap, as previously mentioned.

    Actually, the PAGELATCH contention does not go anywhere; the approach just eases it by distributing the load.

    Something like RAID-0 on a table.

    The contention is not physical disk contention (such as might be relieved by adding extra I/O capacity). The contention is on in-memory synchronization structures known as latches. The articles make this distinction quite clear; modified memory pages will have to be flushed to disk by checkpoint or lazy writer, but this is a separate and asynchronous operation.
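
    For anyone following along, a quick (if rough) way to see whether this kind of contention is in play is to look at the latch-related wait statistics; a minimal sketch, server-wide and cumulative since the last restart:

    SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE N'PAGELATCH%'   -- in-memory page latches, as opposed to PAGEIOLATCH% (disk I/O)
    ORDER BY wait_time_ms DESC;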

    Surprisingly, it gave only a 15% performance gain; I'd expect much more. And again, I'm pretty sure adding another index (or two) to the table will throw that gain out of the window.

    The 15% improvement due to reduced latch contention was in the SQLCat case for a particular customer on particular hardware, without additional tuning. The improvement in Paul Randal's challenge was very much larger, as the graphic shows:

  • SQL Kiwi (5/22/2012)


    The link works fine now.

    Yep. Can read it now.

    The 'boosting insert' case does not use a heap, as previously mentioned.

    If we choose to believe so.

    IndId = 256 indicates that it's a data page, but I cannot see any indicator that it is a leaf page of a clustered index.

    Maybe I'm missing something.

    The contention is not physical disk contention (such as might be relieved by adding extra I/O capacity). The contention is on in-memory synchronization structures known as latches. The articles make this distinction quite clear; modified memory pages will have to be flushed to disk by checkpoint or lazy writer, but this is a separate and asynchronous operation.

    I did not say anything about physical operations.

    I just pointed out the similarity between the approach used here and the one used in RAIDs.

    The 15% improvement due to reduced latch contention was in the SQLCat case for a particular customer on particular hardware, without additional tuning. The improvement in Paul Randal's challenge was very much larger, as the graphic shows:

    Actually, I should not have been surprised by the small improvement rate.

    It was not that bad, in fact.

    Because it was achieved on a real system, not in an experiment built to give the NEWID() approach the biggest advantage.

    The problem with using NEWID() as a clustered PK shows up not when the key is generated, nor when a huge bulk of data is inserted, but when it forces insertions in between existing records.

    A real-life scenario is missing from both examples.

    After a year of regular million-row inserts the table is going to hold billions of rows.

    They will be arranged in the order of the random values in the clustered index.

    If maintained well, with high page density.

    Then every one of the million new records being inserted will split an existing page, as practically none of them has a chance to end up on one of the edge pages of the clustered index.

    The probable existence of other indexes on the table (a real-life scenario) will add to the load, as those indexes need to be rewritten as well to update their pointers to the PK.

    The number of pages which need to be rewritten, combined with the bad performance of SELECT queries against a highly fragmented table (locking clustered index pages and preventing them from being updated by the next INSERT), will surely make the PAGELATCH issue look insignificant.

    This comment is not relevant to the partitioning approach ("Resolving PAGELATCH Contention"), as it still uses the same sequential identity but distributes it across 4 separate pages, as RAID-0 does with data being written across its disks.

    _____________
    Code for TallyGenerator

  • Sergiy (5/22/2012)


    If we choose to believe so. IndId = 256 indicates that it's a data page, but I cannot see any indicator that it is a leaf page of a clustered index. Maybe I'm missing something.

    The thing you are missing is circled in green on that diagram. Also see m_level. It's obviously not a heap!

    http://www.sqlskills.com/blogs/paul/post/Inside-the-Storage-Engine-Anatomy-of-a-page.aspx
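
    If you want to check it for yourself, the relevant fields are in the DBCC PAGE header output. A rough sketch (DBCC PAGE is undocumented; the database name, file id and page id below are placeholders you would take from the article's screenshot or from DBCC IND):

    DBCC TRACEON (3604);                    -- send DBCC PAGE output to the client
    DBCC PAGE ('SomeDatabase', 1, 276, 0);  -- print option 0 = header only

    m_type = 1 (data page) together with m_level = 0 and "Metadata: IndexId = 1" indicates a leaf page of the clustered index; a heap page would show IndexId = 0. If I read the header right, the m_indexId = 256 value is part of the allocation unit id rather than the real index id.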

    The probable existence of other indexes on the table (a real-life scenario) will add to the load, as those indexes need to be rewritten as well to update their pointers to the PK.

    Clustering keys don't change when a page splits, so no non-clustered index maintenance is required.

    This is counter-intuitive stuff, in a specific scenario, so don't worry too much if you don't understand or agree with it, based on your own experiences.

  • SQL Kiwi (5/23/2012)


    Sergiy (5/22/2012)


    If we choose to believe so. IndId = 256 indicates that it's a data page, but I cannot see any indicator that it is a leaf page of a clustered index. Maybe I'm missing something.

    The thing you are missing is circled in green on that diagram. Also see m_level. It's obviously not a heap!

    http://www.sqlskills.com/blogs/paul/post/Inside-the-Storage-Engine-Anatomy-of-a-page.aspx

    The probable existence of other indexes on the table (a real-life scenario) will add to the load, as those indexes need to be rewritten as well to update their pointers to the PK.

    Clustering keys don't change when a page splits, so no non-clustered index maintenance is required.

    This is counter-intuitive stuff, in a specific scenario, so don't worry too much if you don't understand or agree with it, based on your own experiences.

    OK, it's been a while.

    But since the script was saved, why don't I run it and share the results?

    Here is what I did.

    1. Create 2 almost identical tables.

    The only difference is in the definition of the PK column: random (NEWID()) vs. sequential (IDENTITY(1,1)).

    IF OBJECT_ID('MyBigTable_R') IS NOT NULL
        DROP TABLE MyBigTable_R
    IF OBJECT_ID('MyBigTable_S') IS NOT NULL
        DROP TABLE MyBigTable_S

    SET NOCOUNT ON
    SET ANSI_WARNINGS OFF

    CREATE TABLE MyBigTable_R (
        c1 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID()
        ,c2 DATETIME DEFAULT GETDATE()
        ,c3 CHAR(111) DEFAULT 'a'
        ,c4 INT DEFAULT 1
        ,c5 INT DEFAULT 2
        ,c6 BIGINT DEFAULT 42);

    CREATE CLUSTERED INDEX MyBigTable_R_cl ON MyBigTable_R (c1);
    CREATE INDEX MyBigTable_R_c2 ON MyBigTable_R (c2);
    CREATE INDEX MyBigTable_R_c3 ON MyBigTable_R (c3);
    CREATE INDEX MyBigTable_R_c4 ON MyBigTable_R (c4);

    CREATE TABLE MyBigTable_S (
        c1 BIGINT IDENTITY(1,1) NOT NULL
        ,c2 DATETIME DEFAULT GETDATE()
        ,c3 CHAR(111) DEFAULT 'a'
        ,c4 INT DEFAULT 1
        ,c5 INT DEFAULT 2
        ,c6 BIGINT DEFAULT 42);

    CREATE CLUSTERED INDEX MyBigTable_S_cl ON MyBigTable_S (c1);
    CREATE INDEX MyBigTable_S_c2 ON MyBigTable_S (c2);
    CREATE INDEX MyBigTable_S_c3 ON MyBigTable_S (c3);
    CREATE INDEX MyBigTable_S_c4 ON MyBigTable_S (c4);

    TRUNCATE TABLE MyBigTable_R
    TRUNCATE TABLE MyBigTable_S

    2. Populate both tables with identical sets of data.

    Again, PK columns were populated with different data, according to their definitions:

    INSERT INTO MyBigTable_R (c3, c4)
    SELECT TOP 200000
        LEFT('abcdefghijk', (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10),
        (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id
    --ORDER BY c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    INSERT INTO MyBigTable_S (c3, c4)
    SELECT TOP 200000
        LEFT('abcdefghijk', (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10),
        (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id
    --ORDER BY c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    3. Check the fragmentation level for all the indexes on the tables (a DMV-based alternative is sketched after step 5 below):

    DBCC SHOWCONTIG ('MyBigTable_R') WITH ALL_INDEXES, TABLERESULTS
    DBCC SHOWCONTIG ('MyBigTable_S') WITH ALL_INDEXES, TABLERESULTS

    4. Add small subsets (1/20 of the initial one) to both tables.

    No new values are added to columns c3 and c4; the new records contain the same values as the already existing rows.

    c2, on the other hand, is populated with the current time, so new values are added to the index on this column, and those new values fall outside the existing data range.

    After that, check the index fragmentation once again:

    INSERT INTO MyBigTable_R (c3, c4)
    SELECT TOP 10000
        LEFT('abcdefghijk', (c1.id + c2.id + c2.colid) % 10),
        (c1.id + (c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    INSERT INTO MyBigTable_S (c3, c4)
    SELECT TOP 10000
        LEFT('abcdefghijk', (c1.id + c2.id + c2.colid) % 10),
        (c1.id + (c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    DBCC SHOWCONTIG ('MyBigTable_R') WITH ALL_INDEXES, TABLERESULTS
    DBCC SHOWCONTIG ('MyBigTable_S') WITH ALL_INDEXES, TABLERESULTS

    5. Repeat step 4, adding another small set to the tables:

    INSERT INTO MyBigTable_R (c3, c4)
    SELECT TOP 1000
        LEFT('abcdefghijk', (c1.id + c2.id + c2.colid) % 10),
        (c1.id + (c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    INSERT INTO MyBigTable_S (c3, c4)
    SELECT TOP 1000
        LEFT('abcdefghijk', (c1.id + c2.id + c2.colid) % 10),
        (c1.id + (c2.id + c2.colid)) % 10
    FROM syscolumns c1, syscolumns c2 --, syscolumns c3, syscolumns c4, syscolumns c5
    ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id + c2.colid)) % 10, c1.id, c2.id, c2.colid --, c3.id, c4.id, c5.id

    DBCC SHOWCONTIG ('MyBigTable_R') WITH ALL_INDEXES, TABLERESULTS
    DBCC SHOWCONTIG ('MyBigTable_S') WITH ALL_INDEXES, TABLERESULTS
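
    As an aside, the same kind of figures can also be read from sys.dm_db_index_physical_stats, which may be easier to compare between runs. A minimal sketch, assuming SQL Server 2005 or later and that the test tables live in the current database:

    SELECT OBJECT_NAME(ips.object_id)          AS table_name,
           i.name                              AS index_name,
           ips.index_type_desc,
           ips.avg_fragmentation_in_percent,   -- roughly the "Logical Fragmentation" figure
           ips.avg_page_space_used_in_percent, -- page density
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.object_id IN (OBJECT_ID('MyBigTable_R'), OBJECT_ID('MyBigTable_S'))
      AND ips.index_level = 0                  -- leaf level only
    ORDER BY table_name, index_name;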

    You may find the outcome in the files attached (both have the same data, just choose the format you prefer).

    What's in there?

    After the initial load there is not much difference: both tables have Scan Density above 95% for all indexes, and logical fragmentation is near zero.

    The number of Extent Switches is one less than the number of extents.

    Perfect.

    But after the following small-set insert the stats become quite different.

    The "Sequential" table stays in the 90s for Scan Density and below 5% for Logical Fragmentation, with almost no change from the initial state. The number of Extent Switches is about the same as the number of Extents.

    But the "Random" table stats look quite different:

    - the PK is at 13% Scan Density and 94% Logical Fragmentation, and the number of Extent Switches is about 8 times the number of Extents (as expected, and it's not what we are discussing here);

    - c2 stays at the same level: 97% Scan Density and 1.15% Logical Fragmentation, with Extent Switches still exactly one below the number of Extents;

    - c3 and c4 suddenly have about 38% Scan Density and about 25% Logical Fragmentation, with the number of Extent Switches about 3 times the number of Extents.

    The second small upload does not change the numbers significantly.

    Pretty much the same fragmentation stats.

    Why would that be?

    There are no new values in c3 and c4. The only possible source of the fragmentation is the clustering keys.

    Apparently, existing pages at the leaf level don't have space to accommodate the new PK references, so SQL Server has to add new ones.

    For the "sequential" table the new PK references are added at the end of the existing range, so they cause only minor fragmentation; most of the existing leaf pages indeed remain intact.

    But for the "random" table the new PK references force splits of existing leaf-level pages, as they are added in random positions.

    The index on c2 does not suffer from the splitting.

    Why?

    Well, I may be going on intuition here :-), but apparently, because the new data adds values beyond the current range, it creates new leaf pages and puts the PK references onto those new pages, so existing pages don't get any new data and therefore do not get fragmented.

    I'd say it's not an ordinary case; normally non-clustered indexes follow the pattern of the indexes on columns c3 and c4 rather than the one on c2.

    I believe all of the above perfectly proves my point:

    Splitting pages of a clustered index does change fragmentation of non-clustered index pages.

    And additional non-clustered index maintenance is required because of PK fragmentation.

    It might not be mentioned in the latest MS books, but it's still a fact of life.

    That's where one's own brain and some common sense still come in useful.

    😎

    _____________
    Code for TallyGenerator

  • The nature of the clustering key will not affect fragmentation in a non-clustered index unless it is part of the non-clustered index key itself.

    In your example c2 never becomes fragmented because it defaults to GETDATE(), meaning its index key is ever-increasing.

    c3 and c4 become fragmented not because of the clustering key, but because you're applying modulo 10 in your calculation, meaning that in the subset inserts you'll be inserting some of the same values previously inserted, and non-clustered index page splits occur. I think your example proves fragmentation can occur in a non-clustered index, but not for the reasons outlined, i.e. not having anything to do with the clustered index definition.
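
    For what it's worth, one way to see which index actually receives new leaf pages during the subset inserts is sys.dm_db_index_operational_stats. A hedged sketch (the counters are cumulative, are reset when the object's metadata leaves the cache, and leaf_allocation_count lumps together end-of-index and mid-index page splits):

    SELECT OBJECT_NAME(ios.object_id) AS table_name,
           i.name                     AS index_name,
           ios.leaf_insert_count,
           ios.leaf_allocation_count  -- new leaf pages added, i.e. page splits of either kind
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    JOIN sys.indexes AS i
        ON i.object_id = ios.object_id AND i.index_id = ios.index_id
    WHERE ios.object_id IN (OBJECT_ID('MyBigTable_R'), OBJECT_ID('MyBigTable_S'))
    ORDER BY table_name, i.index_id;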

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (8/6/2012)

    c3 and c4 become fragmented not because of the clustering key, but because you're applying modulo 10 in your calculation, meaning that in the subset inserts you'll be inserting some of the same values previously inserted, and non-clustered index page splits occur.

    The same modulo is applied for the "Sequential" table inserts.

    Splits do not occur there.

    _____________
    Code for TallyGenerator

  • I see what you're pointing out, but I think the example is a bit contrived. The results seem to be skewed by the low cardinality of c3 and c4 in your test data. The big question is: does SQL Server sort same-value index entries by the unique clustering key (whether by declaration or by uniqueifier)?
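
    For what it's worth, one way to poke at that question is to dump a leaf page of one of the non-clustered indexes. A hedged sketch (DBCC IND and DBCC PAGE are undocumented, the database name, index id and page id below are placeholders, and the output layout varies between versions):

    DBCC IND ('SomeTestDb', 'MyBigTable_R', 4);  -- list the pages of index_id 4 (assumed to be the index on c4)
    DBCC TRACEON (3604);                         -- send DBCC PAGE output to the client
    DBCC PAGE ('SomeTestDb', 1, 12345, 3);       -- dump one leaf page (IndexLevel = 0 in the DBCC IND output)

    If the clustering key (c1) shows up as a key column in each index row, then entries with the same c4 value are indeed ordered by the clustering key.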

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
