understanding reindexing

  • this is just a question to help me understand what is going on in my reindexing process.

I have a table with 31 million rows. There are three indexes defined on this table: 1) PK_Lead, on an int column, and two non-clustered indexes, 2) Lead_A (datetime, varchar(15)) and 3) LEAD_B (varchar(10), datetime). When reindexing occurs on the non-clustered indexes I get very different performance between them: Lead_A takes less than a minute to rebuild, but Lead_B averages over 30 minutes. I don't understand why there is such a difference in rebuild time when the data in the indexes is similar (the datetime field is the same column). My only theory is that Lead_A is less fragmented because the datetime column is first and the values in that column are added sequentially (it's an insert-audit column), while in Lead_B the varchar(10) column is first and holds a random alphanumeric value, so there are more page splits and thus more fragmentation. Or, if it isn't fragmentation of the index, is it just that because the datetime values are sequential in the table, the index is much easier to build than one on non-sequential values? Are there any other reasons that one reindex would take so much longer than another?

I am running Standard Edition, so I cannot do the reindexing online. The indexes currently do not have a fill factor set, but I assume that adding one to Lead_B would just mean I don't have to reindex as often. Am I correct in this assumption?
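(For reference, a rebuild with an explicit fill factor might look like the sketch below. The table name dbo.Lead is a guess based on the index names in this thread, and 90 is an arbitrary illustrative value, not a recommendation.)

```sql
-- Hypothetical sketch: rebuild Lead_B leaving ~10% free space on each leaf page
-- to absorb page splits from the random varchar(10) leading column.
-- dbo.Lead and FILLFACTOR = 90 are placeholder assumptions.
ALTER INDEX Lead_B ON dbo.Lead
REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
```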

    Bob
    -----------------------------------------------------------------------------
How to post to get the best help

Just a shot in the dark here, since no one has replied to you yet... Check out sys.dm_db_index_physical_stats using the DETAILED scanning mode and see if the fragmentation or page count is different between the indexes. DETAILED will report on each level of the index (for both of your non-clustered indexes) so you can compare them against each other.
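A query along these lines would show the per-level numbers; the database and table names ('MyDB', 'dbo.Lead') are placeholders for your own:

```sql
-- Per-level fragmentation and page counts for every index on the table.
-- 'MyDB' and 'dbo.Lead' are placeholder names, not from the original post.
SELECT index_id, index_level, avg_fragmentation_in_percent,
       fragment_count, avg_fragment_size_in_pages,
       page_count, avg_page_space_used_in_percent, record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('MyDB'), OBJECT_ID('dbo.Lead'), NULL, NULL, 'DETAILED')
ORDER BY index_id, index_level;
```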

    Hope this helps!

    Chad

Thanks Chad. There appears to be more fragmentation, and more pages and records in the non-leaf levels, in Lead_B, the index which is taking longer. Could this account for Lead_B taking roughly 30 times longer to rebuild?

index_id  part  index_type          alloc_unit   depth  level  avg_frag_%         frag_count  avg_frag_size_pgs  page_count  avg_pg_space_used_%  record_count
29        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      0      35.7540996327143   60134       2.57172315162803   154648      77.9904620706696     31708457
29        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      1      99.7257769652651   586         1.86689419795222   1094        64.4332962688411     154648
29        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      2      75                 8           1                  8           62.4073387694589     1094
29        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      3      0                  1           1                  1           3.4717074376081      8
57        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      0      0.339272638515987  6166        13.2890042166721   81940       99.7163825055597     31708467
57        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      1      8.30769230769231   87          3.73563218390805   325         88.1952063256733     81940
57        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      2      0                  2           1                  2           61.0946380034594     325
57        1     NONCLUSTERED INDEX  IN_ROW_DATA  4      3      0                  1           1                  1           0.568322213985668    2

    Bob

So... the first index has two kinds of fragmentation: first, some of the pages and/or extents are out of order; second, not all the pages are full of data. When you reindex, it shuffles the pages around to be in the right order and fills them to the fill factor you specify (or the original fill factor). So with greater fragmentation there is more work to do, but it probably won't account for a 30x difference...

    When you say that the datetime is sequential, do you mean that there is a clustered index on the column? If so, maybe it's actually using the clustered index to build the non-clustered index faster.

    And now I'm really starting to dig... do the two varchar columns have different numbers of unique values? Are all the datetimes pretty much unique or are there duplicate datetime values?

    Chad

There is not a clustered index on the datetime column. It is just a timestamp, thus all distinct, recording when the row was inserted into the table. The clustered index is an incrementing int identity, so I would assume that Lead_A (which has the datetime first) would not need to do many, if any, page splits, as the order the values are inserted into the table is also the order of the index.

As for the two varchar columns: in Lead_A (the quick one) the varchar holds our vendor names, so there is a potential of only approximately 100 distinct values. In Lead_B (the slow one) the varchar is an identification value, so every one is distinct.

    Bob

  • Are you doing a rebuild of the indexes or a re-org? Are autostats turned on or off for the database?

I do both, depending on the fragmentation: if it's more than 30% I will do a rebuild. The reorganization is quite quick on both indexes. Auto stats is on.
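(That 30% rule can be sketched roughly like this; the table name, index name, and index_id are placeholders based on the numbers posted earlier in this thread.)

```sql
-- Sketch of the rebuild-vs-reorganize decision described above.
-- dbo.Lead, Lead_B, and index_id = 29 are assumptions, not confirmed names.
IF (SELECT avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.Lead'), 29, NULL, 'LIMITED')
    WHERE index_level = 0) > 30
    ALTER INDEX Lead_B ON dbo.Lead REBUILD;
ELSE
    ALTER INDEX Lead_B ON dbo.Lead REORGANIZE;
```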

    Bob

Kind of off topic, but when you said

    it is just a time stamp, thus all distinct

are timestamps guaranteed unique, or only "as long as you don't do two or more successive inserts really fast"?

I never bothered to check the uniqueness of the timestamp, and it turns out that there are numerous (200,000) duplicate values.
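(A duplicate check along these lines would surface that; the column and table names here are placeholders, since the real ones aren't given in the thread.)

```sql
-- How many datetime values appear more than once, and how many rows they cover.
-- dbo.Lead and InsertDate are assumed names for illustration.
SELECT COUNT(*) AS duplicated_values,
       SUM(cnt) AS rows_affected
FROM (SELECT InsertDate, COUNT(*) AS cnt
      FROM dbo.Lead
      GROUP BY InsertDate
      HAVING COUNT(*) > 1) AS dupes;
```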

    Bob

I'm starting to shoot a little wildly here, but do a distinct count on the columns for both indexes. It sounds like the fast index is composed of two columns with lots of duplicates (so there would be fewer distinct values), while the slow index has almost unique values for the leading column (so there would be many more). From the stats you posted I would guess about a 2-to-1 ratio, but maybe there's something hidden there and the index is just that much larger.

    Chad

Unfortunately there is not that much of a discrepancy between the distinct counts of the leading columns on the indexes (Lead_A, datetime: 32 million; Lead_B, varchar: 27 million). Now, five million rows is quite a bit, but in my opinion not enough to account for the reindex time difference. The second column in Lead_A has only 53 distinct values, and the second column in Lead_B is the same column as the leading column in Lead_A, so 32 million distinct values.

I am sorry, but I just realized (I didn't check it earlier) that Lead_A has the primary key/clustered index column included in the index, and there is no included column in Lead_B. Could this be the reason for the quick reindex?
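(For anyone following along, the distinct counts mentioned above would come from something like this; the column names are assumed placeholders.)

```sql
-- Distinct counts for each index's leading column.
-- InsertDate (datetime, Lead_A) and LeadCode (varchar(10), Lead_B)
-- are hypothetical names standing in for the real columns.
SELECT COUNT(DISTINCT InsertDate) AS lead_a_leading_distinct,
       COUNT(DISTINCT LeadCode)   AS lead_b_leading_distinct
FROM dbo.Lead;
```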

    Bob

I don't think that would make too much of a difference. At the leaf level the index is storing either a pointer to the page containing the row's data or the clustered index key, so I wonder if including the key in the index is storing the data twice? Hmmm...

I was actually thinking of all the index columns in conjunction with each other. Let's say that for each datetime there is exactly one varchar(15) value, such that select distinct datetime, varchar(15) returns 32 million rows (32m x 1, the Lead_A index). If for each varchar(10) value there are 4 distinct datetime values, then select distinct varchar(10), datetime would return over 100 million rows (27m x 4, the Lead_B index). If the difference were dramatic, it might explain the extra time. However, I doubt this is the case, since it sounds like your datetime is almost unique (putting both indexes at similar counts), the varchars have similar average length (I'm guessing), and the output from sys.dm_db_index_physical_stats seemed to show just a 2-1 ratio, not a 10-1 or 30-1 ratio.

    I'm afraid I've dug to the end of my ideas. I can't think of anywhere else to look, but if I do, I'll sure post again. Sorry, and good luck!

    Chad

Thanks, Chad, for the effort. This has at least given me some extra troubleshooting abilities and ideas I did not have before, so the effort was not futile.

    Bob

  • so I wonder if including the key in the index is storing the data twice? Hmmm...

    Tested:

set nocount on;
go

create table IndexTest1 (
    ID int identity primary key,
    Val1 int,
    Val2 int,
    Val3 int);
go

insert into dbo.indextest1 (val1, val2, val3)
select top 1000000 checksum(newid()), checksum(newid()), checksum(newid())
from dbo.numbers n1
cross join dbo.numbers n2;
go

create index IDX_IndexTest1_NotIncluded
on dbo.indextest1 (val1, val2, val3);
go

create table IndexTest2 (
    ID int identity primary key,
    Val1 int,
    Val2 int,
    Val3 int);
go

insert into dbo.indextest2 (val1, val2, val3)
select val1, val2, val3
from dbo.indextest1;
go

-- ID belongs in the INCLUDE list only; listing it in both the key and the
-- INCLUDE list (as originally posted) is rejected by SQL Server.
create index IDX_IndexTest2_Include
on dbo.indextest2 (val1, val2, val3)
include (ID);
go

create table IndexTest3 (
    ID int identity primary key,
    Val1 int,
    Val2 int,
    Val3 int);
go

insert into dbo.indextest3 (val1, val2, val3)
select val1, val2, val3
from dbo.indextest1;
go

create index IDX_IndexTest3_Leaf
on dbo.indextest3 (val1, val2, val3, id);
The index on table 1 ended up at 21.469 MB, and the index on each of the other two ended up at 21.461 MB. I tested it three times, with the same result each time. So either there's something wrong with my test, or including the ID column in the index cut the size by 8 KB per million rows, or that's the margin for error.
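(One way to get those sizes, using standard catalog views; the index names match the test script above, but this is a sketch, not necessarily how the measurement was done.)

```sql
-- Per-index size in pages and MB for the three test indexes.
-- used_page_count is in 8 KB pages, hence * 8 / 1024.0 for MB.
SELECT i.name,
       ps.used_page_count,
       ps.used_page_count * 8 / 1024.0 AS size_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE i.name LIKE 'IDX_IndexTest%';
```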

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Thanks GSquared. It looks like the margin of error. I got exactly the same number of pages in all 3 indexes, but the second had one more fragment than the others, which would have made it exactly 8 KB smaller had it not been fragmented. For some reason it didn't even cross my mind to run a test like that, but sure enough there it is, plain as day.

    Chad
