November 28, 2012 at 9:03 am
Hi,
After reading Gail Shaw's articles on tracking down performance issues, I thought I'd give it a go myself.
While playing around with different NC (nonclustered) indexes to no avail, I wanted to change the PK to include more columns. Unfortunately I can't: the second column I want to include is nullable, and I cannot alter the column because a statistic depends on it.
In Object Explorer I noticed that there are a lot of statistics on this table.
SELECT COUNT(*)
FROM sys.stats
WHERE OBJECT_NAME(OBJECT_ID) = '<my table>'
Returns 246. The names start with _dta_ or _WA_Sys.
I'm pretty sure the _dta_ stats are created by the Database Engine Tuning Advisor, but I don't know what the other ones are.
So what I was wondering:
1) Is this a normal amount?
2) Can I safely drop all of these?
3) If I do, do I have to create them all again?
4) How can I best go about altering my column?
5) Why did DTA create so many?
6) Where'd the _WA_Sys stats come from?
Working with SQL 2005 on this dev box.
BTW, I don't know who ran DTA and committed the changes.
November 28, 2012 at 9:10 am
The _WA_Sys stats are automatically created by SQL Server.
You can drop them if you want; SQL will recreate them if it needs to. The _dta stats were created when someone ran the Database Engine Tuning Advisor and accepted its suggestions.
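A quick way to see which ones SQL Server created automatically (just a sketch; swap in your own table name):

```sql
-- Auto-created (_WA_Sys) stats are flagged auto_created = 1 in sys.stats
SELECT name
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.MyTable') -- placeholder table name
  AND auto_created = 1;
```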
BTW, I would not recommend you go and add more columns to your PK. The primary key is the unique key for the table; it should be just the column or set of columns that uniquely identifies a row, and nothing more.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 2:24 am
Thanks for replying.
So having so many statistics on a table (36 columns) is normal, or at least not unusual?
No bloat or overhead involved?
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
Turns out just dropping the existing nonclustered index on CustNr gives me 3x better performance: 3205 scans are now 1 (clustered index scan, 500,000+ rows, 76%), and reads dropped from 502,017 to 31,760.
BTW, do you know where I can find documentation on Retrace? My searches just lead me back to the blog post you mentioned. In the meantime I just grab the left 25-50 characters of TextData and group by that.
November 29, 2012 at 3:17 am
MOC Ewez (11/29/2012)
So having so many statistics on a table (36 columns) is normal, or at least not unusual?
Depends. 🙂 If there are duplicate stats (between the _dta_ ones and the auto-created ones), then you can drop the duplicates.
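Dropping one is a single statement (the table and statistic names here are just placeholders):

```sql
-- DROP STATISTICS takes table_name.statistics_name
DROP STATISTICS dbo.MyTable._dta_stat_5_3_7;
```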
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
If CompNr is the column that uniquely identifies the row, then the PK should be on CompNr alone, not CompNr plus other columns. This is database design principles: widen the PK unnecessarily and you introduce the possibility of incorrect data.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 5:24 am
Ahhh, the dreaded but expected "Depends". :hehe:
How does one find duplicate statistics?
I've got a basic start.
SELECT OBJECT_NAME(S.[object_id]), S.name, SC.*
FROM sys.stats S
JOIN sys.stats_columns SC
ON S.stats_id = SC.stats_id
AND S.[object_id] = SC.[object_id]
JOIN sys.columns C
ON SC.column_id = C.column_id
AND SC.[object_id] = C.[object_id]
WHERE OBJECT_NAME(S.[object_id]) = 'myTable'
ORDER BY SC.[object_id], SC.stats_id DESC
I guess I need a recursive CTE, or to pivot into a # table and then group by and count...?
These are still weak spots for me. I think an overly dramatic cursor would be my best bet.
Do you happen to have a script for this up your sleeve?
.....I found one at : http://sqlserver-online.blogspot.nl/2010/11/multiple-statistics-sharing-same.htm
But it returns more rows than I have statistics on this object.
BTW, how do you alias URLs? Like "click here"?
November 29, 2012 at 7:13 am
MOC Ewez (11/29/2012)
I guess I need a recursive CTE, or to pivot into a # table and then group by and count...? These are still weak spots for me. I think an overly dramatic cursor would be my best bet.
Personally I prefer just eyeballing it. Far less effort and more accurate.
Query for the stats and their columns with an appropriate order by and read over the list.
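Something along these lines (a sketch; the table name is a placeholder), one row per stats column so you can read down the list:

```sql
-- One row per statistics column; reading the list grouped by stat
-- makes subsets and duplicates easy to spot by eye.
SELECT s.name AS stat_name,
       sc.stats_column_id AS position,
       c.name AS column_name
FROM sys.stats s
JOIN sys.stats_columns sc
    ON sc.[object_id] = s.[object_id]
    AND sc.stats_id = s.stats_id
JOIN sys.columns c
    ON c.[object_id] = sc.[object_id]
    AND c.column_id = sc.column_id
WHERE s.[object_id] = OBJECT_ID('dbo.MyTable') -- placeholder
ORDER BY s.name, sc.stats_column_id;
```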
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 7:36 am
MOC Ewez (11/29/2012)
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records. The next thought would then be about indexes: if the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect; if not, then another combination of columns (including the key column) should be considered for a separate index.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 29, 2012 at 7:42 am
Personally I prefer just eyeballing it. Far less effort and more accurate.
:Wow:
I've been eyeballing for a good hour now. Even tried tilting my head and closing one eye for a while.
I reckon this eyeballing skill of yours isn't a natural trait built into human DNA. Well, at least not mine anyway. 😛
Do you do this with or without shoes?:-)
November 29, 2012 at 8:04 am
MOC Ewez (11/29/2012)
Do you do this with or without shoes?:-)
Without of course. 🙂
What you want to do is compare the columns the stats are on and see if you have any subsets: do you have a stats object on Col1 and another on Col1, Col2? If you do, the first is redundant.
I have a script that pulls all stats with a delimited list of columns, but it doesn't work on SQL 2005, so not much point in posting it here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 8:12 am
I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records. The next thought would then be about indexes: if the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect; if not, then another combination of columns (including the key column) should be considered for a separate index.
What is wise for a PK seems to be highly debated amongst the pros. Some say it should be as short as possible, a "description" of what your table is about; others say keep it as small and contiguous as possible (often an identity) because it gets appended to other indexes. (And more.)
Indeed you are right; I was questioning the point of using a unique index when I could just make that index the PK. But testing things out with so many statistics stumped me.
It looks as if all my inherited DBs are chock-a-block with stats produced by both DTA and auto-created statistics. Is there a point at which too many (non-duplicate) stats are too much?
November 29, 2012 at 8:20 am
MOC Ewez (11/29/2012)
What is wise for a PK seems to be highly debated amongst the pros. Some say it should be as short as possible, a "description" of what your table is about; others say keep it as small and contiguous as possible (often an identity) because it gets appended to other indexes. (And more.)
Small and contiguous are considerations for the clustered index. PK != clustered index.
The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 8:58 am
Small and contiguous are considerations for the clustered index. PK != clustered index.
The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.
Thanks for pointing that out! I had always thought it was a clustered index, as I always create a PK when creating a table, which creates a clustered index if one doesn't exist. I've never seen a table with a clustered index that isn't the same as the PK.
November 29, 2012 at 10:51 am
CREATE TABLE Blah (
ID int identity,
SomeDate DATETIME
-- other fields
)
CREATE CLUSTERED INDEX idx_Blah on Blah (SomeDate)
ALTER TABLE Blah ADD CONSTRAINT pk_blah PRIMARY KEY NONCLUSTERED (ID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 30, 2012 at 7:41 am
I worked through the same example and found it insightful.
Found some helpful info to correct my mindset. Also found out you can use a unique constraint as the target of FK references. Didn't know that either. This is turning out to be very educational. 🙂
PKs are a heated topic: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/158d77f7-3029-43bc-bba6-a8a12374f00c
Anyway, back to the stats business.
I found and read Kendal Van Dyke's blog on overlapping statistics.
http://qa.sqlservercentral.com/blogs/kendalvandyke/2010/09/09/more-on-overlapping-statistics/ Which leads to another and another.
He talks about overlapping statistics; is that what you mean by duplicates?
Here's the code to find my duplicate statistics.
Start by creating a function that reorders a delimited string of values (for the Keys column returned by sp_helpstats).
-- Reorder a set of values in a string.
CREATE FUNCTION ufn_SortStrings (@Base Varchar(MAX), @Splitter Char(1) = ',', @RetSplitter Char(1) = ',')
RETURNS Varchar(MAX)
AS
BEGIN
DECLARE -- Variables and initialize
@RetVal Varchar(MAX),
@Tmp Varchar(MAX)
SET @RetVal = ''
SET @Tmp = ''
DECLARE @TmpVals TABLE (Val Varchar(50) )
-- Get rid of any preceding @Splitter
WHILE LEFT(@Base, 1) = @Splitter
BEGIN
SET @Base = SUBSTRING(@Base, 2, LEN(@BASE) - 1)
END
-- Get rid of any trailing @Splitter
WHILE RIGHT(@Base, 1) = @Splitter
BEGIN
SET @Base = LEFT(@Base, LEN(@BASE) - 1)
END
-- Loop through The base string
WHILE LEN(@Base) > 0
BEGIN
IF CHARINDEX(@Splitter, @Base) > 0 -- Check if the splitter is there
BEGIN
SET @Tmp = LTRIM(RTRIM(SUBSTRING(@Base, 1, CHARINDEX(@Splitter,@Base)-1))) -- Grab the first part and trim any spaces
SET @Base = SUBSTRING(@Base, CHARINDEX(@Splitter,@Base), LEN(@Base) - LEN(@Tmp)) -- Remove the split from the base string
IF LEFT(@Base, 1) = @Splitter -- Remove the leading @Splitter if it's there
SET @Base = SUBSTRING(@Base, 2, LEN(@BASE) - 1)
END
ELSE -- No splitter, must be the last part.
BEGIN
SET @Tmp = LTRIM(RTRIM(@Base))
SET @Base = REPLACE(@Base, @Tmp,'')
END
IF REPLACE(@Tmp, ' ', '') <> '' -- Don't insert any empty values
INSERT INTO @TmpVals VALUES (@Tmp) -- Enter the split value into a table
END
-- Loop through the ordered split values and add them to a string
DECLARE cVals Cursor FOR SELECT Val FROM @TmpVals ORDER BY 1
OPEN cVals
FETCH NEXT FROM cVals INTO @Tmp
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RetVal = @RetVal + @Tmp + @RetSplitter
FETCH NEXT FROM cVals INTO @Tmp
END
CLOSE cVals
DEALLOCATE cVals
IF RIGHT(@RetVal, 1) = @RetSplitter
SET @RetVal = LEFT(@RetVal, LEN(@RetVal) - 1) -- Get rid of the trailing @RetSplitter
RETURN @RetVal
END
And the code to create an SP that returns only the duplicates from the specified table.
--EXEC usp_FindDupStats '<myTable>'
CREATE PROCEDURE usp_FindDupStats @TblName Varchar(150)
AS
BEGIN
--DECLARE -- Parameters (Testing)
-- @TblName Varchar(150)
-- SET @TblName = '<my Table>'
DECLARE -- Variables
@SQL NVarchar(500)
CREATE TABLE #TblStats
(
StatName Varchar(100),
Keys Varchar(MAX)
)
CREATE TABLE #TblStatsExt
(
TblName Varchar(150),
StatName Varchar(100) NOT NULL,
Keys Varchar(MAX),
OrderedKeys Varchar(MAX)
)
ALTER TABLE #TblStatsExt ADD CONSTRAINT PK_Tmp PRIMARY KEY (StatName)
SET @SQL = 'INSERT INTO #TblStats EXEC sp_HelpStats @Tbl, ''ALL'''
EXEC sp_ExecuteSQL @SQL, N'@Tbl Varchar(150)', @Tbl=@TblName
INSERT INTO #TblStatsExt
SELECT @TblName, StatName, Keys, Sandbox.dbo.ufn_SortStrings(Keys,',',',') --<<== Change DB
FROM #TblStats
SELECT T1.TblName, T1.StatName, T1.Keys, T1.OrderedKeys, RowNr
FROM #TblStatsExt T1
JOIN
(
SELECT StatName, OrderedKeys,
ROW_NUMBER() OVER (PARTITION BY OrderedKeys ORDER BY StatName) RowNr
FROM #TblStatsExt
) T2
ON T1.StatName = T2.StatName
WHERE RowNr > 1
DROP TABLE #TblStats
DROP TABLE #TblStatsExt
END
Now I can use this to loop through and drop the duplicates.
I haven't done this yet, as I don't (yet) know enough about statistics to know whether this could be dangerous.
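If I do go ahead, I imagine the drop loop would look something like this (an untested sketch, reusing the output of usp_FindDupStats above):

```sql
-- Untested sketch: capture usp_FindDupStats output and drop each
-- duplicate. The column list must match the proc's result set, and
-- TblName may need schema-qualifying in your database.
CREATE TABLE #Dups
(
    TblName     Varchar(150),
    StatName    Varchar(100),
    Keys        Varchar(MAX),
    OrderedKeys Varchar(MAX),
    RowNr       Bigint
)

INSERT INTO #Dups
EXEC usp_FindDupStats '<myTable>'

DECLARE @Tbl Varchar(150), @Stat Varchar(100), @SQL NVarchar(500)

DECLARE cDrop CURSOR FOR SELECT TblName, StatName FROM #Dups
OPEN cDrop
FETCH NEXT FROM cDrop INTO @Tbl, @Stat
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = N'DROP STATISTICS ' + QUOTENAME(@Tbl) + N'.' + QUOTENAME(@Stat)
    EXEC sp_ExecuteSQL @SQL
    FETCH NEXT FROM cDrop INTO @Tbl, @Stat
END
CLOSE cDrop
DEALLOCATE cDrop
DROP TABLE #Dups
```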