November 28, 2012 at 9:03 am
Hi,
After reading Gail Shaw's articles on tracking down performance issues, I thought I'd give it a go myself.
While playing around with different NC (nonclustered) indexes to no avail, I wanted to change the PK to include more columns. Unfortunately I can't: the second column I want to include is nullable, and I cannot alter the column because a statistic depends on it.
In Object Explorer I noticed that there are a lot of statistics on this table.
SELECT COUNT(*)
FROM sys.stats
WHERE OBJECT_NAME(OBJECT_ID) = '<my table>'
Returns 246. The names start with _dta_ or _WA_Sys.
I'm pretty sure the _dta_ stats are created by the Database Engine Tuning Advisor, but I don't know what the other ones are.
So what I was wondering:
1) Is this a normal amount?
2) Can I safely drop all of these?
3) If I do, do I have to create them all again?
4) How can I best go about altering my column?
5) Why did DTA create so many?
6) Where'd the _WA_Sys stats come from?
Working with SQL 2005 on this dev box.
BTW, I don't know who ran DTA and committed the changes.
November 28, 2012 at 9:10 am
The _WA_Sys stats are automatically created by SQL Server.
You can drop them if you want; SQL will recreate them if it needs to. The _dta stats were created when someone ran the Database Engine Tuning Advisor and accepted its suggestions.
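A quick way to see which ones SQL Server created automatically (just a sketch; swap in your own table name):

```sql
-- Auto-created (_WA_Sys) stats are flagged auto_created = 1 in sys.stats
SELECT name
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.MyTable') -- placeholder table name
  AND auto_created = 1;
```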
BTW, I would not recommend you go and add more columns to your PK. The primary key is the unique key for the table; it should be just the column or set of columns that uniquely identifies a row, and nothing more.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 2:24 am
Thanks for replying.
So having so many statistics on a table (36 columns) is normal, or at least not unusual?
No bloat or overhead involved?
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
Turns out just dropping the existing nonclustered index on CustNr gives me 3x better performance: 3205 scans are now 1 (clustered index scan, 500,000+ rows, 76%), and reads dropped from 502,017 to 31,760.
BTW, do you know where I can find documentation on Retrace? My searches just lead me back to the blog post you mentioned. In the meantime I just grab the left 25-50 characters of TextData and group by that.
November 29, 2012 at 3:17 am
MOC Ewez (11/29/2012)
So having so many statistics on a table (36 columns) is normal, or at least not unusual?
Depends. 🙂 If there are duplicate stats (between the _dta_ ones and the auto-created ones), then you can drop the duplicates.
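Dropping one is a single statement (the table and statistic names here are just placeholders):

```sql
-- DROP STATISTICS takes table_name.statistics_name
DROP STATISTICS dbo.MyTable._dta_stat_5_3_7;
```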
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
If CompNr is the column that uniquely identifies the row, then the PK should be on CompNr alone, not CompNr plus other columns. This is database design principles: widen the PK unnecessarily and you introduce the possibility of incorrect data.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 5:24 am
Ahhh, the dreaded but expected "Depends". :hehe:
How does one find duplicate statistics?
I've got a basic start.
SELECT OBJECT_NAME(S.[object_id]), S.name, SC.*
FROM sys.stats S
JOIN sys.stats_columns SC
ON S.stats_id = SC.stats_id
AND S.[object_id] = SC.[object_id]
JOIN sys.columns C
ON SC.column_id = C.column_id
AND SC.[object_id] = C.[object_id]
WHERE OBJECT_NAME(S.[object_id]) = 'myTable'
ORDER BY SC.[object_id], SC.stats_id DESC
I guess I need a recursive CTE, or to pivot into a # table and then group by and count...?
These are still weak spots for me. I think an overly dramatic cursor would be my best bet.
Do you happen to have a script for this up your sleeve?
.....I found one at : http://sqlserver-online.blogspot.nl/2010/11/multiple-statistics-sharing-same.htm
But it returns more rows than I have statistics on this object.
BTW, how do you alias URLs? Like "click here"?
November 29, 2012 at 7:13 am
MOC Ewez (11/29/2012)
I guess I need a recursive CTE, or to pivot into a # table and then group by and count...? These are still weak spots for me. I think an overly dramatic cursor would be my best bet.
Personally I prefer just eyeballing it. Far less effort and more accurate.
Query for the stats and their columns with an appropriate order by and read over the list.
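Something along these lines (a sketch; the table name is a placeholder), one row per stats column so you can read down the list:

```sql
-- One row per statistics column; reading the list grouped by stat
-- makes subsets and duplicates easy to spot by eye.
SELECT s.name AS stat_name,
       sc.stats_column_id AS position,
       c.name AS column_name
FROM sys.stats s
JOIN sys.stats_columns sc
    ON sc.[object_id] = s.[object_id]
    AND sc.stats_id = s.stats_id
JOIN sys.columns c
    ON c.[object_id] = sc.[object_id]
    AND c.column_id = sc.column_id
WHERE s.[object_id] = OBJECT_ID('dbo.MyTable') -- placeholder
ORDER BY s.name, sc.stats_column_id;
```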
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 7:36 am
MOC Ewez (11/29/2012)
As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.
I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records. The next thought would then be about indexes: if the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect; if not, then another combination of columns (including the key column) should be considered for a separate index.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 29, 2012 at 7:42 am
Personally I prefer just eyeballing it. Far less effort and more accurate.
:Wow:
I've been eyeballing for a good hour now. Even tried tilting my head and closing one eye for a while.
I reckon this eyeballing skill of yours isn't a natural trait built into human DNA. Well, at least not mine anyway. 😛
Do you do this with or without shoes?:-)
November 29, 2012 at 8:04 am
MOC Ewez (11/29/2012)
Do you do this with or without shoes?:-)
Without of course. 🙂
What you want to do is compare the columns the stats are on and see if you have any subsets: do you have a stats object on Col1 and another on Col1, Col2? If you do, the first is redundant.
I have a script that pulls all stats with a delimited list of columns, but it doesn't work on SQL 2005, so not much point in posting it here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 8:12 am
I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records. The next thought would then be about indexes: if the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect; if not, then another combination of columns (including the key column) should be considered for a separate index.
What is wise for a PK seems to be highly debated amongst the pros. Some say it should be as short as possible, a "description" of what your table is about; others say keep it as small and contiguous as possible (often an identity) because it gets appended to other indexes. (And more.)
Indeed you are right; I was questioning the point of using a unique index when I could just make that index the PK. But testing things out with so many statistics stumped me.
It looks as if all my inherited DBs are chock-a-block with stats produced by both DTA and auto-created statistics. Is there a point at which too many (non-duplicate) stats are too much?
November 29, 2012 at 8:20 am
MOC Ewez (11/29/2012)
What is wise for a PK seems to be highly debated amongst the pros. Some say it should be as short as possible, a "description" of what your table is about; others say keep it as small and contiguous as possible (often an identity) because it gets appended to other indexes. (And more.)
Small and contiguous are considerations for the clustered index. PK != clustered index.
The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 29, 2012 at 8:58 am
Small and contiguous are considerations for the clustered index. PK != clustered index.
The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.
Thanks for pointing that out! I had always thought it was a clustered index, as I always create a PK when creating a table, which creates a clustered index if one doesn't exist. I've never seen a table with a clustered index that isn't the same as the PK.
November 29, 2012 at 10:51 am
CREATE TABLE Blah (
ID int identity,
SomeDate DATETIME
-- other fields
)
CREATE CLUSTERED INDEX idx_Blah on Blah (SomeDate)
ALTER TABLE Blah ADD CONSTRAINT pk_blah PRIMARY KEY NONCLUSTERED (ID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 30, 2012 at 7:41 am
I worked through the same example and found it insightful.
Found some helpful info to correct my mindset. Also found out you can use a unique constraint as the target of FK references. Didn't know that either. This is turning out to be very educational. 🙂
PKs are a heated topic: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/158d77f7-3029-43bc-bba6-a8a12374f00c
Anyway, back to the stats business.
I found and read Kendal Van Dyke's blog on overlapping statistics.
http://qa.sqlservercentral.com/blogs/kendalvandyke/2010/09/09/more-on-overlapping-statistics/ Which leads to another and another.
He talks about overlapping statistics; is that what you mean by duplicates?
Here's the code to find my duplicate statistics.
Start by creating a function that reorders a delimited string of values (for the Keys column returned by sp_helpstats).
-- Reorder a set of values in a string.
CREATE FUNCTION ufn_SortStrings (@Base Varchar(MAX), @Splitter Char(1) = ',', @RetSplitter Char(1) = ',')
RETURNS Varchar(MAX)
AS
BEGIN
DECLARE -- Variables and initialize
@RetVal Varchar(MAX),
@Tmp Varchar(MAX)
SET @RetVal = ''
SET @Tmp = ''
DECLARE @TmpVals TABLE (Val Varchar(50) )
-- Get rid of any preceding @Splitter
WHILE LEFT(@Base, 1) = @Splitter
BEGIN
SET @Base = SUBSTRING(@Base, 2, LEN(@BASE) - 1)
END
-- Get rid of any trailing @Splitter
WHILE RIGHT(@Base, 1) = @Splitter
BEGIN
SET @Base = LEFT(@Base, LEN(@BASE) - 1)
END
-- Loop through The base string
WHILE LEN(@Base) > 0
BEGIN
IF CHARINDEX(@Splitter, @Base) > 0 -- Check if the splitter is there
BEGIN
SET @Tmp = LTRIM(RTRIM(SUBSTRING(@Base, 1, CHARINDEX(@Splitter,@Base)-1))) -- Grab the first part and trim any spaces
SET @Base = SUBSTRING(@Base, CHARINDEX(@Splitter,@Base), LEN(@Base) - LEN(@Tmp)) -- Remove the split from the base string
IF LEFT(@Base, 1) = @Splitter -- Remove the leading @Splitter if it's there
SET @Base = SUBSTRING(@Base, 2, LEN(@BASE) - 1)
END
ELSE -- No splitter, must be the last part.
BEGIN
SET @Tmp = LTRIM(RTRIM(@Base))
SET @Base = REPLACE(@Base, @Tmp,'')
END
IF REPLACE(@Tmp, ' ', '') <> '' -- Don't insert any empty values
INSERT INTO @TmpVals VALUES (@Tmp) -- Enter the split value into a table
END
-- Loop through the ordered split values and add them to a string
DECLARE cVals Cursor FOR SELECT Val FROM @TmpVals ORDER BY 1
OPEN cVals
FETCH NEXT FROM cVals INTO @Tmp
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RetVal = @RetVal + @Tmp + @RetSplitter
FETCH NEXT FROM cVals INTO @Tmp
END
CLOSE cVals
DEALLOCATE cVals
IF RIGHT(@RetVal, 1) = @RetSplitter
SET @RetVal = LEFT(@RetVal, LEN(@RetVal) - 1) -- Get rid of the trailing @RetSplitter
RETURN @RetVal
END
And the code to create an SP that returns only the duplicates from the specified table.
--EXEC usp_FindDupStats '<myTable>'
CREATE PROCEDURE usp_FindDupStats @TblName Varchar(150)
AS
BEGIN
--DECLARE -- Parameters (Testing)
-- @TblName Varchar(150)
-- SET @TblName = '<my Table>'
DECLARE -- Variables
@SQL NVarchar(500)
CREATE TABLE #TblStats
(
StatName Varchar(100),
Keys Varchar(MAX)
)
CREATE TABLE #TblStatsExt
(
TblName Varchar(150),
StatName Varchar(100) NOT NULL,
Keys Varchar(MAX),
OrderedKeys Varchar(MAX)
)
ALTER TABLE #TblStatsExt ADD CONSTRAINT PK_Tmp PRIMARY KEY (StatName)
SET @SQL = 'INSERT INTO #TblStats EXEC sp_HelpStats @Tbl, ''ALL'''
EXEC sp_ExecuteSQL @SQL, N'@Tbl Varchar(150)', @Tbl=@TblName
INSERT INTO #TblStatsExt
SELECT @TblName, StatName, Keys, Sandbox.dbo.ufn_SortStrings(Keys,',',',') --<<== Change DB
FROM #TblStats
SELECT T1.TblName, T1.StatName, T1.Keys, T1.OrderedKeys, RowNr
FROM #TblStatsExt T1
JOIN
(
SELECT StatName, OrderedKeys,
ROW_NUMBER() OVER (PARTITION BY OrderedKeys ORDER BY StatName) RowNr
FROM #TblStatsExt
) T2
ON T1.StatName = T2.StatName
WHERE RowNr > 1
DROP TABLE #TblStats
DROP TABLE #TblStatsExt
END
Now I can use this to loop through and drop the duplicates.
I haven't done this yet, as I don't (yet) know enough about statistics to know whether this could be dangerous.
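If I do go ahead, I imagine the drop loop would look something like this (an untested sketch, reusing the output of usp_FindDupStats above):

```sql
-- Untested sketch: capture usp_FindDupStats output and drop each
-- duplicate. The column list must match the proc's result set, and
-- TblName may need schema-qualifying in your database.
CREATE TABLE #Dups
(
    TblName     Varchar(150),
    StatName    Varchar(100),
    Keys        Varchar(MAX),
    OrderedKeys Varchar(MAX),
    RowNr       Bigint
)

INSERT INTO #Dups
EXEC usp_FindDupStats '<myTable>'

DECLARE @Tbl Varchar(150), @Stat Varchar(100), @SQL NVarchar(500)

DECLARE cDrop CURSOR FOR SELECT TblName, StatName FROM #Dups
OPEN cDrop
FETCH NEXT FROM cDrop INTO @Tbl, @Stat
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = N'DROP STATISTICS ' + QUOTENAME(@Tbl) + N'.' + QUOTENAME(@Stat)
    EXEC sp_ExecuteSQL @SQL
    FETCH NEXT FROM cDrop INTO @Tbl, @Stat
END
CLOSE cDrop
DEALLOCATE cDrop
DROP TABLE #Dups
```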