A script to answer: Should I use SPARSE?

  • Comments posted to this topic are about the item A script to answer: Should I use SPARSE?

  • SET NOCOUNT ON
    GO

    IF OBJECT_ID('tempdb..#tblHold') IS NOT NULL
        DROP TABLE #tblHold
    GO

    -- Step 1: total row count for every user table
    DECLARE @tblA TABLE (RowId INT IDENTITY(1,1), SchemaName VARCHAR(65), TableName VARCHAR(65), TotalRows BIGINT)

    INSERT INTO @tblA
    SELECT S.[name] AS SchemaName,
           ST.[name] AS TableName,
           SUM(P.[rows]) AS TotalRows
    FROM [sys].[tables] AS ST WITH (NOLOCK)
    INNER JOIN [sys].[schemas] AS S WITH (NOLOCK)
        ON ST.[schema_id] = S.[schema_id]
    INNER JOIN [sys].[partitions] AS P WITH (NOLOCK)
        ON P.[object_id] = ST.[object_id]
    WHERE P.[index_id] IN (0,1)       -- heap or clustered index only
    GROUP BY S.[name], ST.[name]

    -- Step 2: build a COUNT query for every nullable column
    DECLARE @tblB TABLE
        (RowId INT IDENTITY(1,1), SchemaName VARCHAR(65), TableName VARCHAR(65), ColumnName VARCHAR(65),
         ExecSQL VARCHAR(2000), NullRows BIGINT)

    INSERT INTO @tblB
    SELECT S.[name] AS SchemaName,
           ST.[name] AS TableName,
           SC.[name] AS ColumnName,
           'SELECT COUNT(1) FROM ['+DB_NAME()+'].['+S.[name]+'].['+ST.[name]+'] WITH (NOLOCK) WHERE ['+SC.[name]+'] IS NULL' AS ExecSQL,
           0 AS NullRows
    FROM [sys].[tables] AS ST WITH (NOLOCK)
    INNER JOIN [sys].[schemas] AS S WITH (NOLOCK)
        ON ST.[schema_id] = S.[schema_id]
    INNER JOIN [sys].[columns] AS SC WITH (NOLOCK)
        ON ST.[object_id] = SC.[object_id]
    WHERE SC.[is_nullable] = 1
    ORDER BY S.[name], ST.[name], SC.[column_id]

    -- Step 3: run each COUNT query and record the NULL row count per column
    DECLARE @MinId INT
    DECLARE @MaxId INT
    DECLARE @ExecSQL VARCHAR(2000)
    DECLARE @tblE TABLE (MyRows BIGINT)

    SELECT @MinId = MIN(RowId), @MaxId = MAX(RowId) FROM @tblB

    WHILE (@MinId <= @MaxId)
    BEGIN
        DELETE @tblE
        SELECT @ExecSQL = ExecSQL FROM @tblB WHERE RowId = @MinId
        INSERT INTO @tblE EXEC(@ExecSQL)
        UPDATE @tblB
        SET NullRows = (SELECT MyRows FROM @tblE)
        WHERE RowId = @MinId
        SELECT @MinId = @MinId + 1
    END

    -- Step 4: what percentage of each column is NULL?
    SELECT B.RowId,
           A.SchemaName,
           A.TableName,
           B.ColumnName,
           B.NullRows,
           A.TotalRows,
           CAST(CAST(NullRows AS DEC(15,2)) / CAST(TotalRows AS DEC(15,2)) * 100 AS DEC(5,2)) AS PercentageNull
    INTO #tblHold
    FROM @tblA AS A
    INNER JOIN @tblB AS B
        ON A.SchemaName = B.SchemaName
        AND A.TableName = B.TableName
    WHERE NullRows > 0
    AND NullRows <> TotalRows
    GO

    SELECT *
    FROM #tblHold
    ORDER BY PercentageNull DESC
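
    Once the output flags a column that is mostly NULL, converting it is a one-line change. A minimal sketch, assuming a hypothetical table [dbo].[MyTable] with a nullable column [MyColumn]:

    -- Hypothetical names; a column must be nullable to be marked SPARSE
    ALTER TABLE [dbo].[MyTable]
    ALTER COLUMN [MyColumn] ADD SPARSE
    GO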

  • Any ideas how this might work in an OLAP environment? At least in my world, that is where this situation is most likely to be encountered.

  • I don't have any performance benchmarks yet; I am likely months away from getting this into a deployment.

    But since MSFT claims there is no storage overhead for NULL values and only a slight overhead for non-NULL values, I would expect a net improvement, given the reduced storage per row for most rows.
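
    One low-risk way to sanity-check that claim on a restored copy of the database is to compare sp_spaceused output before and after the change. A sketch, with dbo.MyTable and MyColumn as stand-in names:

    EXEC sp_spaceused 'dbo.MyTable'            -- note the data size before
    ALTER TABLE dbo.MyTable ALTER COLUMN MyColumn ADD SPARSE
    ALTER INDEX ALL ON dbo.MyTable REBUILD     -- the savings only appear after a rebuild
    EXEC sp_spaceused 'dbo.MyTable'            -- compare the data size after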

  • NB: One side-effect of using SPARSE columns is their incompatibility with the data compression features.

    We had a case where new developers decided to add new columns with the SPARSE attribute but did not realize this limitation until we hit disk space issues; a pre-flight catalog check along the lines sketched below would have caught it. Suggest you make this decision early, at the project design / data modelling stage.
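
    A sketch of such a check, using the standard catalog columns sys.partitions.data_compression and sys.columns.is_sparse:

    -- Tables that already use row/page compression cannot take SPARSE columns
    SELECT DISTINCT OBJECT_NAME(P.[object_id]) AS TableName,
           P.[data_compression_desc] AS Compression
    FROM [sys].[partitions] AS P
    WHERE P.[data_compression] <> 0    -- 0 = NONE

    -- Columns already marked SPARSE (these block enabling compression later)
    SELECT OBJECT_NAME(C.[object_id]) AS TableName,
           C.[name] AS ColumnName
    FROM [sys].[columns] AS C
    WHERE C.[is_sparse] = 1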

    T

  • Thanks for pointing that out, TerryCatt. Yes, you can use one or the other (Sparse or Compression), but not both.

    Also, for SQL 2014, data compression is not supported for In-Memory tables, and Sparse columns are not compatible with Clustered Columnstore Indexes.

    Also, please note that for Sparse columns, the space savings aren't realized until the clustered index is rebuilt.
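
    For anyone trying this, a sketch of that rebuild step (hypothetical table and index names):

    -- Rebuild the clustered index so existing rows are rewritten in the sparse format
    ALTER INDEX [PK_MyTable] ON [dbo].[MyTable] REBUILD
    -- or rebuild every index on the table:
    ALTER INDEX ALL ON [dbo].[MyTable] REBUILD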

  • Didn't know about SPARSE. What a cool idea. I'll look into it.
