Normalize data using Pivot WITHOUT using Aggregate function

  • Crap -- you're right about not getting the same results. There has to be a way around using ISNULL. That specific piece is causing records to come back 200/minute, which won't work for a table that has 900,000 records.

  • Justin James (1/4/2010)


    ... There has to be a way around using ISNULL. ...

    YES, there is: Try a different solution!

    I have to ask again: What is your reason for refusing to have a look at the other solutions?? If you can wait more than a day for one query to finish there should be 10 minutes for an alternative...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

  • Well, since we just have learned that you're looking for a dynamic solution, here it is:

    Step 1: running the code from before excluding the PIVOT part, but storing the result into a temp table instead,

    Step 2: build a DynamicCrossTab as learned from Jeff Moden (by studying the related article in my signature...), Thanx again, Jeff 😉

    Step 3: Verify the resulting SQL code is what you want

    Step 4: uncomment the EXEC part at then end of the query and comment the PRINT section

    Issue that is still open:

    a) There is no information regarding the column order of the resulting PIVOT. And there are not many options either (e.g. order by Columnname or first appearance (=min(id)) or by number of appearance count(*) )

    SET nocount on

    declare @school table ([ID] int, ColumnName varchar(45), [Text] varchar(50))

    --Group 1

    insert into @school values (0, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (1, 'ID', '123146')

    insert into @school values (2, 'SubjectArea', 'Math')

    insert into @school values (3, 'LocalCourseCode', '000A1')

    insert into @school values (4, 'SchoolID', '123')

    insert into @school values (5, 'Source', 'Hub')

    insert into @school values (6, 'CourseTitle', 'Algebra I')

    --Group 2

    insert into @school values (7, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (8, 'ID', '94503')

    insert into @school values (9, 'SubjectArea', 'Science')

    insert into @school values (10, 'LocalCourseCode', '9037')

    insert into @school values (12, 'SchoolID', '46')

    insert into @school values (12, 'Source', 'Annex')

    insert into @school values (13, 'CourseTitle', 'Biology II')

    insert into @school values (14, 'TeacherID', '58398')

    insert into @school values (15, 'Credits Possible', '1.0')

    --Group 3

    insert into @school values (16, 'DestinationTableName', '[DW].[DimClass]')

    insert into @school values (17, 'ID', '84023')

    insert into @school values (18, 'Source', 'Annex')

    insert into @school values (19, 'CourseTitle', 'Physics I')

    ;with cte1 AS

    (

    SELECT id,

    row_number() OVER (order BY id) AS row

    FROM @school

    WHERE text = '[DW].[DimClass]'

    ),

    cte2 AS

    (

    SELECT

    row_number() OVER (order BY a.id) AS grp,

    a.id start,

    isnull(b.id,99) finish

    FROM cte1 a

    LEFT OUTER JOIN cte1 b ON a.row=b.row-1

    )

    SELECT s.*, c.grp

    INTO #TmpSchool

    FROM @school s

    INNER JOIN cte2 c ON s.id >= c.start AND id < c.finish

    DECLARE @SQL1 NVARCHAR(4000),

    @SQL2 NVARCHAR(4000),

    @SQL3 NVARCHAR(4000)

    --===== Create the "static" section of the code

    SELECT @SQL1 = 'SELECT max(CASE WHEN Columnname = ''ID'' THEN [text] ELSE null END) AS [ID]'+CHAR(10)

    --===== Create the "Mostly Static" section of the code

    SELECT @SQL3 = '

    from #TmpSchool GROUP BY grp'

    --===== The "Hard" Part - Concatenation to Form the Date Columns

    SELECT @SQL2 = COALESCE(@SQL2,'')

    + ' ,max(CASE WHEN Columnname = ' + QUOTENAME(d.Columnname,'''')

    + ' THEN [text] ELSE null END) AS [' + d.Columnname + ']' + CHAR(10)

    FROM

    (

    SELECT Columnname FROM #TmpSchool WHERE Columnname<>'ID' AND Columnname<>'DestinationTableName' GROUP BY Columnname

    ) d ORDER BY Columnname

    --===== Print the Dynamic SQL

    PRINT @SQL1 + @SQL2 + @SQL3

    --===== Execute the Dynamic SQL to create the desired report (uncomment to run it)

    --EXEC (@SQL1 + @SQL2 + @SQL3)

    DROP TABLE #TmpSchool



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (1/4/2010)


    Thanx again, Jeff

    Anytime, Lutz... thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Justin James (1/4/2010)


    There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

    Justin,

    Hope this helps

    --= New improved version - tested against 937932 rows with 100 "ColumnNames"

    --= And a maximum of 35 "ColumnNames" in any 1 group

    --= Execution time was about 30 seconds on my quad core / sql 2008 3GB ram (with results discarded)

    --= Execution time for 1041408 rows with results displayed in a grid and a maximum of 100 ColumnNames per group = 37 secs

    --= Clean up

    IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

    DROP TABLE #groupIDs

    IF OBJECT_ID('tempdb..#data') IS NOT NULL

    DROP TABLE #data

    --= Get the group header records into a temp table

    SELECT IDENTITY(BIGINT,1,1) AS ROWID, ID ,CONVERT(BIGINT,0) AS NextID

    INTO #GroupIDs

    FROM school s3

    WHERE s3.ColumnName = 'DestinationTableName'

    ORDER BY ID

    CREATE UNIQUE CLUSTERED INDEX ix_GRP_ID ON #GroupIDs (ID)

    --= Figure out the last record in the table

    DECLARE @MaxID BIGINT

    SELECT @MaxID = MAX(ID) FROM school

    --= Work out the range of IDs for each group

    --= Note - I tried a "Quirky Update" here and just could not get it to work - this is quick enough (<1 sec) though

    UPDATE G1

    SET NextID = ISNULL(G2.ID,@MaxID+1)

    FROM #GroupIDs G1

    LEFT OUTER JOIN #GroupIDs G2

    ON G2.ROWID = G1.ROWID+1

    --= This now pulls the actual data rows into a temp table with the new "GroupID" and a row number within each group

    SELECT S1.ID AS GroupID, S2.ID AS ItemID,S2.ColumnName,S2.Text ,ROW_NUMBER() OVER(PARTITION BY S1.ID ORDER BY S2.ID) AS ColNum

    INTO #Data

    FROM #GroupIDs G1

    JOIN school S1

    ON S1.ID = G1.ID

    JOIN school S2

    ON S2.ID > G1.ID AND S2.ID < G1.NextID

    CREATE UNIQUE CLUSTERED INDEX ix_data_IDs ON #data(GroupID, ItemID)

    CREATE INDEX ix_data_cols ON #data(ColumnName,ColNum)

    --= Now to build the dynamic query

    DECLARE @sql NVARCHAR(MAX)

    SET @sql = ''

    --= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)

    --= Use FOR XML PATH('') trick to get a list of columns as a string

    SELECT @sql=STUFF(

    (

    SELECT ', '+QUOTENAME(ColumnName,'[')

    FROM (

    SELECT ColumnName ,MAX(ColNum) AS Position

    FROM #Data

    GROUP BY ColumnName

    ) cols

    ORDER BY Position

    FOR XML PATH('')),1,1,'')

    --= Now build a dynamic SQL to PIVOT the data

    SET @sql = N'

    SELECT GroupID,' + @sql + '

    FROM (SELECT GroupID,ColumnName,Text FROM #data) a

    PIVOT (MAX([Text]) FOR ColumnName IN ('+@sql +')) AS PVT

    ORDER BY GroupID'

    --= And run it

    EXEC sp_executesql @sql

    --= Clean up

    IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

    DROP TABLE #groupIDs

    IF OBJECT_ID('tempdb..#data') IS NOT NULL

    DROP TABLE #data

    EDIT: Sorry, I forgot to say that I used a permanent table for this one called "school"

    It was populated with some horrible random data using this script:

    IF OBJECT_ID('school') IS NOT NULL

    DROP TABLE school

    GO

    IF OBJECT_ID('column_names') IS NULL

    CREATE TABLE column_names (ColumnName VARCHAR(50))

    ELSE

    TRUNCATE TABLE column_names

    GO

    insert column_names (ColumnName )

    select TOP 100 NEWID()

    from syscolumns

    ;with Numbers(N)

    as

    (

    select 1

    union all

    select N+1 from Numbers

    union all

    select N+2 from Numbers

    union all

    select N+3 from Numbers

    ),data(ID,ColumnName,[Text],Children)

    as

    (

    select top 20000 N,'DestinationTableName', '[DW].[DimClass]', ABS(CHECKSUM(NEWID())) % 100

    from Numbers

    )

    select ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID, CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.ColumnName ELSE Colums.ColumnName END AS ColumnName,

    CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.Text ELSE LEFT(Colums.ColumnName,ABS(CHECKSUM(Colums.ColumnName)) % 20) END As [Text]

    into school

    from data

    cross apply (SELECT TOP 5 ColumnName from column_names ORDER BY ColumnName UNION Select TOP(data.Children) ColumnName from column_names ORDER BY NEWID() ) Colums(ColumnName)

    option (maxrecursion 0)

    create unique clustered index ix_school_ID on school(ID)

    CREATE NONCLUSTERED INDEX ix_school_data

    ON [dbo].[school] ([ColumnName])

    INCLUDE ([ID])

    NOTE: the indexes at the end of this query are part of the solution

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

Viewing 6 posts - 46 through 50 (of 50 total)

You must be logged in to reply to this topic. Login to reply