Normalize data using Pivot WITHOUT using Aggregate function

Question

Post reply

Normalize data using Pivot WITHOUT using Aggregate function

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

jjbear11 Default port Points: 1497 More actions · Answer 1

jjbear11

Default port

Points: 1497

January 4, 2010 at 12:42 pm

#1098581

Crap -- you're right about not getting the same results. There has to be a way around using ISNULL. That specific piece is causing records to come back 200/minute, which won't work for a table that has 900,000 records.

LutzM SSC Guru Points: 107049 More actions · Answer 2

Justin James (1/4/2010)

... There has to be a way around using ISNULL. ...

YES, there is: Try a different solution!

I have to ask again: What is your reason for refusing to have a look at the other solutions?? If you can wait more than a day for one query to finish there should be 10 minutes for an alternative...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

jjbear11 Default port Points: 1497 More actions · Answer 3

jjbear11

Default port

Points: 1497

January 4, 2010 at 2:09 pm

#1098636

There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

LutzM SSC Guru Points: 107049 More actions · Answer 4

Well, since we just have learned that you're looking for a dynamic solution, here it is:

Step 1: running the code from before excluding the PIVOT part, but storing the result into a temp table instead,

Step 2: build a DynamicCrossTab as learned from Jeff Moden (by studying the related article in my signature...), Thanx again, Jeff 😉

Step 3: Verify the resulting SQL code is what you want

Step 4: uncomment the EXEC part at then end of the query and comment the PRINT section

Issue that is still open:

a) There is no information regarding the column order of the resulting PIVOT. And there are not many options either (e.g. order by Columnname or first appearance (=min(id)) or by number of appearance count(*) )

SET nocount on



declare  @school table ([ID] int, ColumnName varchar(45), [Text] varchar(50))

--Group 1

insert into @school values (0, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (1, 'ID', '123146')

insert into @school values (2, 'SubjectArea', 'Math')

insert into @school values (3, 'LocalCourseCode', '000A1')

insert into @school values (4, 'SchoolID', '123')

insert into @school values (5, 'Source', 'Hub')

insert into @school values (6, 'CourseTitle', 'Algebra I')

--Group 2

insert into @school values (7, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (8, 'ID', '94503')

insert into @school values (9, 'SubjectArea', 'Science')

insert into @school values (10, 'LocalCourseCode', '9037')

insert into @school values (12, 'SchoolID', '46')

insert into @school values (12, 'Source', 'Annex')

insert into @school values (13, 'CourseTitle', 'Biology II')

insert into @school values (14, 'TeacherID', '58398')

insert into @school values (15, 'Credits Possible', '1.0')

--Group 3

insert into @school values (16, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (17, 'ID', '84023')

insert into @school values (18, 'Source', 'Annex')

insert into @school values (19, 'CourseTitle', 'Physics I')

;with cte1 AS

(

        SELECT id,

                row_number() OVER (order BY id) AS row

        FROM @school

        WHERE text = '[DW].[DimClass]'

),

cte2 AS

(

        SELECT

                row_number() OVER (order BY a.id) AS grp,

                a.id start,

                isnull(b.id,99) finish

        FROM cte1 a

                LEFT OUTER JOIN cte1 b ON a.row=b.row-1

)

SELECT s.*, c.grp

INTO #TmpSchool

FROM @school s

        INNER JOIN cte2 c ON s.id >= c.start AND id < c.finish

DECLARE @SQL1 NVARCHAR(4000),

        @SQL2 NVARCHAR(4000),

        @SQL3 NVARCHAR(4000)

--===== Create the "static" section of the code

SELECT @SQL1 = 'SELECT max(CASE WHEN Columnname = ''ID'' THEN [text] ELSE null END) AS [ID]'+CHAR(10)

--===== Create the "Mostly Static" section of the code

SELECT @SQL3 = '

from #TmpSchool GROUP BY grp'

--===== The "Hard" Part - Concatenation to Form the Date Columns

SELECT @SQL2 = COALESCE(@SQL2,'')

	+ '       ,max(CASE WHEN Columnname = ' + QUOTENAME(d.Columnname,'''')

	+ ' THEN [text] ELSE null END) AS [' + d.Columnname + ']' + CHAR(10)

	FROM

	(

	SELECT Columnname FROM #TmpSchool WHERE Columnname<>'ID' AND Columnname<>'DestinationTableName' GROUP BY Columnname

	) d  ORDER BY Columnname

--===== Print the Dynamic SQL

PRINT @SQL1 + @SQL2 + @SQL3

--===== Execute the Dynamic SQL to create the desired report (uncomment to run it)

--EXEC (@SQL1 + @SQL2 + @SQL3)

DROP TABLE #TmpSchool

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 5

lmu92 (1/4/2010)

Thanx again, Jeff

Anytime, Lutz... thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

mister.magoo SSC-Forever Points: 47068 More actions · Answer 6

Justin James (1/4/2010)

There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

Justin,

Hope this helps



--= New improved version - tested against 937932 rows with 100 "ColumnNames"

--= And a maximum of 35 "ColumnNames" in any 1 group

--= Execution time was about 30 seconds on my quad core / sql 2008 3GB ram (with results discarded)

--= Execution time for 1041408 rows with results displayed in a grid and a maximum of 100 ColumnNames per group = 37 secs 

--= Clean up

IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

DROP TABLE #groupIDs

IF OBJECT_ID('tempdb..#data') IS NOT NULL

DROP TABLE #data

--= Get the group header records into a temp table

SELECT IDENTITY(BIGINT,1,1) AS ROWID, ID ,CONVERT(BIGINT,0) AS NextID

INTO #GroupIDs

FROM school s3

WHERE s3.ColumnName = 'DestinationTableName'

ORDER BY ID 

CREATE UNIQUE CLUSTERED INDEX ix_GRP_ID ON #GroupIDs (ID)

--= Figure out the last record in the table

DECLARE @MaxID BIGINT

SELECT @MaxID = MAX(ID) FROM school

--= Work out the range of IDs for each group

--= Note - I tried a "Quirky Update" here and just could not get it to work - this is quick enough (<1 sec) though

UPDATE G1

SET NextID = ISNULL(G2.ID,@MaxID+1)

FROM #GroupIDs G1

LEFT OUTER JOIN #GroupIDs G2

ON G2.ROWID = G1.ROWID+1

--= This now pulls the actual data rows into a temp table with the new "GroupID" and a row number within each group

SELECT S1.ID AS GroupID, S2.ID AS ItemID,S2.ColumnName,S2.Text ,ROW_NUMBER() OVER(PARTITION BY S1.ID ORDER BY S2.ID) AS ColNum

INTO #Data

FROM #GroupIDs G1

JOIN school S1

ON S1.ID = G1.ID

JOIN school S2

ON S2.ID > G1.ID AND S2.ID < G1.NextID

CREATE UNIQUE CLUSTERED INDEX ix_data_IDs ON #data(GroupID, ItemID)

CREATE INDEX ix_data_cols ON #data(ColumnName,ColNum)

--= Now to build the dynamic query

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''

--= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)

--= Use FOR XML PATH('') trick to get a list of columns as a string

SELECT @sql=STUFF(

					(

					SELECT ', '+QUOTENAME(ColumnName,'[')

					FROM (

						SELECT ColumnName ,MAX(ColNum) AS Position

						FROM #Data

						GROUP BY ColumnName

						) cols

					ORDER BY Position

					FOR XML PATH('')),1,1,'')

--= Now build a dynamic SQL to PIVOT the data

SET @sql = N'

SELECT GroupID,' + @sql + '

FROM (SELECT GroupID,ColumnName,Text FROM #data) a

PIVOT (MAX([Text]) FOR ColumnName IN ('+@sql +')) AS PVT

ORDER BY GroupID'

--= And run it

EXEC sp_executesql @sql

--= Clean up

IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

DROP TABLE #groupIDs

IF OBJECT_ID('tempdb..#data') IS NOT NULL

DROP TABLE #data

EDIT: Sorry, I forgot to say that I used a permanent table for this one called "school"

It was populated with some horrible random data using this script:



IF OBJECT_ID('school') IS NOT NULL

DROP TABLE school

GO

IF OBJECT_ID('column_names') IS NULL

CREATE TABLE column_names (ColumnName VARCHAR(50))

ELSE

TRUNCATE TABLE column_names

GO

insert column_names (ColumnName )

select TOP 100 NEWID()

from syscolumns

;with Numbers(N)

as

(

select 1

union all

select N+1 from Numbers

union all

select N+2 from Numbers

union all

select N+3 from Numbers

),data(ID,ColumnName,[Text],Children)

as

(

select top 20000 N,'DestinationTableName', '[DW].[DimClass]', ABS(CHECKSUM(NEWID())) % 100

from Numbers

)

select ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID, CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.ColumnName ELSE Colums.ColumnName END AS ColumnName,

	CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.Text ELSE LEFT(Colums.ColumnName,ABS(CHECKSUM(Colums.ColumnName)) % 20) END As [Text]

into school

from data

cross apply (SELECT TOP 5 ColumnName from column_names ORDER BY ColumnName UNION Select TOP(data.Children) ColumnName from column_names ORDER BY NEWID() ) Colums(ColumnName)

option (maxrecursion 0)

create unique clustered index ix_school_ID on school(ID)

CREATE NONCLUSTERED INDEX ix_school_data

ON [dbo].[school] ([ColumnName])

INCLUDE ([ID])

NOTE: the indexes at the end of this query are part of the solution

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);