% of total count

  • I want to add a 4th column (% of Total Count), for example:

    State Name    Count1    Total Charge Off Amount    % of Total Count
    Alabama            5                      10000                 50%
    Arizona            2                      40000                 20%
    Arkansas           3                       5000                 30%

    SELECT
         A.[State Name]
        ,COUNT(A.[State Name]) AS [Count1]
        ,SUM(A.ChargeOffAmount) AS [Total Charge Off Amount]
    FROM
    (
        SELECT DISTINCT
             L.LoanID
            ,H.TransactionDate
            ,CASE WHEN H.TransactionAmt > 0 THEN H.TransactionAmt ELSE L.PrincipalBal END AS [ChargeOffAmount]
            ,ST.Name AS [State Name]
        FROM Loan AS L
        JOIN History AS H ON L.LoanID = H.LoanID
        LEFT JOIN CreditScoreDetail AS CSD ON L.LoanID = CSD.LoanID AND L.AssmRecCounter = CSD.AssmRecCounter
        LEFT JOIN Property AS PRO ON L.LoanID = PRO.LoanID
        LEFT JOIN State AS ST ON PRO.State = ST.Code
        WHERE CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
          AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate
    ) AS A
    LEFT OUTER JOIN Loan AS L ON L.loanID = A.LoanID
    GROUP BY A.[State Name]

    Can somebody suggest the code for the 4th column?

  • DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        [% of Total Count] = ((Count1*1.)/CountTotal)*100
    FROM @table t1
    CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Also SUM with OVER()

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
    FROM @table t1

  • Both Alan.B and Gazareth started with the summarized data. Here is how you would do it starting with the raw data.

    DECLARE @table TABLE (stateName varchar(20), Amount int);

    INSERT @table
    VALUES
        ('Alabama',200),
        ('Alabama',200),
        ('Alabama',200),
        ('Alabama',200),
        ('Alabama',200),
        ('Arizona',2000),
        ('Arizona',2000),
        ('Arkansas',2000),
        ('Arkansas',2000),
        ('Arkansas',1000);

    SELECT t.stateName, COUNT(*), SUM(t.Amount), 100.0 * COUNT(*) / SUM(COUNT(*)) OVER()
    FROM @table t
    GROUP BY t.stateName

    Note that the denominator is a windowed aggregate of a regular aggregate.
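
    Grafted onto the original query from the first post, the same pattern would look something like the sketch below (column and table names are copied from that post; this hasn't been run against the real schema, so treat it as an illustration of the pattern rather than a drop-in fix):

    SELECT
         A.[State Name]
        ,COUNT(A.[State Name]) AS [Count1]
        ,SUM(A.ChargeOffAmount) AS [Total Charge Off Amount]
        -- the new column: a windowed aggregate of the regular aggregate
        ,100.0 * COUNT(A.[State Name]) / SUM(COUNT(A.[State Name])) OVER () AS [% of Total Count]
    FROM
    (
        SELECT DISTINCT
             L.LoanID
            ,H.TransactionDate
            ,CASE WHEN H.TransactionAmt > 0 THEN H.TransactionAmt ELSE L.PrincipalBal END AS [ChargeOffAmount]
            ,ST.Name AS [State Name]
        FROM Loan AS L
        JOIN History AS H ON L.LoanID = H.LoanID
        LEFT JOIN CreditScoreDetail AS CSD ON L.LoanID = CSD.LoanID AND L.AssmRecCounter = CSD.AssmRecCounter
        LEFT JOIN Property AS PRO ON L.LoanID = PRO.LoanID
        LEFT JOIN State AS ST ON PRO.State = ST.Code
        WHERE CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
          AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate
    ) AS A
    LEFT OUTER JOIN Loan AS L ON L.loanID = A.LoanID
    GROUP BY A.[State Name]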

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • It also appears that you may have a logic error in your code:

    WHERE
        CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
        AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate

    If the H.TransactionDate field is a datetime, and contains a time portion, this appears to leave off everything for the end date.

    As an example, the @EndDate is 9/30/2016, which is midnight.

    If there are records that are created AFTER midnight, then this will not include them.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (9/15/2016)


    It also appears that you may have a logic error in your code:

    WHERE
        CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
        AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate

    If the H.TransactionDate field is a datetime, and contains a time portion, this appears to leave off everything for the end date.

    As an example, the @EndDate is 9/30/2016, which is midnight.

    If there are records that are created AFTER midnight, then this will not include them.

    The 101 format does not include the time portion. That being said, you would be better off casting these to DATE rather than to VARCHAR. SQL 2012 can still use an index seek when casting DATETIME data to DATE, but it cannot when converting it to VARCHAR.
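
    As a sketch of that suggestion, the date filter from the original query would become something like the following (assuming @StartDate and @EndDate are DATE parameters or date literals):

    WHERE CAST(H.TransactionDate AS date) >= @StartDate
      AND CAST(H.TransactionDate AS date) <= @EndDate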

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • If it's after midnight on 09/30, then shouldn't the date be 10/01?

  • That's right, it does not have any time portion, so everything for 09/30 should be listed.

  • SQLPain (9/15/2016)


    That's right, it does not have any time portion, so everything for 09/30 should be listed.

    Good, because this is a frequent error that I see.

    2016-09-30 15:39:50.197 would be an example of a time after midnight. If your data contained times, then this row would not appear in your query.
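
    For reference, a pattern that stays correct even if times do creep into the column later is a half-open range. This is only a sketch, and it assumes @EndDate is the last calendar day that should be included:

    WHERE H.TransactionDate >= @StartDate
      AND H.TransactionDate <  DATEADD(DAY, 1, @EndDate)

    This also leaves the column itself untouched, so an index on TransactionDate can still be used for a seek.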

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Gazareth (9/15/2016)


    Also SUM with OVER()

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
    FROM @table t1

    The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SET STATISTICS IO ON;

    PRINT 'Using CROSS APPLY and SUM';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        [% of Total Count] = (Count1*100.)/CountTotal
    FROM @table t1
    CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;

    PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
    FROM @table t1;

    SET STATISTICS IO OFF;

    and here are the IO results:

    Using CROSS APPLY and SUM

    Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER()

    Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:

    Nasty Fast PERCENT_RANK

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (9/15/2016)


    Gazareth (9/15/2016)


    Also SUM with OVER()

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
    FROM @table t1

    The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SET STATISTICS IO ON;

    PRINT 'Using CROSS APPLY and SUM';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        [% of Total Count] = (Count1*100.)/CountTotal
    FROM @table t1
    CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;

    PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
    FROM @table t1;

    SET STATISTICS IO OFF;

    and here are the IO results:

    Using CROSS APPLY and SUM

    Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER()

    Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:

    Nasty Fast PERCENT_RANK

    Although I absolutely agree that a large number of reads is usually an indication that something has gone haywire and could stand some real improvement, it's not always the case. "It Depends". In the past, I've seen Paul White solve a running total using a Triangular Join, but the CPU and Duration were incredibly low, rivaling even the Quirky Update. And, Tuesday night, I gave a demo of solving a problem using an initial single-column GROUP BY to pre-aggregate a 7 million row / 8GB table and then used 4 SUM() OVERs to very quickly solve for 4 other sums used to calculate percentages related to the first, along with 4 DENSE_RANKs. There were "only" ~31 thousand reads from the original table and ~482,000 reads from the work table. The SUM() OVERs also replaced the need for 4 other GROUP BY queries that would have required a join to the original GROUP BY to solve the problem. In the cCTE, that would have required the pre-aggregation to execute 4 times, and doing it with a Temp Table already took a bit more than 7 seconds, not to mention the fact that 4 additional queries were starting to make the code a fair bit more complex.

    The code with the initial pre-agg GROUP BY, followed by a cCTE that used SUM() OVER 4 times after that (all in the same query, BTW), solved the problem in 4.5 seconds of duration (INCLUDING display time of ~57,000 rows) and just over 9 seconds of CPU time, and it made the code incredibly simple to read and maintain.

    The reason I came back on this was that I didn't want anyone to develop the habit of avoiding things like SUM() OVER (or any query) based solely on the number of READs sometimes produced. You have to look at the "big 4" (CPU, Reads, Writes, and Duration, the last being the most important to the people using the code) as a complete package compared to how other methods would perform.
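
    For anyone who wants to see the shape of that pattern, here is a minimal sketch reduced to this thread's toy data: pre-aggregate once with GROUP BY, then let SUM() OVER() work on the small aggregated set. The real demo used a 7-million-row table and several such window aggregates; the table and column names below are just the toy ones from earlier posts.

    DECLARE @table TABLE (stateName varchar(20), Amount int);
    INSERT @table VALUES
        ('Alabama',200),('Alabama',200),('Alabama',200),
        ('Arizona',2000),('Arizona',2000),
        ('Arkansas',2000),('Arkansas',1000);

    WITH PreAgg AS
    (   -- aggregate once at the detail level
        SELECT stateName,
               Count1      = COUNT(*),
               TotalAmount = SUM(Amount)
        FROM @table
        GROUP BY stateName
    )
    SELECT stateName,
           Count1,
           TotalAmount,
           -- the window aggregates now only see one row per state
           [% of Total Count]  = 100.0 * Count1      / SUM(Count1)      OVER (),
           [% of Total Amount] = 100.0 * TotalAmount / SUM(TotalAmount) OVER ()
    FROM PreAgg;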

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/16/2016)


    Alan.B (9/15/2016)


    Gazareth (9/15/2016)


    Also SUM with OVER()

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
    FROM @table t1

    The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:

    DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);

    INSERT @table
    VALUES
        ('Alabama',5,1000),
        ('Arizona',2,4000),
        ('Arkansas',3,5000);

    SET STATISTICS IO ON;

    PRINT 'Using CROSS APPLY and SUM';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        [% of Total Count] = (Count1*100.)/CountTotal
    FROM @table t1
    CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;

    PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
    SELECT
        stateName,
        Count1,
        TotalChargeOffAmount,
        100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
    FROM @table t1;

    SET STATISTICS IO OFF;

    and here are the IO results:

    Using CROSS APPLY and SUM

    Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER()

    Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:

    Nasty Fast PERCENT_RANK

    Although I absolutely agree that a large number of reads is usually an indication that something has gone haywire and could stand some real improvement, it's not always the case. "It Depends". In the past, I've seen Paul White solve a running total using a Triangular Join, but the CPU and Duration were incredibly low, rivaling even the Quirky Update. And, Tuesday night, I gave a demo of solving a problem using an initial single-column GROUP BY to pre-aggregate a 7 million row / 8GB table and then used 4 SUM() OVERs to very quickly solve for 4 other sums used to calculate percentages related to the first, along with 4 DENSE_RANKs. There were "only" ~31 thousand reads from the original table and ~482,000 reads from the work table. The SUM() OVERs also replaced the need for 4 other GROUP BY queries that would have required a join to the original GROUP BY to solve the problem. In the cCTE, that would have required the pre-aggregation to execute 4 times, and doing it with a Temp Table already took a bit more than 7 seconds, not to mention the fact that 4 additional queries were starting to make the code a fair bit more complex.

    The code with the initial pre-agg GROUP BY, followed by a cCTE that used SUM() OVER 4 times after that (all in the same query, BTW), solved the problem in 4.5 seconds of duration (INCLUDING display time of ~57,000 rows) and just over 9 seconds of CPU time, and it made the code incredibly simple to read and maintain.

    The reason I came back on this was that I didn't want anyone to develop the habit of avoiding things like SUM() OVER (or any query) based solely on the number of READs sometimes produced. You have to look at the "big 4" (CPU, Reads, Writes, and Duration, the last being the most important to the people using the code) as a complete package compared to how other methods would perform.

    Interesting stuff, both! The actual execution plans are weighted in the other direction, with the CROSS APPLY at 2x the cost of the SUM() OVER().

    Presumably due to 2 table scans with the APPLY versus 1 with SUM() OVER().

  • What you have to remember is that all of the costs are estimates. The only thing that ultimately matters is the actual performance.

    Here's my favorite example on performance. Run the following code with the Actual execution plan turned on. The first query will give a "% of Batch" = 0 and the second query will = 100. Then read the Messages tab and find out that nearly the exact opposite is true.

    RAISERROR('Recursive CTE',0,0) WITH NOWAIT;
    SET STATISTICS TIME,IO ON;
    WITH cte AS
    (
        SELECT N = 1
        UNION ALL
        SELECT N = N + 1
        FROM cte
        WHERE N < 1000000
    )
    SELECT *
    INTO #MyHead1
    FROM cte
    OPTION (MAXRECURSION 0)
    ;
    SET STATISTICS TIME,IO OFF;
    PRINT REPLICATE('-',119)
    ;

    RAISERROR('Pseudo-Cursor',0,0) WITH NOWAIT;
    SET STATISTICS TIME,IO ON;
    SELECT TOP 1000000
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    INTO #MyHead2
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    ;
    SET STATISTICS TIME,IO OFF;
    PRINT REPLICATE('-',119);
    GO

    DROP TABLE #MyHead1,#MyHead2
    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions
