Using GROUP BY and COUNT together

  • Hi Guys,

    Just a small one this time.

    I have a table which contains household insurance quotations.

    As is normal with insurance, each customer can do multiple quotes for each policy; they change little things like their excess, the number of days of unoccupancy they have, etc., to get a good balance between low premium and good cover.

    In order to report on conversion rates (the percentage of customers who took out an insurance policy after quoting) we need to count up the number of quotes that have been done.

    Now if we say that each one of those individual quotes is 1 quote, our conversion rate will be horrifically low. As such we want to group those quotes up, so if a customer phones in to get a quote on Tuesday, thinks about it, does another quote on Wednesday and then finally takes out a policy on Thursday, this is counted as one quote and not 3.

    We have a file of 'all' quotes (imaginatively named) tblHhiQuotes

    In order to get the desired 'unique' quotes we are grouping the quotes by customer surname, customer postcode and insurance cover type (e.g. contents, buildings or integrated).

    All I want to get back is a single integer count. I can achieve this in the following manner:

    DECLARE @count AS TABLE (one bit)

    INSERT INTO @count
    SELECT 1
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
      AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
      AND cTransType = 'FQ'
    GROUP BY vSurname
           , cPostcode
           , cCoverType

    --output...
    SELECT COUNT(*)
    FROM @count

    This works and returns a value of 2173.

    Initially I tried SELECT COUNT(*) FROM..... but this of course returns 2173 rows of small numbers (those being the number of 'duplicates').
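
    i.e. roughly this (a sketch reconstructed from the query above), which returns one small count per surname/postcode/cover-type group rather than a single total:

    SELECT COUNT(*)
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
      AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
      AND cTransType = 'FQ'
    GROUP BY vSurname
           , cPostcode
           , cCoverType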

    I'm happy that my program works and returns a value of 2173, but I'm thinking there MUST be a more efficient way of doing this without resorting to inserting records into a table variable first. I may well be being very, very stupid and forgetting some extremely simple function, but I can't think of any....

    Is there a built-in function or option of COUNT or GROUP BY that I'm missing here?

    Thanks!

    Ben


  • ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
          AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
          AND cTransType = 'FQ'
        GROUP BY vSurname,
                 cPostcode,
                 cCoverType
    )
    SELECT SUM(Data)
    FROM cteSource



  • That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.

    Thanks

    Ben


  • BenWard (10/28/2010)


    That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.

    Thanks

    Like this version?

    SELECT COUNT(*)
    FROM (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
          AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
          AND cTransType = 'FQ'
        GROUP BY vSurname
               , cPostcode
               , cCoverType
    ) d

    😉



  • Haha, yes.

    Which is (I think) exactly the same as:

    ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
          AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
          AND cTransType = 'FQ'
        GROUP BY vSurname,
                 cPostcode,
                 cCoverType
    )
    SELECT COUNT(*)
    FROM cteSource

    isn't it?!?

    I'm sure we could spend many, many hours attempting to determine if we can save 13 milliseconds of processing time by redoing this, that or the other here!

    Ben


  • BenWard (10/28/2010)


    Haha, yes.

    Which is (I think) exactly the same as:

    ;WITH cteSource(Data)
    AS (
        SELECT 1
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
          AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
          AND cTransType = 'FQ'
        GROUP BY vSurname,
                 cPostcode,
                 cCoverType
    )
    SELECT COUNT(*)
    FROM cteSource

    isn't it?!?

    Heh, yep, you're absolutely right: they're functionally equivalent, though I wouldn't go so far as to say they are the same.

    I haven't got an instance to play with today - I wonder if any of the extensions to GROUP BY (e.g. WITH ROLLUP) would do the trick in a single statement?

    I'm sure we could spend many, many hours attempting to determine if we can save 13 milliseconds of processing time by redoing this, that or the other here!

    Big topic, and a current one - there's a series called "Speed Phreakery" or something like that. Well worth a read.



  • I did try ROLLUP; that just gave extra records.

    I'll give that a read...

    Ben


  • BenWard (10/28/2010)


    That's brilliant, thanks.

    I'm surprised there isn't an alternative version of COUNT in SQL Server that returns the number of rows, like a numrows() function or whatever.

    Thanks

    You mean like COUNT(DISTINCT vSurname + cPostcode + cCoverType)?

    You may need to do conversions to get the datatypes to match.
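
    Something along these lines, for example (the varchar lengths and the '|' separator here are only illustrative - a separator stops different value combinations concatenating to the same string, and you'd pick conversions to suit the real column types):

    SELECT COUNT(DISTINCT CAST(vSurname   AS varchar(100)) + '|'
                        + CAST(cPostcode  AS varchar(20))  + '|'
                        + CAST(cCoverType AS varchar(20))) AS UniqueQuotes
    FROM tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
      AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
      AND cTransType = 'FQ'

    One caveat: if any of the three columns can be NULL, those rows drop out of the DISTINCT count, whereas GROUP BY would keep them as their own group.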

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • you

    are

    a ...

    genius!

    😀

    Ben

    ^ Thats me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • Hahaha that's too funny!

    I bet it's the same performance too.

    Ben, if you're interested and have the time, you could assess all three queries for performance using the simplest method:

    SET STATISTICS TIME ON
    SET STATISTICS IO ON

    <<statements>>

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    Run each statement three or four times and take the average; the first run of each statement is likely to be the slowest because the subsequent runs benefit from caching.

    If you're on a dev box on your own, I think you can use something like DBCC FREEPROCCACHE before each run, which you would then do only once.

    Measuring the performance of a statement or batch can get quite involved. The method above is quick and reliable enough to use during production coding to tie-break between functionally-equivalent methods.
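
    For example, something like this before each timed statement (a rough sketch, dev box only - FREEPROCCACHE clears cached plans server-wide, and DROPCLEANBUFFERS is an optional extra if you also want cold-cache I/O timings):

    DBCC FREEPROCCACHE      -- force each statement to compile a fresh plan
    DBCC DROPCLEANBUFFERS   -- optional: empty the buffer pool for cold-cache reads
    GO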



  • Or maybe this, since PARTITION BY generally seems to be pretty fast in SQL Server:

    ;WITH cte1 AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY vSurname, cPostcode, cCoverType
                                  ORDER BY vSurname) AS rownum
        FROM tblHhiQuotes
        WHERE sGlDate <= '28 Oct 2010'
          AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
          AND cTransType = 'FQ'
    )
    SELECT COUNT(*)
    FROM cte1
    WHERE rownum = 1

    I'd be afraid that the concatenation may be a bit slow, since string manipulation is definitely a weak point for SQL.

    Scott Pletcher, SQL Server MVP 2008-2010

  • And the results are in!

    My query was the slowest (as was expected, lol).

    Most common speed 156ms, with a low of 141 and a high of 196.

    3rd place goes to Peso, whose query most commonly executes in 141ms, with a low of 130 and a high of 156.

    2nd is Scott - the PARTITION BY version can be very fast but is not reliably so; I get results of either 125ms or 141ms, with the occasional high of 170.

    1st place goes to Drew, with a most common time of 125ms, a low of 110 and a high of 125.

    here is your prize:

    Thanks Guys 🙂

    As always, very impressed by the great community spirit here.

    Ben


  • BenWard (10/29/2010)


    And the results are in!

    My query was the slowest (as was expected, lol).

    Most common speed 156ms, with a low of 141 and a high of 196.

    3rd place goes to Peso, whose query most commonly executes in 141ms, with a low of 130 and a high of 156.

    2nd is Scott - the PARTITION BY version can be very fast but is not reliably so; I get results of either 125ms or 141ms, with the occasional high of 170.

    1st place goes to Drew, with a most common time of 125ms, a low of 110 and a high of 125.

    Ahem...how did the derived table compare with the CTE? 😉


  • erm....

    Msg 8155, Level 16, State 2, Line 6

    No column was specified for column 1 of 'd'.

    :Whistling:

    Ben


  • SELECT SUM(1) AS Items
    FROM dbo.tblHhiQuotes
    WHERE sGlDate <= '28 Oct 2010'
      AND sEffectiveDate BETWEEN '01 Nov 2010' AND '30 Nov 2010'
      AND cTransType = 'FQ'
    GROUP BY vSurname,
             cPostcode,
             cCoverType
    WITH ROLLUP
    HAVING GROUPING(vSurname) = 1


