December 29, 2016 at 3:23 pm
Hi,
Can some one help to find median for Odd and Even count in one function
Thanks
Grace
December 29, 2016 at 10:33 pm
Yes. First, see my original post on the subject from 2007 so you get the big picture of what was requested so that you'll understand the following code, which was the solution also posted on that thread.
http://qa.sqlservercentral.com/Forums/Topic351991-8-1.aspx
Here's the code to setup the million row test table and the solution I provided, which still works even at this late day. Note that it finds ~70,200 medians across 1 million rows in about 6 seconds (not including the time it takes to build the test data, which takes about 11 seconds).
/**************************************************************************************************
Setup a million row test table.
This is NOT a part of the solution. We're just creating a test table here.
**************************************************************************************************/
--===== If the temp table already exists, drop it
IF OBJECT_ID('TempDB..#Steps') IS NOT NULL
DROP TABLE #Steps
;
--===== Simulate loading your real temp table with ...
-- A MILLION rows, 17576 reps (3 character "names"), a DeltaT from 0 to 1999,
-- and 4 "Steps" (Step 1, Step 2, etc)
SELECT TOP 1000000 IDENTITY(INT,1,1) AS RowNum,
CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS Rep,
CAST(RAND(CAST(NEWID() AS VARBINARY))*2000 AS INT) AS DeltaT,
'Step '
+ CAST(CAST(RAND(CAST(NEWID() AS VARBINARY))*4 AS INT)+1 AS VARCHAR(25)) AS MileCode
INTO #Steps
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--===== Every table, even temporary ones, should have a Primary Key...
-- Makes all the difference in the world, performance wise, on this script.
ALTER TABLE #Steps
ADD PRIMARY KEY CLUSTERED (RowNum)
;
--===== And, trust me, you want this index. It's a "covering" index for both the
-- ORDER BY's, the JOINs, and the data that will appear in the code that follows.
CREATE INDEX IX_Steps ON #Steps
(Rep,MileCode,DeltaT)
;
/**************************************************************************************************
Find the median for all steps of all reps.
6 seconds to find ~70,200 medians across 1 million rows.
**************************************************************************************************/
SELECT DISTINCT
m.Rep,
m.MileCode,
(
(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT ASC
) lo
ORDER BY DeltaT DESC)
+(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT DESC
) hi
ORDER BY DeltaT ASC)
) / 2 AS MEDIAN
FROM #Steps m
--WHERE Rep = 'AAB' AND MileCode = 'Step 4' --Uncomment to find the info on just one rep and step
ORDER BY m.Rep,m.MileCode
;
--Jeff Moden
December 30, 2016 at 7:34 am
Here are 2 other methods that seem to be faster.
WITH cteStepsTiles AS(
SELECT *, NTILE(2) OVER(PARTITION BY Rep, MileCode ORDER BY DeltaT) tile
FROM #Steps
)
SELECT
Rep,
MileCode,
Median = CASE WHEN COUNT(CASE WHEN tile = 1 THEN 1 END) > COUNT(CASE WHEN tile = 2 THEN 1 END)
THEN MAX(CASE WHEN tile = 1 THEN DeltaT END)
ELSE (MAX(CASE WHEN tile = 1 THEN DeltaT END) + MIN(CASE WHEN tile = 2 THEN DeltaT END)) / 2. END
FROM cteStepsTiles
GROUP BY Rep, MileCode
ORDER BY Rep, MileCode;
SELECT
Rep,
MileCode,
Median = AVG(1.0 * DeltaT)
FROM
(
SELECT Rep, MileCode, DeltaT,
rn = ROW_NUMBER() OVER(PARTITION BY Rep, MileCode ORDER BY DeltaT),
c = COUNT(*) OVER(PARTITION BY Rep, MileCode)
FROM #Steps AS o
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2)
GROUP BY Rep, MileCode
ORDER BY Rep, MileCode;
EDIT: In SQL Server 2012, there was a function added that would simplify the queries looking for the median. I didn't include it because this is a 2008 forum, but I thought it was worth mentioning it.
December 30, 2016 at 7:38 am
Luis Cazares (12/30/2016)
EDIT: In SQL Server 2012, there was a function added that would simplify the queries looking for the median. I didn't include it because this is a 2008 forum, but I thought it was worth mentioning it.
Which function would that be? I forget the name of it but are you talking about the function for the simplification of paging?
--Jeff Moden
December 30, 2016 at 7:42 am
It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.
The median is the first example on the BOL page:
December 30, 2016 at 7:48 am
Luis Cazares (12/30/2016)
It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.The median is the first example on the BOL page:
Very cool. Thanks.
On the two code examples you provided above (and thank you for those... learned a new trick today thanks to you), the first one is, indeed, quite fast and returns in 3 seconds instead of 6 on my humble laptop (i5 processor with 6GB ram).
The second one takes 18 seconds.
--Jeff Moden
December 30, 2016 at 7:54 am
Jeff Moden (12/30/2016)
Luis Cazares (12/30/2016)
It's PERCENTILE_DISC or PERCENTILE_CONT which use OVER to define window and order.The median is the first example on the BOL page:
Very cool. Thanks.
On the two code examples you provided above (and thank you for those... learned a new trick today thanks to you), the first one is, indeed, quite fast and returns in 3 seconds instead of 6 on my humble laptop (i5 processor with 6GB ram).
The second one takes 18 seconds.
Are you sure there's nothing else going on? The times in milliseconds I get for the 3 queries:
2 subqueries: 7551
NTILE: 3008
ROW_NUMBER & COUNT: 3452
The numbers would vary a bit from run to run (obviously) but are consistent.
This was on a SQL Server 2014 running on my laptop Intel Core i5 with 16GB of RAM.
December 30, 2016 at 9:43 am
Yes. I'm running 2008 (non r2) on my laptop and nothing else was going on at the time. The first of your queries took 3 seconds, the second took 18 seconds.
--Jeff Moden
December 30, 2016 at 11:48 am
Thanks, Actually I have single column in table, need to find median on that column, I may have count odd and even, Please help me on this
December 30, 2016 at 12:23 pm
kaladharreddy15 (12/30/2016)
Thanks, Actually I have single column in table, need to find median on that column, I may have count odd and even, Please help me on this
The code that we've posted so far will work with a single column. All the code needs is a bit of modification to meet the requirements of your table.
Speaking of that, you've provided no description of the table, the columns, nor even what needs the Median calculation. If you want something that meets your needs, you have to be able to adequately describe those needs.
With that in mind, please read and heed the first article posted in my signature line below under "Helpful Links". Otherwise this whole thread will turn into a round robin of second guessing littered with partial solutions that don't measure up to your requirements because.... we just don't know what they are at this point. 😉
--Jeff Moden
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply