top n percent WITH group by?

  • Remi, I've actually been thinking about doing something very similar... interesting approach - thanks!

  • Unless you're really short on disk space, I don't see the problem with putting all the data in a temp table, especially if you'd doing this just once and then dropping it.

    If I understand this problem correctly... I might use an undocumented trick I learned, which is to use a clustered index to order the rows, then use an UPDATE to "rank" them, and divide the ranks by the number of rows in the group:

    SELECT * INTO #t FROM tblStateCensus_top10percent

    ALTER TABLE #t ADD pct FLOAT

    CREATE CLUSTERED INDEX #IX_t ON #t(state, population desc)

    SELECT state, COUNT(*) AS region_count

    INTO #tstates

    GROUP BY state

    DECLARE @rank INT

    DECLARE @state CHAR(2)

    SET @state = ''

    UPDATE #t

    FROM #t JOIN #tstates ON #t.state = #tstate.state

    SET @rank = CASE WHEN @state = #t.state THEN @rank + 1 ELSE 1 END,

     pct = CAST(@rank AS FLOAT) / #tstate.region_count,

     @state = state

    SELECT *

    FROM #t

    WHERE pct < 0.90

    -- or is it WHERE pct < 0.10 ?

  • I also thaught of something similar but I didn't pursue it because I wanted a set based approach. This temp table solution would however be much faster than the set solution I presented (the in clause of the exists will absolutely destroy the performance of this query on a large table).

  • Here's another thaught: the in clause will kill the performance because the whole list will be scanned untill a match is found. Since

    where 1 in (3,1,2)

    is the same as

    where 1 = 3 or 1 = 1 or 1 = 2

    and that sql server uses shortcircuiting, this should improve the performance greatly (can't test this theory here) :

    SELECT O.XType

    -- , count(*) AS TotalHits_Found

    , O.name

    , (SELECT CEILING(COUNT(*) * 0.9) FROM dbo.SysObjects O4 WHERE O4.XType = O.XType) as [90%]

    , (SELECT COUNT(*) FROM dbo.SysObjects O5 WHERE O5.XType = O.XType) AS [100%]

    FROM dbo.SysObjects O

    WHERE EXISTS (SELECT * FROM dbo.SysObjects O2 WHERE O2.XType = O.XType AND O2.id = O.id and O2.id IN (SELECT TOP 90 PERCENT id FROM dbo.SysObjects O3 WHERE O3.XType = O.XType ORDER BY /*send the wanted item at the front of the list*/ CASE WHEN O3.id = O2.id then 0 ELSE 1 END))

    --GROUP BY O.XType

    ORDER BY O.XType

    , O.Name

  • I finally had the opportunity to check my theory.

    1 - it doesn't work, as you ALWAYS move the searched item in the top 90% group which yields false results.

    2 - it doesn't seem to work any faster. I assume that since you the order by case must evaluate all rows and then the in only 1, it takes more than than the in never evaluating all the rows unless no hit is found or the item is at the end of the list. But I don't know enough about the servers internals to be certain of that.

  • Remi,

    interesting thread. Look like a one-man show here. Wasn't there recently a remark on posting multiple times when none or one time is enough? 

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Yes but this time I was replying to myself as new ideas came to me. And it wasn't pointless chatting like this .

  • I appreciated all the responses. Got a lot out of the answers. Thank you to everyone who took the time to help me out!

    -Heather

  • Have you tried running my statemement on the 8M rows table?.. I was wondering how long it would take to run it (if it doesn't interferre with production).

  • Yes but this time I was replying to myself as new ideas came to me. And it wasn't pointless chatting like this

    Just another definition for the same thing.

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Obviously, but I look better that way .

  • Okay, boys - behave! I know I'm cute, but I won't have you fighting over me. hehe.

  • Hehe... it's not over you we're arguing and as far as I know Frank is married with children (new born included).

    Have you tested my query on a large table... I have nothing to test it on a table that remotely compares to your environement?

  • I know, it was a joke. Totally kidding.

    I'll try that query out today and let you know how it goes.

    -heather

  • Three to be precise here.

    Obviously, but I look better that way .

    Now if I recall this correct, you say in English:

    Beauty lies in the eyes of the beholder.

    So, if you think you look better this way, who am I to argue against

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply