Filter Large table using small one

  • tmccar (12/3/2011)


    ...The "first occurrence" will be the lowest value of "Project_" in "Large". I hope that's clear!

    Tom

    Note: In my code above I'm assuming that you want the first value of Project_ when Project_ is sorted in alpha order ascending. If you're actually using a different rule to determine "lowest value of Project_" then you'll need to alter the query accordingly.

  • Yes, that's looking good. The actual file has 51 columns and 117,000 rows but it should be possible to apply the same procedures to it.

  • tmccar (12/3/2011)


    Yes, that's looking good. The actual file has 51 columns and 117,000 rows but it should be possible to apply the same procedures to it.

    Yep, the general concept should stay the same. It all boils down to figuring out how to identify the one row you are interested in for each group, and using that in your join predicates.

  • If I'd only know the expected result based on the sample data I might be able to provide an alternative solution (using ROW_NUMBER() and/or CROSS APPLY)...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (12/3/2011)


    If I'd only know the expected result based on the sample data I might be able to provide an alternative solution (using ROW_NUMBER() and/or CROSS APPLY)...

    I played with those approaches too, but didn't really find anything that performed better than what I originally posted. Using ROW_NUMBER alone I got almost identical performance results to the MIN()/GROUP BY version, although possibly slower...really hard to tell on the small sample size though. CROSS APPLY seemed to perform much worse, due to it causing multiple scans of the table inside the ITVF. Admittedly I may have been missing a better way to write it though. Here's what I tried using ROW_NUMBER():

    SELECT drv.*, Small.Path

    FROM (

    SELECT ROW_NUMBER() OVER (PARTITION BY [Drawing Number] ORDER BY Project_) AS RowNum, Large.*

    FROM Large

    ) AS drv INNER JOIN

    Small ON Small.[Drawing Number] = drv.[Drawing Number]

    WHERE RowNum = 1

    Here's what I tried using CROSS APPLY:

    SELECT s.Path, l.*

    FROM Small AS s CROSS APPLY

    (

    SELECT TOP 1 Large.*

    FROM Large -- This results in Large getting scanned once for every distinct value of Drawing Number in Small

    WHERE Large.[Drawing Number] = s.[Drawing Number]

    ORDER BY Project_ ASC

    ) as l

    Anything you would suggest doing differently to get either of those to perform better?

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply