Filter Large table using small one

Question

Post reply

Filter Large table using small one

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

JonFox SSCrazy Points: 2157 More actions · Answer 1

tmccar (12/3/2011)

...The "first occurrence" will be the lowest value of "Project_" in "Large". I hope that's clear!

Tom

Note: In my code above I'm assuming that you want the first value of Project_ when Project_ is sorted in alpha order ascending. If you're actually using a different rule to determine "lowest value of Project_" then you'll need to alter the query accordingly.

tmccar SSC Eights! Points: 990 More actions · Answer 2

tmccar

SSC Eights!

Points: 990

December 3, 2011 at 9:34 am

#1416315

Yes, that's looking good. The actual file has 51 columns and 117,000 rows but it should be possible to apply the same procedures to it.

JonFox SSCrazy Points: 2157 More actions · Answer 3

tmccar (12/3/2011)

Yes, that's looking good. The actual file has 51 columns and 117,000 rows but it should be possible to apply the same procedures to it.

Yep, the general concept should stay the same. It all boils down to figuring out how to identify the one row you are interested in for each group, and using that in your join predicates.

LutzM SSC Guru Points: 107049 More actions · Answer 4

If I'd only know the expected result based on the sample data I might be able to provide an alternative solution (using ROW_NUMBER() and/or CROSS APPLY)...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

JonFox SSCrazy Points: 2157 More actions · Answer 5

LutzM (12/3/2011)

If I'd only know the expected result based on the sample data I might be able to provide an alternative solution (using ROW_NUMBER() and/or CROSS APPLY)...

I played with those approaches too, but didn't really find anything that performed better than what I originally posted. Using ROW_NUMBER alone I got almost identical performance results to the MIN()/GROUP BY version, although possibly slower...really hard to tell on the small sample size though. CROSS APPLY seemed to perform much worse, due to it causing multiple scans of the table inside the ITVF. Admittedly I may have been missing a better way to write it though. Here's what I tried using ROW_NUMBER():



SELECT drv.*, Small.Path

FROM (

		SELECT ROW_NUMBER() OVER (PARTITION BY [Drawing Number] ORDER BY Project_) AS RowNum, Large.*

		FROM Large

	 ) AS drv INNER JOIN

	 Small ON Small.[Drawing Number] = drv.[Drawing Number]

WHERE RowNum = 1

Here's what I tried using CROSS APPLY:



SELECT s.Path, l.*

FROM Small AS s CROSS APPLY

	 (

		SELECT TOP 1 Large.*

		FROM Large  -- This results in Large getting scanned once for every distinct value of Drawing Number in Small

		WHERE Large.[Drawing Number] = s.[Drawing Number]

		ORDER BY Project_ ASC

	 ) as l

Anything you would suggest doing differently to get either of those to perform better?