Return all rows where one is a match

Kelvin Phayre

Ten Centuries

Points: 1358
October 10, 2008 at 6:03 am

#194286

Simple scenario.

We have order header, order detail.

I want to return all rows from order detail for any order where at least one line has a specified product, i.e. I want to see what else was ordered when the product was ordered.

So I need the following modified to return all the order lines for an order and not just the product match.

Select oph.order_no, opd.order_line_no, opd.product, opd.quantity
From OrderDetail opd inner join OrderHeader oph on opd.order_no = oph.order_no
Where opd.product='Bike'

Hope someone understands what I’m trying to do. I know I could do a subquery to return all order_no where the product has been ordered but I don’t want to do it this way.

Lowell SSC Guru Points: 323482 · Answer 1

Looks like a simple IN() statement would do the job... you want the entire order if one of the products is 'Bike':

Select
    oph.order_no,
    opd.order_line_no,
    opd.product,
    opd.quantity
From OrderDetail opd
inner join OrderHeader oph
    on opd.order_no = oph.order_no
Where opd.order_no in (SELECT order_no FROM OrderDetail WHERE product='Bike')

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Michael Earl-395764 SSC Guru Points: 53873 · Answer 2

Another solution would be to simply join to the detail again.

Select
      oph.order_no
    , opd.order_line_no
    , opd.product
    , opd.quantity
    , opd2.product as OtherProduct
    , opd2.quantity as OtherQuantity
From
    OrderDetail opd
    inner join OrderHeader oph on opd.order_no = oph.order_no
    inner join OrderDetail opd2 on opd2.order_no = opd.order_no
        and opd2.product <> opd.product
Where
    opd.product='Bike'

Chris Harshman SSC-Forever Points: 42179 · Answer 3

Actually, the subquery will probably be more logically correct. I don't know if it is possible for there to be more than one OrderDetail record for the same order with the same product. In Michael's query, it would return all of the detail records for the same order twice if there were two detail records with a product of 'Bike'.
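
To make the duplication concrete, here is a minimal sketch using made-up data. The table variable @OrderDetail and its rows are assumptions for illustration only, not the OP's actual schema. Order 1 has 'Bike' on two lines, so the join form lists the other product once per 'Bike' line, while the IN form returns each order line once.

```sql
-- Hypothetical sample data: order 1 contains 'Bike' on two separate lines.
DECLARE @OrderDetail TABLE
(
    order_no      INT         NOT NULL,
    order_line_no INT         NOT NULL,
    product       VARCHAR(20) NOT NULL,
    quantity      INT         NOT NULL
);

INSERT INTO @OrderDetail (order_no, order_line_no, product, quantity)
SELECT 1, 1, 'Bike',   1 UNION ALL
SELECT 1, 2, 'Helmet', 1 UNION ALL
SELECT 1, 3, 'Bike',   2 UNION ALL
SELECT 2, 1, 'Pump',   1;

-- Join form (same shape as the second-join query above):
-- the Helmet line is paired with each of the two 'Bike' lines.
SELECT opd.order_no, opd.order_line_no, opd.product,
       opd2.product  AS OtherProduct,
       opd2.quantity AS OtherQuantity
  FROM @OrderDetail opd
  JOIN @OrderDetail opd2
    ON opd2.order_no = opd.order_no
   AND opd2.product <> opd.product
 WHERE opd.product = 'Bike'
 ORDER BY opd.order_no, opd.order_line_no;

-- IN form: every line of any order containing a 'Bike', each line once.
SELECT od.order_no, od.order_line_no, od.product, od.quantity
  FROM @OrderDetail od
 WHERE od.order_no IN (SELECT f.order_no
                         FROM @OrderDetail f
                        WHERE f.product = 'Bike')
 ORDER BY od.order_no, od.order_line_no;
```

Whether this matters in practice depends on whether the real OrderDetail table allows the same product on more than one line of an order.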

Gary Johnson-259336 SSCommitted Points: 1673 · Answer 4

I would probably write this as follows. No subquery needed... 🙂



SELECT
    oh.order_no
    , od.order_line_no
    , od.product
    , od.quantity
FROM dbo.OrderHeader oh
    JOIN dbo.OrderDetail od -- Use for full list of products on the order
        ON oh.order_no = od.order_no
    JOIN dbo.OrderDetail odf -- Use to filter data by product
        ON oh.order_no = odf.order_no
WHERE odf.product = @Product

Gary Johnson
Sr Database Engineer

Garadin One Orange Chip Points: 29613 · Answer 5

The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:

SELECT
      oh.orderid
    , od.productid
    , od.quantity
FROM dbo.Orders oh
    JOIN dbo.[Order Details] od -- Use for full list of products on the order
        ON oh.orderid = od.orderid
    JOIN dbo.[Order Details] odf -- Use to filter data by product
        ON oh.orderid = odf.orderid
WHERE odf.productid = 51
ORDER BY oh.orderid, od.productid

---------------- VS ----------------------

SELECT oh.orderid, od.productid, od.quantity
FROM orders oh
    INNER JOIN [order details] od ON oh.orderid = od.orderid
WHERE od.orderid IN (
    SELECT orderid
    FROM [order details]
    WHERE productid = 51)

Pretty marked difference on a small dataset. This could potentially change with a large enough dataset and a highly populated IN clause (i.e., instead of 39 orders containing the targeted product, there are 30,000). Depending on how large/densely populated the OP's dataset is, maybe he could test the two for us and let us know.
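
For anyone who wants to try this at a larger scale, something like the following sketch could serve as a starting point. Everything in it is an assumption: the #OrderDetail temp table, the 100,000 orders of 10 lines each, the 100 product ids, and the index. It only shows one way to capture reads and timings for both forms; it makes no claim about which one wins.

```sql
-- Hypothetical test harness; table name, sizes, and product ids are made up.
IF OBJECT_ID('tempdb..#OrderDetail') IS NOT NULL DROP TABLE #OrderDetail;

-- Build 1,000,000 order lines: 100,000 orders x 10 lines, random product 1-100.
WITH n AS
(
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
      FROM master.sys.all_columns t1
     CROSS JOIN master.sys.all_columns t2
)
SELECT order_no      = CAST((rn - 1) / 10 + 1 AS INT),
       order_line_no = CAST((rn - 1) % 10 + 1 AS INT),
       product_id    = ABS(CHECKSUM(NEWID())) % 100 + 1,
       quantity      = ABS(CHECKSUM(NEWID())) % 10 + 1
  INTO #OrderDetail
  FROM n;

CREATE CLUSTERED INDEX IX_OrderDetail ON #OrderDetail (order_no, order_line_no);

-- Sink variable so that returning rows to the client is not part of the timing.
DECLARE @sink INT;

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

PRINT '========== Join form ==========';
SELECT @sink = od.quantity
  FROM #OrderDetail od
  JOIN #OrderDetail odf
    ON odf.order_no = od.order_no
 WHERE odf.product_id = 51;

PRINT '========== IN form ==========';
SELECT @sink = od.quantity
  FROM #OrderDetail od
 WHERE od.order_no IN (SELECT f.order_no
                         FROM #OrderDetail f
                        WHERE f.product_id = 51);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Note that with random products an order can contain product 51 more than once, in which case the join form processes extra rows, so the two queries are not strictly equivalent on this data. Compare the logical reads and elapsed times on the Messages tab.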

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

Perry Whittle SSC Guru Points: 233792 · Answer 6

If using a correlated subquery for checking a condition/item, I prefer EXISTS. I've always been under the impression this should be faster than IN. Am I way off base here?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Garadin One Orange Chip Points: 29613 · Answer 7

It was my understanding that these subqueries are basically the equivalent of derived tables, as they don't reference the outer rows and thus should not repeat for every row. If I'm mistaken in that regard, then it makes sense that they would significantly degrade in performance as the dataset got larger.

As far as IN and EXISTS go, to the best of my knowledge and from everything I've ever read, they are the same as far as performance goes, assuming they are written the same way. I checked, and the execution plan and performance are exactly the same for both in this scenario. That actually makes me doubt my belief that IN is not correlated, as EXISTS does reference the outer table. Maybe they're just normally highly efficient correlated subqueries.

Jeff Moden SSC Guru Points: 1000539 · Answer 8

Garadin (10/10/2008)

The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:

I ran your test code and, as expected, no difference. Both execution plans are exactly identical because IN resolves to an inner join.

What kind of difference did you actually come up with?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Chris Harshman SSC-Forever Points: 42179 · Answer 9

October 12, 2008 at 9:14 am

#883903

Subqueries with IN and EXISTS serve slightly different purposes. For this situation, an EXISTS subquery would be a correlated subquery, and the IN subquery would not have to be.
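
As a rough illustration of that distinction, the sketch below writes the same filter both ways against a made-up table variable (@OrderDetail and its rows are assumptions, not the OP's schema). The IN subquery could be run on its own; the EXISTS subquery could not, because it refers to a column from the outer query.

```sql
-- Hypothetical sample data for illustration only.
DECLARE @OrderDetail TABLE
(
    order_no      INT         NOT NULL,
    order_line_no INT         NOT NULL,
    product       VARCHAR(20) NOT NULL,
    quantity      INT         NOT NULL
);

INSERT INTO @OrderDetail (order_no, order_line_no, product, quantity)
SELECT 1, 1, 'Bike',   1 UNION ALL
SELECT 1, 2, 'Helmet', 1 UNION ALL
SELECT 2, 1, 'Pump',   1;

-- Non-correlated: the subquery has no reference to the outer alias od.
SELECT od.order_no, od.order_line_no, od.product, od.quantity
  FROM @OrderDetail od
 WHERE od.order_no IN (SELECT f.order_no
                         FROM @OrderDetail f
                        WHERE f.product = 'Bike');

-- Correlated: the subquery refers to od.order_no from the outer query.
SELECT od.order_no, od.order_line_no, od.product, od.quantity
  FROM @OrderDetail od
 WHERE EXISTS (SELECT 1
                 FROM @OrderDetail f
                WHERE f.order_no = od.order_no
                  AND f.product  = 'Bike');
```

Both statements return the two lines of order 1 on this data; comparing their execution plans is the way to see how the optimizer treats each form.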

Garadin One Orange Chip Points: 29613 · Answer 10

Jeff Moden (10/11/2008)

Garadin (10/10/2008)

The IN clause is actually faster than the multiple joins. I tested it on the Northwind database (so it was easily repeatable) as so:

I ran your test code and, as expected, no difference. Both execution plans are exactly identical because IN resolves to an inner join.

What kind of difference did you actually come up with?

The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

Garadin One Orange Chip Points: 29613 · Answer 11

Here is what I get when running the three methods. Apologies for the somewhat sloppy code; I wrote this on my laptop while lying on the couch.

Jeff Moden SSC Guru Points: 1000539 · Answer 12

Garadin (10/12/2008)

The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?

Garadin One Orange Chip Points: 29613 · Answer 13

Jeff Moden (10/13/2008)

Garadin (10/12/2008)

The execution plans are the same, but the percentages skew slightly. The join query has 235 reads as opposed to 156 for the IN. Unfortunately the dataset is too small and both have a duration of 0, so reads was all I could compare on. That said, I have seen queries with more reads actually performing significantly faster, so I suppose using this as the sole measurement criteria isn't really accurate... but doesn't it at least show that they aren't exactly the same?

You said "but the percentages skew slightly"... are you talking about the "Percent of Batch" in the execution plan?

Yes.

Jeff Moden SSC Guru Points: 1000539 · Answer 14

Not a good thing to do... the Percentage of Batch is highly unreliable and frequently absolutely incorrect...

Look at the Percentage of Batch on the following code... then run it... SURPRISE! 🙂



SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
  PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
    SET @DateVal = '2008-01-01'
;with mycte as
     (
       select @DateVal AS DateVal
       union all
       select DateVal + 1
         from    mycte
         where   DateVal + 1 < DATEADD(yy, 5, @DateVal)
     )
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
  PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
  PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
    SET @StartDate = '2008-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,5,@StartDate)))
        @Bitbucket = @StartDate-1+t.N
   FROM Tally t
  ORDER BY N
--===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
  PRINT REPLICATE('=',90)

Here's another one... both have identical execution plans and percentages...

SET STATISTICS TIME ON

SELECT TOP 10 *
FROM dbo.JBMTest
ORDER BY SomeLetters2

SET ROWCOUNT 10

SELECT *
FROM dbo.JBMTest
ORDER BY SomeLetters2

SET ROWCOUNT 0

Oh yeah... here's the test table I used for that one...

--===== Create and populate a 1,000,000 row test table.
     -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
     -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
     -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
     -- Column "SomeMoney" has a range of 0.00 to 99.99 non-unique numbers
     -- Column "SomeDate" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times
     -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
     --        for all rows.
     -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
     -- Jeff Moden
 SELECT TOP 1000000
        RowNum       = IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        SomeHex12    = RIGHT(NEWID(),12)
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns t1
  CROSS JOIN Master.dbo.SysColumns t2

--===== A table is not properly formed unless a Primary Key has been assigned
  ALTER TABLE dbo.JBMTest
        ADD PRIMARY KEY CLUSTERED (RowNum)
