SQL Query - join table

Question

Post reply

SQL Query - join table

ktengkie

SSC Enthusiast

Points: 107
More actions
October 3, 2008 at 11:05 pm

#193649

Hi All

I ‘m using SQL Server 2005. I not good in SQL query. I got 2 tables as below

Table A:-

Invoice No Invoice Amount Invoice Date

---------------- --------------------- --------------------

A1234 1000 30/07/2008

B4444 200 11/08/2008

Table B:-

Invoice No Payment No Payment Collected Payment Date

-------------- ------------------ -------------------------- -----------------------

A1234 P111 300 03/08/2008

A1234 P256 400 09/09/2008

B4444 P555 200 31/08/2008

I need the result as below:-

(it's only show the invoice that still have outstanding amount)

Invoice No Invoice Amount outstanding Amount

--------------- -------------------- -------------------------------

A1234 1000 300

(Noted: outstanding Amount = Invoice Amount – Total Payment collected)

how to join these 2 tables in order to get the result that I needed. Please help

Thank

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

David Stubbington Valued Member Points: 71 More actions · Answer 1

I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collected

SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))

FROM a

The important bit is to link the two tables in the subquery.

Regards

Dugi SSCoach Points: 17998 More actions · Answer 2

David Stubbington (10/4/2008)

I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collected

SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))

FROM a

The important bit is to link the two tables in the subquery.

Regards

Even if you are not an expert you post the simple solution here ...great post David!

:w00t:

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

The correlated sub-query can suffer depending on how big the table is. You might care to try a derived table setup:



SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0))

FROM a

   left outer join

     (SELECT SUM(b.[payment Collected]) Totpayment FROM b ) b_sum

  ON b_sum.[Invoice No] = a.[invoice No])

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Vladan SSC-Insane Points: 22072 More actions · Answer 4

Matt is right, using left join to derived table is better than correlated subquery. Unfotunately he has a typo in his post (ID and group by missing in the inner query), this is how it should be (and I added the condition you mentioned:



SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0)) as outstanding

FROM a

LEFT OUTER JOIN

	(SELECT [Invoice no], SUM(b.[payment Collected]) Totpayment

	FROM b

	GROUP BY [Invoice no]) AS b_sum

		ON b_sum.[Invoice No] = a.[invoice No]

WHERE a.[invoice Amount] - isnull(b_sum.TotPayment,0) > 0

However, this is also untested, so I just hope I got it right 🙂

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

Yup - Vladan's right. Missed a few moving parts there.

thanks for the assist!:D

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Dugi SSCoach Points: 17998 More actions · Answer 6

Yeaa all assists are wellcomed... and I can see that these assists are coming so friendly and I can feel that we are big SQL family here!

:w00t:

Thnx for all assists!

Dugi

============================================================
SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

bobhovious SSC Enthusiast Points: 188 More actions · Answer 7

Assuming invoice number is a unique key in table A, here is another approach, using a summary query:

select A.[invoice no], max([invoice amt]) as invoice_amt,

max([invoice amt])-sum([payment collected]) as amount_due

from A

join B on B.[invoice no] = A.[invoice no]

group by A.[invoice no]

having max([invoice amt])-sum([payment collected]) <> 0

order by A.[invoice no]

Please let us know if you see any performance differences.

ktengkie SSC Enthusiast Points: 107 More actions · Answer 8

ktengkie

SSC Enthusiast

Points: 107

October 7, 2008 at 12:28 am

#881446

Thank for help 🙂

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 9

Another approach would be the following:



SELECT InvoiceNo, Sum(Outstanding)

FROM (

	SELECT InvoiceNo, InvoiceAmount AS Outstanding FROM Invoices

	UNION ALL

	SELECT InvoiceNo, (-1) * CollectedAmount AS Outstanding FROM Payments

)

GROUP BY InvoiceNo

HAVING SUM(Outstanding) <> 0

Theoretically, this should also be performing better than the join approach.

Best Regards,

Chris Büttner

bobhovious SSC Enthusiast Points: 188 More actions · Answer 10

Christian, that is an interesting approach.

Thinking about this, it is also possible to use CTEs to get the totals and then join the CTEs.

I may have to play around with these techniques on some production-sized volumes, check out the differences in query plans, and see if there are any significant differences in performance. If I find something worthwhile to report, I'll post it back here.

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 11

Okay... so much for theory. I ran the code at the bottom (without DBCC FREEPROCCACHE or DBCC DROPCLEANBUFFERS) and saw no significant IO differences against the base tables (#invoice and #payment) with any approach.

Using a CTE to summarize the payment table before joining to the invoice table was slightly faster than doing a summary query of the base tables joined together. Both did a merge join, but the summarization of the #payment table significantly reduced the volume the join had to handle.

Vlad's join to a derived table was third and the non-join approach was forth. These two approaches took 20-30% longer. Oddly, the display of the execution plan from SQL Server Management Studio suggests that Vlad's approach took no more work than the CTE approach, but the elapsed time was consistently and significantly larger.

I couldn't get Christian's syntax to work, so I used a CTE for the non-join approach. The execution plans revealed that it failed to take advantage of the indexing because the union all forced it to do a sort that was about 70% of the cost.

Obviously, the indexing schemes chosen and the test data influence the results. There are three payments per invoice for 70% of the invoices, and the other 30% have no payments. If the average number of payments per invoice was closer to one, the use of the CTE to summarize the #payment table would not help so much.

It seems that once again the it comes down to (1) understanding your data, (2) testing different approaches, and (3) examining the execution plans to understand what is going on under the covers. It's good to know all of these different approaches are available to solve the same problem. Looking at this got me to think outside my usual box. Thanks!!! 🙂

=============================================

--create and populate temporary tables

/*

CREATE TABLE #invoice

(

InvoiceID int IDENTITY(1,1) NOT NULL,

InvoiceAmt decimal(11,2) NOT NULL,

InvoiceDate datetime NOT NULL,

CONSTRAINT pk_#invoice PRIMARY KEY CLUSTERED (InvoiceID ASC)

)

CREATE TABLE #payment

(

PaymentID int IDENTITY(1,1) NOT NULL,

InvoiceID int not null,

PaymentAmt decimal(11,2) NOT NULL,

PaymentDate datetime NOT NULL,

CONSTRAINT pk_#payment PRIMARY KEY NONCLUSTERED (PaymentID ASC)

)

CREATE CLUSTERED INDEX #payment_cluster ON #payment

(

InvoiceID ASC,

PaymentID ASC

)

insert into #invoice

select case when right(t1.N,1) between 1 and 3 then 1000 when right(t1.N,1) between 4 and 6 then 2000 else 3000 end

,getdate()

from tally t1

cross join tally t2

where t2.N <= 10

;with cteAmts(amt) as

(select 250 union all select 300 union all select 125)

insert into #payment

select invoiceID,amt,getdate()

from #invoice

cross join cteAmts

where right(invoiceID,1) not in (2,3,5)

*/

set nocount on;

declare @start datetime

declare @elapsed int

set statistics time off;

set statistics io off;

print '---------------- Joined Summary Query'

set @start = getdate()

select i.invoiceID,max(invoiceAmt) as invoiceAmt,max(invoiceAmt)-sum(paymentAmt) as amtDue

from #invoice i

join #payment p on p.invoiceID = i.invoiceID

group by i.invoiceID

having max(invoiceAmt)-sum(paymentAmt) <> 0

order by i.invoiceID

set @elapsed = datediff(ms,@start,getdate())

print @elapsed

print ''

print '---------------- CTE summary of #payment'

set @start = getdate()

;with payment (invoiceID,payments) as

(select invoiceID,sum(paymentAmt)

from #payment

group by invoiceID

)

select i.invoiceID,i.invoiceAmt,i.invoiceAmt-p.payments as amtDue

from #invoice i

join payment p on p.invoiceID = i.invoiceID

order by invoiceID

set @elapsed = datediff(ms,@start,getdate())

print @elapsed

print ''

print '---------------- non-JOIN '

set @start = getdate()

;with cteUnion (invoiceID,amt) as

( SELECT InvoiceID, InvoiceAmt

FROM #Invoice

UNION ALL

SELECT InvoiceID, (-1) * paymentAmt

FROM #Payment

)

SELECT InvoiceID, Sum(amt) as Outstanding

FROM cteUnion

GROUP BY InvoiceID

HAVING SUM(amt) <> 0

ORDER BY InvoiceID

set @elapsed = datediff(ms,@start,getdate())

print @elapsed

print ''

print '---------------- join to derived table'

set @start = getdate()

SELECT i.invoiceID, i.invoiceAmt, (i.invoiceAmt - isnull(b_sum.TotPayment,0)) as outstanding

FROM #invoice i

LEFT OUTER JOIN

(SELECT invoiceID, SUM(p.paymentAmt) Totpayment

FROM #payment p

GROUP BY invoiceID) AS b_sum

ON b_sum.invoiceID = i.invoiceID

WHERE i.invoiceAmt - isnull(b_sum.TotPayment,0) > 0

set @elapsed = datediff(ms,@start,getdate())

print @elapsed

print ''

/*

drop table #invoice

drop table #payment

*/

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 12

Thanks for testing this.

The union all approach is not really thought through.

1. It makes more sense to SUM the payments before joining to the invoices.

This reduces the amount of data that needs to be joined. Example is your second query.

2. Joining the tables requires less resources than concatenating the data. For the concatenation,

the InvoiceID for Payments needs to be "maintained" and scanned again for the aggregation

that takes place afterwards.

Sorry for the bad information provided.

Best Regards,

Chris Büttner