SQL Query - join table

  • Hi All

    I ‘m using SQL Server 2005. I not good in SQL query. I got 2 tables as below

    Table A:-

    Invoice No Invoice Amount Invoice Date

    ---------------- --------------------- --------------------

    A1234 1000 30/07/2008

    B4444 200 11/08/2008

    Table B:-

    Invoice No Payment No Payment Collected Payment Date

    -------------- ------------------ -------------------------- -----------------------

    A1234 P111 300 03/08/2008

    A1234 P256 400 09/09/2008

    B4444 P555 200 31/08/2008

    I need the result as below:-

    (it's only show the invoice that still have outstanding amount)

    Invoice No Invoice Amount outstanding Amount

    --------------- -------------------- -------------------------------

    A1234 1000 300

    (Noted: outstanding Amount = Invoice Amount – Total Payment collected)

    how to join these 2 tables in order to get the result that I needed. Please help

    Thank

  • I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collected

    SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))

    FROM a

    The important bit is to link the two tables in the subquery.

    Regards

  • David Stubbington (10/4/2008)


    I'm not an expert, and I'm certain there are more elegant ways of performing this task but one approach is to use a subquery to total the payments collected

    SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - (SELECT SUM(b.[payment Collected]) FROM b WHERE b.[Invoice No] = a.[invoice No]))

    FROM a

    The important bit is to link the two tables in the subquery.

    Regards

    Even if you are not an expert you post the simple solution here ...great post David!

    :w00t:

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • The correlated sub-query can suffer depending on how big the table is. You might care to try a derived table setup:

    SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0))

    FROM a

    left outer join

    (SELECT SUM(b.[payment Collected]) Totpayment FROM b ) b_sum

    ON b_sum.[Invoice No] = a.[invoice No])

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt is right, using left join to derived table is better than correlated subquery. Unfotunately he has a typo in his post (ID and group by missing in the inner query), this is how it should be (and I added the condition you mentioned:

    SELECT a.[Invoice no], a.[invoice Amount], (a.[invoice Amount] - isnull(b_sum.TotPayment,0)) as outstanding

    FROM a

    LEFT OUTER JOIN

    (SELECT [Invoice no], SUM(b.[payment Collected]) Totpayment

    FROM b

    GROUP BY [Invoice no]) AS b_sum

    ON b_sum.[Invoice No] = a.[invoice No]

    WHERE a.[invoice Amount] - isnull(b_sum.TotPayment,0) > 0

    However, this is also untested, so I just hope I got it right 🙂

  • Yup - Vladan's right. Missed a few moving parts there.

    thanks for the assist!:D

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yeaa all assists are wellcomed... and I can see that these assists are coming so friendly and I can feel that we are big SQL family here!

    :w00t:

    Thnx for all assists!

    Dugi

    ============================================================
    SELECT YOUR PROBLEM FROM SSC.com WHERE PROBLEM DESCRIPTION =
    http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]

  • Assuming invoice number is a unique key in table A, here is another approach, using a summary query:

    select A.[invoice no], max([invoice amt]) as invoice_amt,

    max([invoice amt])-sum([payment collected]) as amount_due

    from A

    join B on B.[invoice no] = A.[invoice no]

    group by A.[invoice no]

    having max([invoice amt])-sum([payment collected]) <> 0

    order by A.[invoice no]

    Please let us know if you see any performance differences.

  • Thank for help 🙂

  • Another approach would be the following:

    SELECT InvoiceNo, Sum(Outstanding)

    FROM (

    SELECT InvoiceNo, InvoiceAmount AS Outstanding FROM Invoices

    UNION ALL

    SELECT InvoiceNo, (-1) * CollectedAmount AS Outstanding FROM Payments

    )

    GROUP BY InvoiceNo

    HAVING SUM(Outstanding) <> 0

    Theoretically, this should also be performing better than the join approach.

    Best Regards,

    Chris Büttner

  • Christian, that is an interesting approach.

    Thinking about this, it is also possible to use CTEs to get the totals and then join the CTEs.

    I may have to play around with these techniques on some production-sized volumes, check out the differences in query plans, and see if there are any significant differences in performance. If I find something worthwhile to report, I'll post it back here.

  • Okay... so much for theory. I ran the code at the bottom (without DBCC FREEPROCCACHE or DBCC DROPCLEANBUFFERS) and saw no significant IO differences against the base tables (#invoice and #payment) with any approach.

    Using a CTE to summarize the payment table before joining to the invoice table was slightly faster than doing a summary query of the base tables joined together. Both did a merge join, but the summarization of the #payment table significantly reduced the volume the join had to handle.

    Vlad's join to a derived table was third and the non-join approach was forth. These two approaches took 20-30% longer. Oddly, the display of the execution plan from SQL Server Management Studio suggests that Vlad's approach took no more work than the CTE approach, but the elapsed time was consistently and significantly larger.

    I couldn't get Christian's syntax to work, so I used a CTE for the non-join approach. The execution plans revealed that it failed to take advantage of the indexing because the union all forced it to do a sort that was about 70% of the cost.

    Obviously, the indexing schemes chosen and the test data influence the results. There are three payments per invoice for 70% of the invoices, and the other 30% have no payments. If the average number of payments per invoice was closer to one, the use of the CTE to summarize the #payment table would not help so much.

    It seems that once again the it comes down to (1) understanding your data, (2) testing different approaches, and (3) examining the execution plans to understand what is going on under the covers. It's good to know all of these different approaches are available to solve the same problem. Looking at this got me to think outside my usual box. Thanks!!! 🙂

    =============================================

    --create and populate temporary tables

    /*

    CREATE TABLE #invoice

    (

    InvoiceID int IDENTITY(1,1) NOT NULL,

    InvoiceAmt decimal(11,2) NOT NULL,

    InvoiceDate datetime NOT NULL,

    CONSTRAINT pk_#invoice PRIMARY KEY CLUSTERED (InvoiceID ASC)

    )

    CREATE TABLE #payment

    (

    PaymentID int IDENTITY(1,1) NOT NULL,

    InvoiceID int not null,

    PaymentAmt decimal(11,2) NOT NULL,

    PaymentDate datetime NOT NULL,

    CONSTRAINT pk_#payment PRIMARY KEY NONCLUSTERED (PaymentID ASC)

    )

    CREATE CLUSTERED INDEX #payment_cluster ON #payment

    (

    InvoiceID ASC,

    PaymentID ASC

    )

    insert into #invoice

    select case when right(t1.N,1) between 1 and 3 then 1000 when right(t1.N,1) between 4 and 6 then 2000 else 3000 end

    ,getdate()

    from tally t1

    cross join tally t2

    where t2.N <= 10

    ;with cteAmts(amt) as

    (select 250 union all select 300 union all select 125)

    insert into #payment

    select invoiceID,amt,getdate()

    from #invoice

    cross join cteAmts

    where right(invoiceID,1) not in (2,3,5)

    */

    set nocount on;

    declare @start datetime

    declare @elapsed int

    set statistics time off;

    set statistics io off;

    print '---------------- Joined Summary Query'

    set @start = getdate()

    select i.invoiceID,max(invoiceAmt) as invoiceAmt,max(invoiceAmt)-sum(paymentAmt) as amtDue

    from #invoice i

    join #payment p on p.invoiceID = i.invoiceID

    group by i.invoiceID

    having max(invoiceAmt)-sum(paymentAmt) <> 0

    order by i.invoiceID

    set @elapsed = datediff(ms,@start,getdate())

    print @elapsed

    print ''

    print ''

    print ''

    print '---------------- CTE summary of #payment'

    set @start = getdate()

    ;with payment (invoiceID,payments) as

    (select invoiceID,sum(paymentAmt)

    from #payment

    group by invoiceID

    )

    select i.invoiceID,i.invoiceAmt,i.invoiceAmt-p.payments as amtDue

    from #invoice i

    join payment p on p.invoiceID = i.invoiceID

    order by invoiceID

    set @elapsed = datediff(ms,@start,getdate())

    print @elapsed

    print ''

    print ''

    print ''

    print '---------------- non-JOIN '

    set @start = getdate()

    ;with cteUnion (invoiceID,amt) as

    ( SELECT InvoiceID, InvoiceAmt

    FROM #Invoice

    UNION ALL

    SELECT InvoiceID, (-1) * paymentAmt

    FROM #Payment

    )

    SELECT InvoiceID, Sum(amt) as Outstanding

    FROM cteUnion

    GROUP BY InvoiceID

    HAVING SUM(amt) <> 0

    ORDER BY InvoiceID

    set @elapsed = datediff(ms,@start,getdate())

    print @elapsed

    print ''

    print ''

    print ''

    print '---------------- join to derived table'

    set @start = getdate()

    SELECT i.invoiceID, i.invoiceAmt, (i.invoiceAmt - isnull(b_sum.TotPayment,0)) as outstanding

    FROM #invoice i

    LEFT OUTER JOIN

    (SELECT invoiceID, SUM(p.paymentAmt) Totpayment

    FROM #payment p

    GROUP BY invoiceID) AS b_sum

    ON b_sum.invoiceID = i.invoiceID

    WHERE i.invoiceAmt - isnull(b_sum.TotPayment,0) > 0

    set @elapsed = datediff(ms,@start,getdate())

    print @elapsed

    print ''

    print ''

    print ''

    /*

    drop table #invoice

    drop table #payment

    */

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks for testing this.

    The union all approach is not really thought through.

    1. It makes more sense to SUM the payments before joining to the invoices.

    This reduces the amount of data that needs to be joined. Example is your second query.

    2. Joining the tables requires less resources than concatenating the data. For the concatenation,

    the InvoiceID for Payments needs to be "maintained" and scanned again for the aggregation

    that takes place afterwards.

    Sorry for the bad information provided.

    Best Regards,

    Chris Büttner

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply