difference between two resultsets

  • Hello. I've got two queries with a minor difference in joins. One query returns 8948285 rows and the other returns 8948973. I would like to return * from all the extra rows in the second query.

    How can I accomplish a result set displaying the rows in the second query that are not in the first? I'm sure there is a quick way to do this and I don't want to play around with SQL all day becauese it's a fairly long running query. I'll continute to play but if anyone can help me out I appreciate it.

    Here are the queries. Note the only difference is in the final Left join to dw_sd.dbo.PeriodBudgets. The second query ommits the e3a4.CostType = pb.CostType condition.

    THanks for any help!

    /***********************

    *

    * Query 1 (8948285 rows)

    *

    ************************/

    Select count(*)

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.CostType = pb.CostType and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

    /***********************

    *

    *Query 2 (8948973 rows)

    *

    ************************/

    Select count(*)

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

  • Unfortunately there is no minus equivalent from oracle in sql server, which is what you really want.

    The choices I can see for you are:

     

    Use the not exists query. Will take a real long time, I know.

    Export both tables to a text file. Then run a freebie comparison tool like examdiff. this will highlight all the differences for you as long as both text files are in the same order.

    Both methods are going to cost you time, i'm afraid.


    ------------------------------
    The Users are always right - when I'm not wrong!

  • Why r u changing the last join in the 2nd query?

    R u looking for bad data?

  • yes... it's a data audit for a fact table build. Apparantly the "Period Budgets" do not rely on costtype to join with the actual costs, but I won't provide this data until examining where the extra 700 rows come from.

    the select count was a simplification. Here is the full query:

    Select

    ISNULL(DivisionSK,1) as DivisionFK,

    ISNULL(CommunitySK,1) as CommunityFK,

    ISNULL(JobSK,1) as JobFK,

    ISNULL(CostCodeSK,1) as CostCodeFK,

    ISNULL(ProjectManagerSK,1) as ProjectManagerFK,

    ISNULL(pd.Calendar_Key,1) as PostDateFK,

    ISNULL(cd.Calendar_Key,1) as CostDateFK,

    ISNULL(jcd.Calendar_Key,1) as JobCostPeriodFK,

    ISNULL(PeriodBudgetSK,1) as PeriodBudgetFK,

    e3a4.CostType,

    e3a4.Reference as ReferenceNumber,

    e3a4.Batch as BatchNumber,

    e3a4.[Description] as TransactionDescription,

    e3a4.Cost, e3a4.Revenue, e3a4.Quantity,

    ISNULL(e3a4.OriginalBudget,0) as OriginalBudget,

    ISNULL(e3a4.RevisedBudget,0) as RevisedBudget,

    ISNULL(e3a4.OriginalQuantity,0) as OriginalQuantity,

    ISNULL(e3a4.RevisedQuantity,0) as RevisedQuantity

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

  • I don't wanna hurt anyone's feelings, but the obvious answer seems to lie in that column that you change in the final join. I think you'll have to look at the data there to see which line(s) are offending the results.

    Maybe you'd be able to join only 2-3 tables to do that check instead of all 8. Sure will run faster and won't take as much time to scan... and a not exists might run in a acceptable amount of time in this situation.

  • FC would work,but that would be an enormous text file. I'll start working on it to see where I end up.

  • You can get both the resultsets into 2 table variables ( or temp tables) and do a select from tableA where NOT IN...tableB ( or viceverca)

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • how would that work syntatically?

    select col1,col2,...,col n from table A

    where col1 not in (Select col1 from table B)

    and col2 not in (select col2 from table B)

    ...

    and coln not in (select coln from table B)

    ?

    That doesn't really work. What you state is exactly what I want, but I'm having trouble writing the sql to get it.

  • you can prbly use a LEFT OUTER JOIN or a RIGHT OUTER JOIN dpending on how you look at it..to get the nonmatching columns .. check out the Books On Line for the xact syntax and more info.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Might I suggest you read the whole post, or at least the first question before posting ??

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

  • well at this point I have the result set for both queries in saved in physical tables. called ct and noCT. I need to get the rows that exist in noCT and not CT. GRRR.. SQL is so, expressive.

  • As it's been suggested you can run a left join or a not exists statement to see which line is missing from the second table. As far as syntaxe goes you obviously know how to do it and since there's no magical solution to your problem, you'll have to work a little sweat to weed out the problem.

  • Bah, I was hoping to be introduced to a magic wand of some sort. Apparantly Oracle's minus function is what I was looking for. Oh well. I'll play and post my final query when I get it right. Thanks for the help everyone.

  • Perhaps something like this :
     
    Select e3a4.<PKCOlumn> --count(*)

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

     
    AND e3a4.<PKColumn> NOT IN

    (

    Select e3a4.<PKCOlumn> --count(*)

    From dw_dsa.dbo.e3bdjr_a4bidm e3a4

    Left Join dw_dsa.dbo.JobCostDimLookupTbl jcdl on e3a4.Job = jcdl.JOB

    Left Join dw_sd.dbo.Job j on e3a4.Job = j.jobcode

    Left Join dw_sd.dbo.CostCode cc on (e3a4.Job = cc.JobFK and

    e3a4.Costcode = cc.costcodeNK)

    Left Join dw_sd.dbo.calendar pd on pd.[Year and Month and Day]= postdate

    Left Join dw_sd.dbo.calendar cd on cd.[Year and Month and Day]= costdate

    Left Join dw_sd.dbo.calendar jcd on jcd.[Year and Month and Day]= jobcostperioddate

    Left Join dw_sd.dbo.PeriodBudgets pb on ( e3a4.Job = pb.JobCode and

    e3a4.CostCode = pb.CostCode and

    e3a4.CostType = pb.CostType and

    e3a4.JOBCOSTPERIODDATE = pb.JobCostPeriodDate)

    where exists (select * from dw_sd.dbo.job j2

    where e3a4.job = j2.jobcode)

    )

     
    Replace PKColumn with the actual primary key column in the above select statements.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • not in on 8 M records?????????????????????????

    I would strongly suggest left join or not exists on this situation.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply