Remove duplicates based on occurrences with join on other table

  • Ok I did that...that deletes the data from prospectsincampaigns for the specific campaign, good!

    However *looks afraid to ask*, it still deletes all data from prospects that are NOT in the current in the campaign.

    If I specifiy a campaign I need the duplicates that are removed from prospects to be only the ones that also are in that specific campaign? How do I do that?

  • petervdkerk (8/1/2011)


    Ok I did that...that deletes the data from prospectsincampaigns for the specific campaign, good!

    However *looks afraid to ask*, it still deletes all data from prospects that are NOT in the current in the campaign.

    If I specifiy a campaign I need the duplicates that are removed from prospects to be only the ones that also are in that specific campaign? How do I do that?

    Never be afraid to ask. Just try to learn from the answers so that you don't have to ask the same question again (at least not too ofeten - learning first time is not always easy).

    It's late at night here, and I'm too tired to check this thoroughly or even to understand it properly, but as I see it there are two possible ways of doing this. Both involve modifying the same block of code, which is the block reading

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname)

    I'm not sure exactly which of the two possibilities actually matches the requirement: the two possibilities are (1) delete that block of code altogether, and (2) instead of deleting it make it less eager to delete by adding "and c.campaignid = 4" (or whatever the campaignid you are dealing with is) to the predicate for the exists condition. If neither works, let me know. From your wording I think iyou probably want possibility (1), but not 100% sure.

    Tom

  • Ok, hopefully you had a good night sleep, because Im back 🙂

    This is where I start from:

    START

    id companyname

    1355132 Smeets M.B. Zuid-Limburg BV

    1355133 F.J. ten Berge BV

    1355134 F.J. ten Berge BV

    1355135 MacDonalds

    1355136 MacDonalds

    1355137 Duran BV

    1355138 Duran BV

    1355139 Duran BV

    1355140 Company in No campaign

    prospectid campaignid result

    1355132 1 0

    1355133 1 0

    1355134 1 0

    1355135 5 0

    1355135 6 0

    1355136 5 0

    1355136 6 0

    1355137 1 0

    1355138 1 0

    1355139 4 0

    1355139 4 8

    1355139 4 14

    1355139 6 15

    Now I executed this statement (narrowing the delete statement):

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid = 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = 5

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    result is:

    id companyname

    1355132 Smeets M.B. Zuid-Limburg BV

    1355133 F.J. ten Berge BV

    1355135 MacDonalds

    1355137 Duran BV

    1355139 Duran BV

    1355140 Company in No campaign

    prospectid campaignid result

    1355132 1 0

    1355133 1 0

    1355135 5 0

    1355135 6 0

    1355137 1 0

    1355139 4 0

    1355139 4 8

    1355139 4 14

    1355139 6 15

    So you see, even though I (think I) narrowed it to canmpaignid=5, still the duplicate records from campaign 1 are removed from prospects, while these should actually remain intact.

    Im talking about these:

    id companyname

    1355132 Smeets M.B. Zuid-Limburg BV

    1355133 F.J. ten Berge BV

    1355134 F.J. ten Berge BV

    prospectid campaignid result

    1355132 1 0

    1355133 1 0

    1355134 1 0

    So, I also tried to remove the delete statement completely, like so:

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = 5

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    But that still deletes duplicate prospects that are in another campaign than campaign 5 in this case.

  • petervdkerk (8/2/2011)


    So you see, even though I (think I) narrowed it to canmpaignid=5, still the duplicate records from campaign 1 are removed from prospects, while these should actually remain intact.

    I think you misunderstood me: what I suggested was meant as an addition to the narrowing you had already done, not a substitute for it.

    Tom

  • I'm not sure what you mean there.

    If you talk about the fact that I substituted campaignid=4 with campaignid=5, thats only because I was testing the code on another campaign. If this substitution is NOT what you mean, in that case I dont know what you mean I should be adding instead of substituting.

  • petervdkerk (8/3/2011)


    I'm not sure what you mean there.

    If you talk about the fact that I substituted campaignid=4 with campaignid=5, thats only because I was testing the code on another campaign. If this substitution is NOT what you mean, in that case I dont know what you mean I should be adding instead of substituting.

    On Saturday you said

    One more thing: Im now also trying to narrow it all down to a single campaign (4 in this case), so I added " and c.campaignid=4" on several lines, but that doesnt seem to work...what should I add to achieve that?

    and the code you posted then has "and c.campaignid = 4" in 3 places.

    Your latest code should have had "and c.campaign = 5" in those three places, but it doesn't - it has nothing there. Adding to those 3 narrowings, there should have been the extra thing I suggested - but you have only the extra thing. In effect you had substituted one narrowing clause for the original three, instead of adding that one to the original three to end up with four.

    Tom

  • Ok, maybe Im not understanding you correctly, but in my last post I posted the 2 different options you gave me and the way I interpreted them.

    So there are 2(!) different statement and in the top statement I have 3 times: "and campaignid=5".

    Or is that not what you meant?

    Thanks again! 🙂

  • petervdkerk (8/3/2011)


    Ok, maybe Im not understanding you correctly, but in my last post I posted the 2 different options you gave me and the way I interpreted them.

    So there are 2(!) different statement and in the top statement I have 3 times: "and campaignid=5".

    Or is that not what you meant?

    Thanks again! 🙂

    OK, the code in your last post begin like this:

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    That had "and c.campaignid = 4" in it in your earlier post, but doesn't have "and c.campaignid = 5" now. It should have that.

    Then it goes

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    when you posted on Saturday, you had "and c.campaignid = 4" on both those delete statements, but you now have "and c.campaignid = 5" on neither. It's probably only needed on the second one, but it would be sensible to put it on the first one too since that's what you had before.

    The rest of the code you last posted is

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid = 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = 5

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    which ought to work (I miscounted the occurrences of campaigneid=5 in my last reply, which may have confused you.)

    Tom

  • I ended up with this statement:

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid and c.campaignid=5

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=5)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=5

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid = 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = 5

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    Which almost does what I need.

    The final problem now is that the duplicates removed from prospects could still be in another campaign.

    In my case, when deleting duplicates from campaign 5, those duplicates are still in campaign 1. So the prospectid in prospectsincampaigns with campaignid 1 now points to a non-existing record in prospects.

    How can I prevent that? Seems to me an additional check is needed that checks if the deleted dupes do not occur in another campaign than the campaign that is currently being handled, campaign 5 in my case.

    And just out of curiosity: are you an SSC employee or just a (really really) active SQL fanatic (or both ;). Anyways: thanks again for your time and help!

  • petervdkerk (8/3/2011)


    Which almost does what I need.

    The final problem now is that the duplicates removed from prospects could still be in another campaign.

    In my case, when deleting duplicates from campaign 5, those duplicates are still in campaign 1. So the prospectid in prospectsincampaigns with campaignid 1 now points to a non-existing record in prospects.

    How can I prevent that? Seems to me an additional check is needed that checks if the deleted dupes do not occur in another campaign than the campaign that is currently being handled, campaign 5 in my case.

    I should have thought of that - my excuse is that since you introduced the idea of restricting to one campaign, between trying to deal with phone company problems, hospital visits to unwell friends, an invalid wife, and various other little issues I've been a bit distracted and nothad time to do proper testing.

    The two statements that delete prospects are

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid = 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    Each needs a change, so that they become

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    And just out of curiosity: are you an SSC employee or just a (really really) active SQL fanatic (or both ;). Anyways: thanks again for your time and help!

    The only SSC employee, as far as I know, is Steve Jones, who is employed by Redgate Software to run the SSC wesite and provide editorial material (and do other things for the company). Everyone else is either an SQL enthusiast who provides help voluntarily, or someone looking for help (that's how most people start out). Because several real experts are involved we all learn quite a bit too. Some of us also write articles for the site (the going rate is $25 per article, which gives Redgate exclusive rights to the article for 3 months and non-exclusive rights perpetually, if I recall correctly). I contribute a lot less, and help a lot fewer people, than the main contributors to the site (who are volunteers, not Redgate employees).

    Tom

  • You call that little issues? My respect to you sir, reminds me once again my life is actually pretty easy!

    Ok, with the prospects and prospectsincampaigns tables, you can use the following statement to generate the exact testdata I'm using right now:

    declare @newid int

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Smeets M.B. Zuid-Limburg BV','Handelsstraat','2','','6433 KB','Heerlen (MAASTRICHT)','0464209900','1','','','Laumans','n.laumans@smeets-mb.nl','','15','T-Mobile','2/1/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','B','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','mnijkamp@tenbergerijssen.nl','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','Rozsemarylane','1 B','','AG','','','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','','','','AG','','0123456789','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','95 B','','PG','','0306304444','-1','Uzehir??','','u.duran@duranbv.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','','95 B','B','3402','IJsselstein','','1','','','','','15','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','','','3402PG','','','-1','','','','u.duran@duranbv.nl','','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for same campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,14 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'positief gesprek!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for different campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,15 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'AFSPRAAK GESCOORD!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,8 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert record that is not assigned to any campaign

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Company in No campaign','Mijnlaan','3','','45435 KB','Ablasserdam','0464209900','1','','','Peters','peters@nocamp.nl','','11','T-Mobile','2/1/2011')

    Now, you last statement was missing a ')', so I added that (glad I could do something as well 😉 and now it is this statement:

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid and c.campaignid=5

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=5)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=5

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p where exists (select * from #deletions d

    where d.prospectID = p.id)

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> 5)

    and exists (select * from ProspectsInCampaigns C inner join prospects p2

    on p2.id = c.prospectid where p2.companyname = p.companyname and c.campaignid = 5)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N

    from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = 5

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    Now your statement works for campaignid=5, but when I replace 5 with 1, I dont now whats happening but a lot of data is deleted from prospects that should not be deleted. the rows in prospectsincampaigns ARE deleted correctly.

    Oh, and thanks for the little insight in SSC inner workings...pretty cool to see, this is actually the best SQL forum I've come across so far 🙂

    And isnt it weird I cant mark a certain post as the answer? I measn: how can I give you visible credit for you work then?

  • I decided to test with all the campaignids, just to make sure it was right. It was easier to make the initialisation and the solution into a pair of stored procs so that they could more easily be called in loops. The bug which caused it to delete everything was a missing inequality in the "delete redundant prospects rows" delete statement. the results that the currect code gives are:

    prospects:-

    1 Smeets M.B. Zuid-Limburg BV RETAINED

    2 F.J. ten Berge BV RETAINED

    3 F.J. ten Berge BV DELETED

    4 MacDonalds RETAINED

    5 MacDonalds DELETED

    6 Duran BV RETAINED

    7 Duran BV DELETED

    8 Duran BV RETAINED

    9 Company in No campaign RETAINED

    prospectsincampaigns

    1 1 0 RETAINED

    2 1 0 RETAINED

    3 1 0 DELETED

    4 5 0 RETAINED

    4 5 0 RETAINED

    4 6 0 RETAINED

    4 6 0 RETAINED

    5 5 0 DELETED

    5 6 0 DELETED

    6 1 0 RETAINED

    7 1 0 DELETED

    8 4 0 RETAINED

    8 4 0 RETAINED

    8 4 0 RETAINED

    8 4 8 RETAINED

    8 4 8 RETAINED

    8 4 8 RETAINED

    8 4 14 RETAINED

    8 4 14 RETAINED

    8 4 14 RETAINED

    8 6 15 RETAINED

    8 6 15 RETAINED

    8 6 15 RETAINED

    My current test rig (which you may find useful if you wantto test with bigger data sets), stored proc to set up test data, and stored proc to do the work for a specified campaign are below

    test rig

    use playpen

    go

    exec maketestdata

    create table #pbefore (

    id int not null primary key, companyname nvarchar(64))

    create table #cbefore (

    prospectid int not null, campaignid int not null, result int not null)

    create table #pafter (

    id int not null primary key, companyname nvarchar(64))

    create table #cafter (

    prospectid int not null, campaignid int not null, result int not null)

    insert #pbefore select id,companyname from prospects

    insert #cbefore select prospectid, campaignid, result from ProspectsInCampaigns

    create table #c (campaignid int primary key)

    insert #c select distinct campaignid from prospectsincampaigns

    declare @campid int = null

    select top 1 @campid = campaignid from #c

    while @campid is not null

    begin

    delete #c where campaignid = @campid

    exec DoDeletions @campid

    set @campid=null

    select top 1 @campid = campaignid from #c

    end

    drop table #c

    insert #cafter select prospectid,campaignid,result from ProspectsInCampaigns

    insert #pafter select id, companyname from prospects

    select id, companyname, case when X IS null then 'DELETED' else 'RETAINED' end as action

    from (select B.*, A.id as X from #pbefore B left join #pafter A on A.id=B.id) R

    order by ID

    select prospectid, campaignid, result, case when X IS null then 'DELETED' else 'RETAINED' end as action

    from (select B.*, A.prospectid as X from #cbefore B left join #cafter A on A.prospectid=B.prospectid) R

    order by prospectid, campaignid, result

    drop table #pafter,#pbefore, #cafter,#cbefore

    test data

    USE [playpen]

    GO

    /****** Object: StoredProcedure dbo.MakeTestdata ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MakeTestdata]') AND type in (N'P', N'PC'))

    DROP PROCEDURE dbo.MakeTestdata

    GO

    /****** Object: StoredProcedure dbo.MakeTestdata ******/

    SET ANSI_NULLS ON

    GO

    create proc dbo.MakeTestdata as

    begin

    drop table prospects,prospectsincampaigns

    create table prospects (

    id int identity(1,1) primary key,

    companyname varchar(64) not null,

    companyaddress varchar(64),

    companyhousenr varchar(64),

    companyhousenradd varchar(64),

    companyzipcode varchar(64),

    companycity varchar(64),

    companyphone varchar(64),

    cp_sex varchar(64),

    cp_firstname varchar(64),

    cp_middlename varchar(64),

    cp_lastname varchar(64),

    cp_email varchar(64),

    cp_phone varchar(64),

    telco_mob_connections varchar(64),

    telco_mob_provider varchar(64),

    telco_mob_enddate varchar(64)

    )

    create table ProspectsInCampaigns (

    prospectid int not null,

    campaignid int not null,

    result int not null,

    personalcallbackid varchar(64),

    callbackdate varchar(64),

    appointmentdate varchar(64),

    notes varchar(64),

    updatedate varchar(64),

    updatedby varchar(64),

    recordlastrequestedby varchar(64),

    exportdate varchar(64)

    )

    declare @newid int

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Smeets M.B. Zuid-Limburg BV','Handelsstraat','2','','6433 KB','Heerlen (MAASTRICHT)','0464209900','1','','','Laumans','n.laumans@smeets-mb.nl','','15','T-Mobile','2/1/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('F.J. ten Berge BV','Wattstraat','5','B','7461AB','RIJSSEN','0548512525','1','','','Nijkamp','mnijkamp@tenbergerijssen.nl','','18','Telfort','4/10/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'12/10/10 16:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','Rozsemarylane','1 B','','AG','','','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('MacDonalds','','','','AG','','0123456789','-1','Markos','','markos@macdonalds.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,5 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','95 B','','PG','','0306304444','-1','Uzehir??','','u.duran@duranbv.nl','','','','T-Mobile','')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','','95 B','B','3402','IJsselstein','','1','','','','','15','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,1 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'1/1/1900' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Duran BV','Noord IJsseldijk','','','3402PG','','','-1','','','','u.duran@duranbv.nl','','','T-Mobile','1/9/2011')

    set @newid=SCOPE_IDENTITY()

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,0 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for same campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,14 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'positief gesprek!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert same record for different campaign

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,6 as campaignid,15 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'AFSPRAAK GESCOORD!' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    INSERT INTO ProspectsInCampaigns (prospectid,campaignid,result,personalcallbackid,callbackdate,appointmentdate,notes,updatedate,updatedby,recordlastrequestedby,exportdate)

    SELECT id,4 as campaignid,8 as result,'' as personalcallbackid,'1/1/1900' as callbackdate,'10/13/2010 9:00' as appointmentdate,'' as notes,'1/1/1900' as updatedate,'' as updatedby,'' as recordlastrequestedby, '1/1/1900' as exportdate FROM prospects WHERE id=@newid

    --insert record that is not assigned to any campaign

    INSERT INTO prospects(companyname,companyaddress,companyhousenr,companyhousenradd,companyzipcode,companycity,companyphone,cp_sex,cp_firstname,cp_middlename,cp_lastname,cp_email,cp_phone,telco_mob_connections,telco_mob_provider,telco_mob_enddate)

    VALUES ('Company in No campaign','Mijnlaan','3','','45435 KB','Ablasserdam','0464209900','1','','','Peters','peters@nocamp.nl','','11','T-Mobile','2/1/2011')

    end

    GO

    deleter

    USE playpen

    GO

    /****** Object: StoredProcedure dbo.DoDeletions ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DoDeletions]') AND type in (N'P', N'PC'))

    DROP PROCEDURE dbo.DoDeletions

    GO

    /****** Object: StoredProcedure dbo.DoDeletions ******/

    SET ANSI_NULLS ON

    GO

    CREATE proc dbo.DoDeletions @campid int

    as

    begin

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and c2.campaignid = c.campaignid and c.campaignid=@campid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c

    where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=@campid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=@campid

    and p2.companyname = p.companyname and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p

    where exists (select * from #deletions d where d.prospectID = p.id)

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> @campid)

    and exists (select * from ProspectsInCampaigns c inner join prospects p2

    on p2.id = c.prospectid where p2.id <> p.id and p2.companyname = p.companyname and c.campaignid = @campid)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = @campid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    end

    GO

    petervdkerk (8/5/2011)


    And isnt it weird I cant mark a certain post as the answer? I measn: how can I give you visible credit for you work then?

    Points are scored by posting (and by answering QoTD), and people get a reputation for being helpful and/or being expert on a particular area (or of course for being unhelpful) because other memebers of the community read their posts. This works pretty well.

    Tom

  • Hi, ok, that works better.Now I tested it on a larger resultset, and there are companies that have NO name.

    If a company has NO name I DONT want to delete it as a duplicate, because there's no real ground on deleting it.

    So I tried adding: and p.companyname<>'' on several spots in the code below, but those records without companyname are still deleted.

    What did I do wrong?

    declare @campid int

    set @campid=1

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyname = p2.companyname and p.companyname<>'' and c2.campaignid = c.campaignid and c.campaignid=@campid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    delete c from dbo.ProspectsInCampaigns c

    where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=@campid)

    --delete redundant min prospectsincampaigns rows

    delete c from ProspectsInCampaigns c, Prospects p

    where result = 0 and p.id = c.prospectid

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=@campid

    and p2.companyname = p.companyname and p.companyname<>'' and c2.prospectID > c.prospectID)

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p

    where exists (select * from #deletions d where d.prospectID = p.id)

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> @campid)

    and exists (select * from ProspectsInCampaigns c inner join prospects p2

    on p2.id = c.prospectid where p2.id <> p.id and p2.companyname = p.companyname and p.companyname<>'' and c.campaignid = @campid)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = @campid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

  • I don't have the data that you have but I would step through the code.

    I would convert the delete statements into select statements and order by CompanyName to isolate the problem.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Ok, I started that. Instead of doing it all on companyname I've now done it on companyphone, because you guys then have the same testdata I already posted earlier and technically it shouldnt matter if I apply the statement on companyname column or companyphone column.

    I think I found where it goes wrong, but dont know how to fix it. Please see my comments in code below.

    So I now have:

    declare @campid int

    set @campid=1

    drop table #deletions

    -- work out which non-min prospectsincampaigns rows should be deleted

    select c.prospectid, c.campaignid into #deletions

    from dbo.ProspectsInCampaigns c, dbo.prospects p

    WHERE result = 0 and p.id = c.prospectid

    and exists (select * from dbo.ProspectsInCampaigns c2, dbo.prospects p2

    where p.companyphone = p2.companyphone and p.companyphone<>'' and c2.campaignid = c.campaignid and c.campaignid=@campid

    and c2.prospectID = p2.ID and p.id > p2.ID)

    -- delete the propspectsincampaigns rows

    select * from dbo.ProspectsInCampaigns c -- <--THIS STATEMENT GIVE ME CORRECT ROW TO DELETE

    where exists (select * from #deletions d

    where d.prospectid = c.prospectid and d.campaignid = c.campaignid and c.campaignid=@campid)

    --delete redundant min prospectsincampaigns rows

    select * from ProspectsInCampaigns c, Prospects p -- <--THIS STATEMENT GIVES ME WRONG ROW TO DELETE

    where result = 0 and p.id = c.prospectid and p.companyphone<>''

    and exists (select * from ProspectsInCampaigns c2, Prospects p2

    where c2.prospectID = p2.id and c2.campaignid = c.campaignID and c.campaignid=@campid

    and p2.companyphone = p.companyphone and c2.prospectID > c.prospectID)

    /*

    --delete the prospects rows matching deleted prospectsincampaigns rows

    delete p from dbo.Prospects p

    where exists (select * from #deletions d where d.prospectID = p.id)

    and not exists (select * from prospectsincampaigns C where c.prospectid = p.id)

    --delete redundant prospects rows

    delete p from prospects p

    where not exists (select * from ProspectsInCampaigns where prospectid = id and campaignid <> @campid)

    and exists (select * from ProspectsInCampaigns c inner join prospects p2

    on p2.id = c.prospectid where p2.id <> p.id and p2.companyphone = p.companyphone and p.companyphone<>'' and c.campaignid = @campid)

    -- get rid of duplicate entries

    ;with findDups as

    (select *, row_number() over (order by prospectID,campaignid,result) as N from ProspectsInCampaigns)

    delete f from findDups f join findDups f2

    on f.prospectID=f2.prospectID and f.campaignid = f2.campaignid and f.campaignid = @campid

    where f.N < f2.N and f.result = 0

    -- throw away temporary data

    drop table #deletions

    */

    Thanks!

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply