Display Duplicates

  • I have the following columns in a table: AccountNumber, TelephoneNumber, DateCalled.

    I'd like to find and show all of the occurrences where the same AccountNumber and TelephoneNumber were called on the same date. There is no primary key on the table. There should only be one telephone call per day, but I want to list out the info when there are more than one and include all three columns in the list. Since DateCalled is a datetime to the seconds position there are no occurrences where two rows woudl exist with exactly the same datetime down to the second.

  • Hi,

    I have written the following query using the temp table #calls. Just change it to the name of your table and check the column names are correct. Basically I am putting all the duplicates into a CTE and then joining that back to the original records. This allows you to see all the details for duplicate records (ie, the specific datetime called).

    This bit of code:

    DATEADD(DAY,DATEDIFF(DAY,1,dateCalled),1)

    converts the datetime into the same date and 00:00:00 time

    with cteCalls(AccountNumber, TelephoneNumber, callDay, callDayCount)

    as

    (select AccountNumber , TelephoneNumber, DATEADD(DAY,DATEDIFF(DAY,1,dateCalled),1) as callDay, count(DATEADD(DAY,DATEDIFF(DAY,1,dateCalled),1) ) as callDayCount

    from #calls

    group by AccountNumber , TelephoneNumber, DATEADD(DAY,DATEDIFF(DAY,1,dateCalled),1)

    having count(DATEADD(DAY,DATEDIFF(DAY,1,dateCalled),1) ) > 1)

    select #calls.* from #calls inner join cteCalls on

    #calls.AccountNumber = cteCalls.AccountNumber and #calls.TelephoneNumber = cteCalls.TelephoneNumber and DATEADD(DAY,DATEDIFF(DAY,1,#calls.dateCalled),1) = cteCalls.callday

    B

  • Here's a way to do it without the CTE.

    DECLARE @Calls TABLE (AccountNumber int, TelephoneNumber varchar(10), DateCalled datetime)

    INSERT INTO @Calls

    SELECT 1, 4026146434, '2008-11-17 17:01:27.023' UNION ALL

    SELECT 2, 4026115201, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 3, 4026782001, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 15:01:27.023'

    SELECT C1.*

    FROM @Calls C1

    INNER JOIN (

    SELECT AccountNumber,

    TelephoneNumber,

    CAST(CONVERT(varchar,DateCalled,101) as datetime) as DateCalled

    FROM @Calls

    GROUP BY AccountNumber, TelephoneNumber, CAST(CONVERT(varchar,DateCalled,101) as datetime)

    HAVING COUNT(*) > 1

    ) C2

    ON C1.AccountNumber = C2.AccountNumber AND C1.TelephoneNumber = C2.TelephoneNumber

    AND CAST(CONVERT(varchar,C1.DateCalled,101) as datetime) = C2.DateCalled

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I get the same execution plan for both queries and the exact same run-time. But I am running it on a measly 5 records. I'd be interested to know if the original poster gets any different results from the two solutions.... but I am guessing they will be identical.

  • Thanks all for the help. They both run in the exact same time, 1 minute 4 seconds, and return ~76,000 records.

  • I'd be curious to see how that runs on a large table > 1M rows. I don't have the time to test it myself, but I'd be curious. I've seen where the optimizer translates CTE's into correlated subqueries. I would guess that both options would be similar with a small number of rows, but the larger the table becomes, the more concern I would have on the CTE.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Using SSCrazy data you might want to try:

    SELECT 1, 4026146434, '2008-11-17 17:01:27.023' UNION ALL

    SELECT 2, 4026115201, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 3, 4026782001, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 15:01:27.023'

    ;with numbered as(SELECT rowno=row_number() over

    (partition by AccountNumber, TelephoneNumber order by AccountNumber),AccountNumber,TelephoneNumber, DateCalled

    from @Calls)

    select * from numbered WHERE Rowno > 1

    This will give you:

    rowno AccountNumber TelephoneNumber DateCalled

    2 1 4026146434 2008-11-19 17:01:27.023

    3 1 4026146434 2008-11-19 15:01:27.023

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • I think bitbucket's attempt will fall short whenever the same account number dials the same phone number once on one day, once the next, and then twice on another day after that. Here's my shot at it:

    DECLARE @PHONE_ACTIVITY TABLE (

    AccountNumber int,

    TelephoneNumber bigint,

    DateCalled DateTime

    )

    INSERT INTO @PHONE_ACTIVITY

    SELECT 1, 4026146434, '2008-11-17 17:01:27.023' UNION ALL

    SELECT 2, 4026115201, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 3, 4026782001, '2008-11-19 17:01:27.023' UNION ALL

    SELECT 1, 4026146434, '2008-11-19 15:01:27.023'

    SELECT Y.AccountNumber, Y.TelephoneNumber, PA.DateCalled

    FROM (

    SELECT X.AccountNumber, X.TelephoneNumber, MIN(DateCalled) AS DateCalled

    FROM (

    SELECT ROW_NUMBER() OVER(PARTITION BY AccountNumber, TelephoneNumber,

    CAST(CAST(YEAR(DateCalled) AS char(4))+'-'+RIGHT('0'+CAST(MONTH(DateCalled) AS varchar(2)),2)+'-'+

    RIGHT('0'+CAST(DAY(DateCalled) AS varchar(2)),2) AS DateTime)

    ORDER BY AccountNumber, TelephoneNumber) AS RN, AccountNumber, TelephoneNumber,

    CAST(CAST(YEAR(DateCalled) AS char(4))+'-'+RIGHT('0'+CAST(MONTH(DateCalled) AS varchar(2)),2)+'-'+

    RIGHT('0'+CAST(DAY(DateCalled) AS varchar(2)),2) AS DateTime) AS DateCalled

    FROM @PHONE_ACTIVITY

    ) AS X

    GROUP BY AccountNumber, TelephoneNumber,

    CAST(CAST(YEAR(DateCalled) AS char(4))+'-'+RIGHT('0'+CAST(MONTH(DateCalled) AS varchar(2)),2)+'-'+

    RIGHT('0'+CAST(DAY(DateCalled) AS varchar(2)),2) AS DateTime)

    HAVING MAX(X.RN) > 1

    ) AS Y LEFT OUTER JOIN @PHONE_ACTIVITY AS PA

    ON Y.AccountNumber = PA.AccountNumber AND

    Y.TelephoneNumber = PA.TelephoneNumber AND

    PA.DateCalled BETWEEN Y.DateCalled AND DATEADD(ms,-3,DATEADD(d,1,Y.DateCalled))

    ORDER BY Y.AccountNumber, Y.TelephoneNumber, PA.DateCalled

    It's been tested, although I'm not sure about all the extra function use not affecting performance negatively.

    Steve

    (aka smunson)

    :):):)

  • Smunson:

    think bitbucket's attempt will fall short whenever the same account number dials the same phone number once on one day, once the next, and then twice on another day after that.

    You are CORRECT my suggestion will report incorrectly.

    To all sorry about that, fell victim to writing code before thinking. To make it work correctly it would have to be modified to:

    ;with numbered as(SELECT rowno=row_number() over

    (partition by AccountNumber, TelephoneNumber, DATEPART(yy, DateCalled),DATEPART(mm,DateCalled),DATEPART(dd,DateCalled)

    order by DateCalled),AccountNumber,TelephoneNumber, DateCalled

    from @Calls)

    Smunson - Thank you for pointing out the problem

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply