Duplicate entries with unmatching related data

  • Hi,

    I am looking for some help on a sql for a report that I need to run. Here is a sample of the data that I need to format

    Table One

    ID       Address

    12      500 somewhare NY

    13      500 somewhare NY

    14      500 somewhare NY

    15      505 somewhare NJ

    16      505 somewhare NJ

    17      506 newtown NH

    18      506 newtown NH

     

    Table Two

    ID    Agent Name

    12    Jess

    13    Null

    14    Jess

    15    Dave

    16    Dave

    17    John

    18   Null

     

    Table Three

    ID     MLSNO

    12    Null

    13    12313

    14    12313

    15    12333

    16    12333

    17    11111

    18    Null

     

    I need to get unique address that correspond with the most data. The following sql works however it takes 56 seconds to run and that is too long for the application (especially when the end result is only about 1500 records) I have been racking my head on this for a while. Any help is appreciated.

    SET NOCOUNT ON

    CREATE TABLE #TMP (lDOCID INT, ADDRESS VARCHAR(40), MLSNO VARCHAR(40), AGENTID VARCHAR(40), FULLNAME VARCHAR(40)) DECLARE @AD VARCHAR(40) DECLARE CUR CURSOR FOR SELECT DISTINCT DSUSER.tblCFD_ADDRESS.sVALUE FROM DSUSER.tblCFD_ADDRESS ORDER BY DSUSER.tblCFD_ADDRESS.sVALUE OPEN CUR FETCH NEXT FROM CUR INTO @AD WHILE @@FETCH_STATUS = 0 BEGIN

     INSERT #TMP

     SELECT TOP 1 A.lDOCID, A.sVALUE AS ADDRESS, M.sVALUE AS MLSNO,

     AG.sVALUE AS AGENTID, USR.sFULLNAME AS FULLNAME FROM DSUSER.tblCFD_ADDRESS A

     LEFT JOIN DSUSER.tblCFD_MLS_NO M ON A.lDOCID = M.lDOCID

     LEFT JOIN DSUSER.tblCFD_AGENTID AG ON A.lDOCID = AG.lDOCID

     LEFT JOIN DSUSER.tblUSER USR ON AG.sVALUE=USR.sNAME

                   WHERE A.sVALUE = @AD

     ORDER BY A.sVALUE, M.sVALUE DESC, AG.sVALUE DESC

       FETCH NEXT FROM CUR

       INTO @AD

    END

    CLOSE CUR

    DEALLOCATE CUR

    SET NOCOUNT OFF

    SELECT * FROM #TMP

    DROP TABLE #TMP

     

  • I just relized I posted the actual sql this one is modified to match the sample data posted:

    SET NOCOUNT ON

    CREATE TABLE #TMP (lDOCID INT, ADDRESS VARCHAR(40), MLSNO VARCHAR(40), AGENTID VARCHAR(40)) DECLARE @AD VARCHAR(40)

    DECLARE CUR CURSOR FOR SELECT DISTINCT ADDRESS FROM 

     ORDER BY ADDRESS

    OPEN CUR

    FETCH NEXT FROM CUR INTO @AD WHILE @@FETCH_STATUS = 0 BEGIN

     INSERT #TMP

     SELECT TOP 1 ID, ADDRESS, MLSNO, AGENTID FROM 

     A

     LEFT JOIN 

     AG ON A.ID = AG.ID

     LEFT JOIN 

    M ON A.ID = M.ID

                    WHERE A.ADDRESS = @AD

     ORDER BY A.sVALUE, M.sVALUE DESC, AG.sVALUE DESC

       FETCH NEXT FROM CUR

       INTO @AD

    END

    CLOSE CUR

    DEALLOCATE CUR

    SET NOCOUNT OFF

    SELECT * FROM #TMP

    DROP TABLE #TMP

  • Have you tried this?

     

    CREATE

    TABLE #TMP

    (

    RecID INT IDENTITY(1, 1),

    lDOCID INT,

    ADDRESS VARCHAR(40),

    MLSNO VARCHAR(40),

    AGENTID VARCHAR(40)

    )

    INSERT #TMP

    SELECT ID,

    ADDRESS,

    MLSNO,

    AGENTID

    FROM

    A

    LEFT JOIN

    AG ON A.ID = AG.ID

    LEFT JOIN

    M ON A.ID = M.ID

    ORDER BY A.address,

    M.mlsno DESC,

    AG.agentid DESC

    Select t.lDOCID,

    t.ADDRESS,

    t.MLSNO,

    t.AGENTID

    from #tmp AS t

    INNER JOIN (

    SELECT min(recid) as firstid

    from #temp

    group by address

    ) AS d ON d.FirstID = t.RecID

    order by t.recid

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Well, the reason for your query taking so long is because you've written a cursor to handle a task where it is clearly not needed.  This can be sped up significantly if it is re-written using SET based logic. 

    I would normally have done this for you, but it seems that your test code does not work with your test data as you've omitted the sValue column from the test tables.  Can you update your post to include the proper sValues for your data?  This should be an easy re-write.  I'm heading out for the day, but I'll check in tomorrow and if someone else hasn't alredy taken care of you, I will.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I know that there is another way to do this not using a cursor but I have a complete block on how it can be done. I will try the post above and see if it works out. If you have another way let me know. I have replaced the sValues.

    SET NOCOUNT ON

    CREATE TABLE #TMP (lDOCID INT, ADDRESS VARCHAR(40), MLSNO VARCHAR(40), AGENTID VARCHAR(40)) DECLARE @AD VARCHAR(40)

    DECLARE CUR CURSOR FOR SELECT DISTINCT ADDRESS FROM 

     ORDER BY ADDRESS

    OPEN CUR

    FETCH NEXT FROM CUR INTO @AD WHILE @@FETCH_STATUS = 0 BEGIN

     INSERT #TMP

     SELECT TOP 1 ID, ADDRESS, MLSNO, AGENTID FROM 

     A

     LEFT JOIN 

     AG ON A.ID = AG.ID

     LEFT JOIN 

    M ON A.ID = M.ID

                    WHERE A.ADDRESS = @AD

     ORDER BY Address, MLSNO DESC, AGENTIT DESC

       FETCH NEXT FROM CUR

       INTO @AD

    END

    CLOSE CUR

    DEALLOCATE CUR

    SET NOCOUNT OFF

    SELECT * FROM #TMP

    DROP TABLE #TMP

  • The message from peter worked great. Thank You.

    I am interested in how you would use set based logic to do this. Can you explain by either using the sample I provided or something simular

  • I am using set based logic.

    Your original code took 56 seconds to complete for 1500 records. If you try my approach, how many seconds does that take?

     


    N 56°04'39.16"
    E 12°55'05.25"

  • .6 seconds

    Ok so then I have used set based logic in the past just never knew that was what it was called. Thank you.

  • -- prepare sample data

    declare

    @tableone table (id int, address varchar(40))

    insert

    @tableone

    select

    12, '500 somewhare NY' union all

    select

    13, '500 somewhare NY' union all

    select

    14, '500 somewhare NY' union all

    select

    15, '505 somewhare NJ' union all

    select

    16, '505 somewhare NJ' union all

    select

    17, '506 newtown NH' union all

    select

    18, '506 newtown NH'

     

    declare

    @tabletwo table (id int, agentname varchar(40))

    insert

    @tabletwo

    select

    12, 'Jess' union all

    select

    13, null union all

    select

    14, 'Jess' union all

    select

    15, 'Dave' union all

    select

    16, 'Dave' union all

    select

    17, 'John' union all

    select

    18, null

    declare

    @tablethree table (id int, mlsno varchar(40))

    insert

    @tablethree

    select

    12, null union all

    select

    13, '12313' union all

    select

    14, '12313' union all

    select

    15, '12333' union all

    select

    16, '12333' union all

    select

    17, '11111' union all

    select

    18, null

    -- stage the data

    declare

    @stage table (recid int identity, ldocid int, address varchar(40), mlsno varchar(40), agentname varchar(40))

    insert

    @stage (ldocid, address, mlsno, agentname)

    select

    a.id,

    a

    .address,

    m

    .mlsno,

    ag

    .agentname

    from

    @tableone as a

    left

    join @tabletwo as ag on ag.id = a.id

    left

    join @tablethree as m on m.id = a.id

    order

    by a.address,

    m

    .mlsno desc,

    ag

    .agentname desc

    -- show the expected output

    select

    s.ldocid,

    s

    .address,

    s

    .mlsno,

    s

    .agentname

    from

    @stage as s

    inner

    join (

    select min(recid) as firstid

    from @stage

    group by address

    ) as d on d.firstid = s.recid

    order

    by s.recid

     

    ldocid address mlsno agentname

    14 500 somewhare NY 12313 Jess

    15 505 somewhare NJ 12333 Dave

    17 506 newtown NH 11111 John

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Going with <<I need to get unique address that correspond with the most data>>, this will preferentially pick an ID/address line which joins to non-null MLSNO and agent...

     

    DROP TABLE #TableOne
    CREATE TABLE #TableOne (ID int, Address VARCHAR(20))
    INSERT INTO #TableOne (ID, Address)
    SELECT 12, '500 somewhare NY' UNION ALL
    SELECT 13, '500 somewhare NY' UNION ALL
    SELECT 14, '500 somewhare NY' UNION ALL
    SELECT 15, '505 somewhare NJ' UNION ALL
    SELECT 16, '505 somewhare NJ' UNION ALL
    SELECT 17, '506 newtown NH' UNION ALL
    SELECT 18, '506 newtown NH'
    --SELECT * FROM #TableOne
    DROP TABLE #TableTwo
    CREATE TABLE #TableTwo (ID int, AgentName VARCHAR(4))
    INSERT INTO #TableTwo (ID, AgentName)
    SELECT 12, 'Jess' UNION ALL
    SELECT 13, Null UNION ALL
    SELECT 14, 'Jess' UNION ALL
    SELECT 15, 'Dave' UNION ALL
    SELECT 16, 'Dave' UNION ALL
    SELECT 17, 'John' UNION ALL
    SELECT 18, Null
    --SELECT * FROM #TableTwo
    DROP TABLE #TableThree
    CREATE TABLE #TableThree (ID int, MLSNO int)
    INSERT INTO #TableThree (ID, MLSNO)
    SELECT 12, Null UNION ALL
    SELECT 13, 12313 UNION ALL
    SELECT 14, 12313 UNION ALL
    SELECT 15, 12333 UNION ALL
    SELECT 16, 12333 UNION ALL
    SELECT 17, 11111 UNION ALL
    SELECT 18, Null
    --SELECT * FROM #TableThree
    DROP TABLE #CalcTable
    SELECT ad.ID, ad.Address, mls.MLSNO, ag.AgentName,
     CASE WHEN mls.MLSNO IS NULL THEN 0 ELSE 1 END + 
     CASE WHEN ag.AgentName IS NULL THEN 0 ELSE 1 END AS ValueCount
    INTO #CalcTable
    FROM #TableOne ad
    INNER JOIN #TableTwo ag ON ag.ID = ad.ID
    INNER JOIN #TableThree mls ON mls.ID = ag.ID
    --SELECT * FROM #CalcTable
    SELECT ad.ID,
    ad.Address,
    mls.MLSNO,
    ag.AgentName
    FROM #TableOne ad
    LEFT JOIN #TableTwo ag ON ag.ID = ad.ID
    LEFT JOIN #TableThree mls ON mls.ID = ag.ID
    INNER JOIN (
     SELECT MAX(ID) AS ID
     FROM #CalcTable a 
     INNER JOIN (SELECT Address, MAX(ValueCount) AS MAXValueCount 
      FROM #CalcTable 	
     GROUP BY Address) b 
     ON b.Address = a.Address AND b.MAXValueCount = a.ValueCount
     GROUP BY a.Address 
    ) t ON t.ID = ad.ID
    ORDER BY ad.ID,
    mls.MLSNO DESC,
    ag.AgentName DESC
    

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply