LEFT JOIN

Question

Post reply

LEFT JOIN

Viewing 14 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 76776 More actions · Answer 1

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.



WITH CTE AS (

	SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn

	FROM Postal_Codes_Import

)

DELETE CTE

WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

LinksUp SSCertifiable Points: 6526 More actions · Answer 2

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

__________________________________________________________________________________________________________
How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 3

LinksUp (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

Setting aside the fact that you don't need an Identity Column or a new table to identity dups, I am curious as to how having an Identity Column tells you what the dups are?

Good point.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 4

Thanks Drew. I tried a SELECT Distinct and it did not work.

Thanks again.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 5

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

Yes there are 3 duplicates when I do a INNER JOIN but I exclude them in the Insert and I keep getting PK Violations. :unsure:

An INNER JOIN won't give you the situation that I'm expecting. You're looking for situations where the code does not exist in the target table and there are duplicates in the source table. An inner join will exclude those, because the code doesn't exist in the target table.

Drew

Right. There are 3 records in the Postal that exist in the Import Table.

Thanks again.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 6

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Phil Parkin SSC Guru Points: 245813 More actions · Answer 7

Welsh Corgi (1/15/2016)

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 8

The following inserts 157 records:



WITH CTE AS (

	SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

	FROM Postal_Codes_Import

)

SELECT *

INTO #tmp

FROM  CTE

WHERE rn > 1

ORDER BY ZipCode

This returns 20 records:



SELECT ZipCode, COUNT(*) AS RecordNumber

FROM #tmp

GROUP BY ZipCode

HAVING COUNT (*) > 1

ZipCode RecordNumber

B5A 5A8 2

B5A 5G3 2

B5A 5G5 3

B5A 5J1 3

B5A 5J7 2

B5A 5K5 2

B5A 5L1 3

B5A 5L6 2

B5A 5L7 2

B5A 5N3 3

B5A 5P2 2

B5A 5R3 2

B5A 5X9 2

C0A 1H1 4

C0A 1H3 4

C0A 1H6 4

C0A 1H7 3

C0A 1H9 2

N0N 1J4 2

T4B 2T3 2

51

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 9

Phil Parkin (1/15/2016)

Welsh Corgi (1/15/2016)

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

I do not have a DateTime Stamp in this table. I did not create it.

I'm just looking for duplicate Zip Code fields.

Thanks.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Phil Parkin SSC Guru Points: 245813 More actions · Answer 10

Welsh Corgi (1/15/2016)

Phil Parkin (1/15/2016)

Welsh Corgi (1/15/2016)

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

I do not have a DateTime Stamp in this table. I did not create it.

I'm just looking for duplicate Zip Code fields.

Thanks.

The words "for example" in my post were there to indicate that I was providing an example.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 11

maybe maybe not....would be a whole lot easier with sample data.....:doze:



WITH cte

AS

(SELECT zipcode,

        ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY ZipCode) AS rn

FROM Postal_Codes_Import

)

INSERT INTO Postal

       SELECT cte.zipcode   -- add additional columns as required

       FROM cte

            LEFT OUTER JOIN Postal AS P ON cte.zipcode = P.ID

       WHERE(cte.rn = 1)

            AND (P.ID IS NULL);

SELECT *

FROM postal;

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

MMartin1 One Orange Chip Points: 27655 More actions · Answer 12

Phil Parkin (1/15/2016)

Welsh Corgi (1/15/2016)

Phil Parkin (1/15/2016)

Welsh Corgi (1/15/2016)

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

I do not have a DateTime Stamp in this table. I did not create it.

I'm just looking for duplicate Zip Code fields.

Thanks.

The words "for example" in my post were there to indicate that I was providing an example.

If both of the rows have identical values in all the columns and you know this for a fact , it does not matter what you order by.

Else you need to see if there are any indications in other columns that would want you to choose to select that one over the other(s). Phil's example uses a create date in DESC order, that makes sense in that you probably want the latest version of the row.

Without more insight into the data , I cannot provide further guidance, but I am sure you can make this judgement.

Dont forget to go back to the file originators and provide the feed back that there are duplicates when (I assume) there should not be. The best solution really is to stop the import if duplicates are encountered in the import file and it remains unclear why and what version of the row is the correct one. Here you want a process to fail. If I am importing this using SSIS I would want a variable to hold duplicate counts in the import and use this as a constraint to move to the next task (dupValues = 0) .

----------------------------------------------------

Ed Wagner SSC Guru Points: 287015 More actions · Answer 13

Welsh Corgi (1/15/2016)

Phil Parkin (1/15/2016)

Welsh Corgi (1/15/2016)

drew.allen (1/14/2016)

Welsh Corgi (1/14/2016)

I need to create a new table that has an Identity Column so that I can delete the duplicates.

No, you don't. You can use a CTE to delete the duplicates with ROW_NUMBER serving as a temporary identity.

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY ZipCode ORDER BY <your order criteria here>) AS rn FROM Postal_Codes_Import ) DELETE CTE WHERE rn > 1

Depending on your data you may be able to get by with just using a DISTINCT.

Drew

What?

<your order criteria here>???

What does that matter?

Would that be the ZipCode?

Where you have several duplicates, how do you decide which one to keep and which to delete?

If you wanted to keep the one with the most recent create date, use ORDER BY CREATEDATE DESC, for example.

I do not have a DateTime Stamp in this table. I did not create it.

I'm just looking for duplicate Zip Code fields.

Thanks.

Maybe you should read up on ROW_NUMBER used with PARTITION BY and understand how it works before you try to use it. Take the time to learn it so you'll be able to use it here and in other scenarios. It's a powerful tool to have in your toolbox, but you will need to understand it to be able to use it. If you don't understand it, don't put it in production until you do.

When you identify duplicate zip codes, you need to decide which one to keep. It case be based on whatever column you want, and then you sort by that column. If you decide that it doesn't matter, at least you've made a decision, but you have to make a decision one way.

Having an identity column will not identify duplicates in a different column. You should understand this already.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 14

There was not a problem with the LEFT JOIN syntax.

There were duplicates in the Source file which I should have caught. :blush:

Thanks for all that provided help.:-)

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/