Remove String from column Post Code Field

  • Hi All,

    I hope that you can help.

    I am querying with a SELECT statement against an address table that has a post code column with corrupt data.

    Sample record:- Northants NN4 0NB

    Should be NN4 0NB

    I have used the following on another column:-

    LEFT(A.Addr1,CASE WHEN CHARINDEX ('(', A.Addr1) =0 THEN LEN(A.Addr1) ELSE CHARINDEX('(' ,A.Addr1) -1 END) AS 'Address 1'

    but unable to correct this problem?

    Thanks in advance

  • Is it always a single word, followed by a space, followed by the correct data?

    Or, is it always two 3-character strings with a space at the end with the illegal data in front of that?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Most of the records in this column are correct i.e. NN4 0NB (two strings separated with a space), Then occasionally I return corrupt records i.e. Derbyshire S42 5BX (three strings separated with spaces). The latter is incorrect i.e. I would only want to SELECT S42 5BX (with no spaces before the start of the string)

    Hope that my explanation is sufficient.

    Regards

  • can you make an assumption based on the length? your example using charindex of the left parenthesis doesn't match your sample you posted.

    ;WITH MyCTE([PostalCode])

    AS

    (

    SELECT 'Derbyshire S42 5BX' UNION ALL

    SELECT 'NN4 0NB' UNION ALL

    SELECT 'S42 5BX' UNION ALL

    SELECT 'Northants NN4 0NB'

    )

    SELECT RIGHT([PostalCode],7) As PossibleValue, * FROM MyCTE WHERE LEN([PostalCode]) > 7

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Pack_Star (9/8/2015)


    Hi,

    Most of the records in this column are correct i.e. NN4 0NB (two strings separated with a space), Then occasionally I return corrupt records i.e. Derbyshire S42 5BX (three strings separated with spaces). The latter is incorrect i.e. I would only want to SELECT S42 5BX (with no spaces before the start of the string)

    Hope that my explanation is sufficient.

    Regards

    Are you sure your incorrect data is always going to be three strings? If the county is West Yorkshire, it could be four strings and if it's Tyne And Wear then that's five strings.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Lowell (9/8/2015)


    can you make an assumption based on the length? your example using charindex of the left parenthesis doesn't match your sample you posted.

    ;WITH MyCTE([PostalCode])

    AS

    (

    SELECT 'Derbyshire S42 5BX' UNION ALL

    SELECT 'NN4 0NB' UNION ALL

    SELECT 'S42 5BX' UNION ALL

    SELECT 'Northants NN4 0NB'

    )

    SELECT RIGHT([PostalCode],7) As PossibleValue, * FROM MyCTE WHERE LEN([PostalCode]) > 7

    A correctly formed postcode can be 7 or 8 characters - eg SW1A 4WW so change RIGHT(PostalCode, 7) to RIGHT(PostalCode, 8) and LTRIM the result

    <Edit for poor wording>

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • A correctly formed postcode can be 7 or 8 characters - eg SW1A 4WW so change RIGHT(PostalCode, 7) to RIGHT(PostalCode, 8) and LTRIM the result

    Postcodes can be 6,7 or 8 characters, M1 2DF is also a valid format.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Hi All,

    I believe that this could be possibly be resolved with a Case Statement?

    The select needs to do the following. If there are three separate strings in the column then select from the first character of the second string else select the full column. Please note that each of the strings in the column can be of variable lengths.

    Hope that this make sense.

    Kind Regards

  • Pack_Star (9/8/2015)


    Hi All,

    I believe that this could be possibly be resolved with a Case Statement?

    The select needs to do the following. If there are three separate strings in the column then select from the first character of the second string else select the full column. Please note that each of the strings in the column can be of variable lengths.

    Hope that this make sense.

    Kind Regards

    This is the approach I would take, but I'm gonna word it differently.

    If there's only one space character (possibly after trimming) use the whole value, otherwise, take the characters after the 2nd to last space to the end of the string.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • SELECT

    ltrim(right(postalcode,charindex(' ',reverse(postalcode),PATINDEX('% %',reverse(postalcode))+1)))

    FROM yourdata

    I've tested this over 14500 county+[space]+postcode combinations and the only ones it didn't work on were cases where there was no space in the postcode on our data.

    Edited to simplify and clarify


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • If it's an option for you, concentrate on cleaning the data. You're always going to find edge cases that you hadn't accounted for - maybe the two parts of the postcode separated by two spaces or by no spaces, or perhaps the second part is missing altogether. Once you've cleaned the data, make sure that input is validated. It may be worth investing in an address database from Royal Mail. If you can't do that, regular expressions[/url] might be a good way to validate or verify data in the column.

    John

  • BWFC (9/8/2015)


    SELECT

    ltrim(right(postalcode,charindex(' ',reverse(postalcode),PATINDEX('% %',reverse(postalcode))+1)))

    FROM yourdata

    I've tested this over 14500 county+[space]+postcode combinations and the only ones it didn't work on were cases where there was no space in the postcode on our data.

    Edited to simplify and clarify

    If you're dealing with multiple countries, you might have to include the country in the CASE statement, for example:

    CASE

    WHEN COUNTRY IN ('Country1', 'Country2', ...) AND PostalCode ..... THEN ...

    WHEN COUNTRY IN ('Country10', 'Country1', ...) AND PostalCode ..... THEN ...

    ELSE ... END



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • John Mitchell-245523 (9/8/2015)


    If it's an option for you, concentrate on cleaning the data. You're always going to find edge cases that you hadn't accounted for - maybe the two parts of the postcode separated by two spaces or by no spaces, or perhaps the second part is missing altogether. Once you've cleaned the data, make sure that input is validated. It may be worth investing in an address database from Royal Mail. If you can't do that, regular expressions[/url] might be a good way to validate or verify data in the column.

    John

    Regular Expressions in TSQL? I'll be happy when Microsoft makes that possible without using CLR.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Hi All,

    For now I am unable to "clean" any of these records as doing so will cause problems elsewhere.

    Thanks for the suggestions. They assisted me in finding a solution:-

    CASE WHEN postalcode NOT LIKE '_[a-Z]_[a-Z]%'

    THEN LTRIM(postalcode)

    ELSE RIGHT(RTRIM(postalcode),8) END AS 'Corrected Postcode'

    For future reference for others who come across this issue, please see the following:-

    https://www.mrs.org.uk/pdf/postcodeformat.pdf

    Kind regards

  • I don't know what your data looks like but that will only work for an eight character postcode. If you take the rightmost eight characters of your string you will take the last character of the preceding string followed by a space if you have a six character postcode and the preceding space if you have a seven character postcode.

    with example(postalcode) as

    (

    select 'Hertfordshire AL1 3JW' union all

    select 'West Midlands B4 6BJ' union all

    select 'Somerset BS15 4PP'

    )

    select

    CASE WHEN postalcode NOT LIKE '_[a-Z]_[a-Z]%'

    THEN LTRIM(postalcode)

    ELSE RIGHT(RTRIM(postalcode),8) END AS 'Corrected Postcode'

    from example


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply