April 28, 2010 at 3:20 pm
Problem:
We have a database which holds address information in the following fields: AddressLine, City, State, ZIP. Because there is no additional field for 'Apartment', we need to parse that information from the AddressLine field. In many cases, the "Apt" is followed by '#', ':', '.', etc, which I would like to remove without having to use multiple nested REPLACE() functions.
Assumptions (for the purposes of this example):
1. Apartment information is always at the end of the Address Line
2. Apartment is always denoted by "Apt" and is always preceded by a space
Goal: To return only the apartment number in the Derived "Apartment" field
The SQL so far:
CREATE TABLE #Address
(
AddressLine VARCHAR(100)
)
INSERT INTO #Address
VALUES ('123 Main St Apt 4')
INSERT INTO #Address
VALUES ('123 Main St Apt #4')
INSERT INTO #Address
VALUES ('123 Main St Apt. 4')
INSERT INTO #Address
VALUES ('123 Main St Apt: 4')
INSERT INTO #Address
VALUES ('123 Main St')
SELECT AddressLine,
CASE
WHEN PATINDEX('% APT[^A-Z0-9]%', AddressLine) > 0 THEN SUBSTRING(AddressLine, 1, PATINDEX('% APT[^A-Z0-9]%', AddressLine) - 1)
ELSE AddressLine
END AS AddressLineWithoutApt,
CASE
WHEN PATINDEX('% APT[^A-Z0-9]%', AddressLine) > 0 THEN SUBSTRING(AddressLine, PATINDEX('% APT[^A-Z0-9]%', AddressLine), LEN(AddressLine))
END AS Apartment
FROM #Address
DROP TABLE #Address
Thanks for your help.
April 29, 2010 at 2:13 am
Because the AddressLine column always ends with the appartment number, this can be achieved using LEN. For example, the following code will return all appartment numbers up to 5 digits long:
CREATE TABLE #Address
(
AddressLine VARCHAR(100)
)
INSERT INTO #Address
VALUES ('123 Main St Apt 4')
INSERT INTO #Address
VALUES ('123 Main St Apt #4')
INSERT INTO #Address
VALUES ('123 Main St Apt. 4')
INSERT INTO #Address
VALUES ('123 Main St Apt: 4')
INSERT INTO #Address
VALUES ('123 Main St')
SELECT
CASE
WHEN SUBSTRING(AddressLine,LEN(AddressLine)-4,5) LIKE '[0-9][0-9][0-9][0-9][0-9]'
THEN RIGHT(AddressLine,5)
WHEN SUBSTRING(AddressLine,LEN(AddressLine)-3,4) LIKE '[0-9][0-9][0-9][0-9]'
THEN RIGHT(AddressLine,4)
WHEN SUBSTRING(AddressLine,LEN(AddressLine)-2,3) LIKE '[0-9][0-9][0-9]'
THEN RIGHT(AddressLine,3)
WHEN SUBSTRING(AddressLine,LEN(AddressLine)-1,2) LIKE '[0-9][0-9]'
THEN RIGHT(AddressLine,2)
WHEN SUBSTRING(AddressLine,LEN(AddressLine),1) LIKE '[0-9]'
THEN RIGHT(AddressLine,1)
END
FROM #Address
DROP TABLE #Address
April 29, 2010 at 6:54 am
April 29, 2010 at 9:04 am
Garadin (4/29/2010)
Why are you against using a function?
Good question. Not against using a function, but this particular server is a 2000 with a whole host of complicated dependencies already. We are hopefully transitioning it to 2K5 or 2K8 soon and the less we can put on the server to limit the complications, the better. That's kind of the idea anyway. If it could be done via a query, that would be nice for our particular situation.
April 29, 2010 at 9:36 am
How about a function that goes in a utility database. That way you can just recreate the utility database when you upgrade the server and don't have to worry about adding additional time to making sure all components are moved over. This is my preferred method of this, it is a function, but it's pretty general.
http://phelabaum.com/archive/2010/01/tally-table-string-cleaning/
April 29, 2010 at 10:16 am
Something like this?
-- sample data
CREATE TABLE #Address
(
AddressLine VARCHAR(100)
)
INSERT INTO #Address VALUES ('123 Main St Apt 4')
INSERT INTO #Address VALUES ('123 Main St Apt #4')
INSERT INTO #Address VALUES ('123 Main St Apt. 4809')
INSERT INTO #Address VALUES ('123 Main St Apt: 14')
INSERT INTO #Address VALUES ('123 Main St')
-- solution
SELECT a.AddressLine,
AddressCleaned = ISNULL(NULLIF(LEFT(a.AddressLine, CHARINDEX(' Apt', a.AddressLine)), ''), a.AddressLine),
ApartmentNumber = REVERSE(x.rebmuNtnemtrapA)
FROM #Address a
OUTER APPLY (
SELECT rebmuNtnemtrapA = (SELECT SUBSTRING(REVERSE(a.AddressLine), n.n, 1)
FROM (SELECT 1 AS [n] UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) n
WHERE SUBSTRING(REVERSE(a.AddressLine), n.n, 1) LIKE '[0-9]'
FOR XML PATH(''))
) x
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 29, 2010 at 10:26 am
April 29, 2010 at 1:52 pm
After reading responses, this is the solution I've come up with so far:
--Initialize Test Data
CREATE TABLE #Address
(
Id INT,
AddressLine VARCHAR(100)
)
INSERT INTO #Address
VALUES (1,
'123 Main St # 4')
INSERT INTO #Address
VALUES (2,
'123 Main St Apt: 4')
INSERT INTO #Address
VALUES (3,
'123 Main St UNIT 4')
INSERT INTO #Address
VALUES (4,
'123 Main St Apt 4 ')
INSERT INTO #Address
VALUES (5,
'123 Main St Apt #4 ')
INSERT INTO #Address
VALUES (6,
'123 Main St Apt. #4-A ')
INSERT INTO #Address
VALUES (7,
'123 Main St BLDG: #4-A ')
INSERT INTO #Address
VALUES (8,
'123 Main St ')
--Solution
SELECT Id,
AddressLine,
CASE
WHEN PATINDEX('% APT[^A-Z0-9]%', AddressLine) > 0
THEN LEFT(AddressLine, PATINDEX('% APT[^A-Z0-9]%', AddressLine))
WHEN PATINDEX('% #[^A-Z0-9]%', AddressLine) > 0
THEN LEFT(AddressLine, PATINDEX('% #[^A-Z0-9]%', AddressLine))
WHEN PATINDEX('% UNIT[^A-Z0-9]%', AddressLine) > 0
THEN LEFT(AddressLine, PATINDEX('% UNIT[^A-Z0-9]%', AddressLine))
WHEN PATINDEX('% BLDG[^A-Z0-9]%', AddressLine) > 0
THEN LEFT(AddressLine, PATINDEX('% BLDG[^A-Z0-9]%', AddressLine))
ELSE AddressLine
END AS AddressWithoutApt,
REPLACE(CASE
--' APT' in Address Line
WHEN PATINDEX('% APT[^A-Z0-9]%', AddressLine) > 0
THEN LTRIM(RIGHT(RTRIM(AddressLine), LEN(AddressLine) - PATINDEX('% APT[^A-Z0-9]%', AddressLine) - 4))
--' #' in Address Line
WHEN PATINDEX('% #[^A-Z0-9]%', AddressLine) > 0
THEN LTRIM(RIGHT(RTRIM(AddressLine), LEN(AddressLine) - PATINDEX('% #[^A-Z0-9]%', AddressLine) - 2))
--' UNIT' in Address Line
WHEN PATINDEX('% UNIT[^A-Z0-9]%', AddressLine) > 0
THEN LTRIM(RIGHT(RTRIM(AddressLine), LEN(AddressLine) - PATINDEX('% UNIT[^A-Z0-9]%', AddressLine) - 5))
--' BLDG' in Address Line
WHEN PATINDEX('% BLDG[^A-Z0-9]%', AddressLine) > 0
THEN LTRIM(RIGHT(RTRIM(AddressLine), LEN(AddressLine) - PATINDEX('% BLDG[^A-Z0-9]%', AddressLine) - 5))
END, '#', '') AS AptNum
FROM #Address
--Drop Test Data
DROP TABLE #Address
This query considers other possible suffixes and is working pretty well, except in the case that there is an apartment number following an address, which has nothing preceding it. Many apartment numbers also include an alpha character or a dash, so this query takes some of these exceptions into account. I'm still working on cases where the number appears with no prefix, but this is the best I've come up with so far.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply