To cursor, or not?

  • I have a simple holding table thats populated with filenames from a directory and I'm wanting to clean up the names so I can then undertake a data comparison with another table.

    Each filename will either contain 4 numerics '####', 5 numerics '#####' or 'CYP####' and the words 'Floor Plan'. I'm wanting to remove each of these to result in just the property name.

    In the example given,

    'Abbey Lane Cemetery 03022 Floor Plan' would become 'Abbey Lane Cemetery',

    'Abbey Lane Primary 4-11 2001 Floor Plan' would become 'Abbey Lane Primary 4-11',

    'Abbeydale Grange Secondary 11-16 4254 Floor Plan' would become 'Abbeydale Grange Secondary 11-16',

    and

    'Aldine House CYP0048 Floor Plan' would become 'Aldine House'

    and so on.

    USE test;

    go

    ------

    IF OBJECT_ID('CADlist','U') IS NOT NULL

    DROP TABLE CADlist

    ------

    CREATE TABLE [dbo].[CADlist](

    [FileID] [int] IDENTITY(1,1) NOT NULL,

    [FileNM] [nvarchar](1000) NOT NULL,

    PRIMARY KEY CLUSTERED

    (

    [FileID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    -------

    SET IDENTITY_INSERT CADlist ON

    ------

    INSERT INTO CADlist

    (FileID, FileNM)

    SELECT '1','Abbey Lane Cemetery 03022 Floor Plan' UNION ALL

    SELECT '2','Abbey Lane Primary 4-11 2001 Floor Plan' UNION ALL

    SELECT '3','Abbeydale Grange Secondary 11-16 4254 Floor Plan' UNION ALL

    SELECT '4','Abbeydale Industrial Hamlet 00433 Floor Plan' UNION ALL

    SELECT '5','Abbeydale Primary 3-11 2002 Floor Plan' UNION ALL

    SELECT '6','Abbeyfield Park 00460 Floor Plan' UNION ALL

    SELECT '7','Acres Hill Primary 3-11 2318 Floor Plan' UNION ALL

    SELECT '8','Aldine House CYP0048 Floor Plan' UNION ALL

    SELECT '9','All Saints C Secondary VA 11-18 5401 Floor Plan' UNION ALL

    SELECT '10','All Saints Youth Club CYP0002 Floor Plan' UNION ALL

    SELECT '11','Angram Bank Pavilion 03029 Floor Plan' UNION ALL

    SELECT '12','Angram Bank Primary 3-11 2342 Floor Plan' UNION ALL

    SELECT '13','Anns Grove Primary 3-11 2343 Floor Plan' UNION ALL

    SELECT '14','Anns Road Youth Club CYP0003 Floor Plan' UNION ALL

    SELECT '15','Arbourthorne Primary 3-11 3429 Floor Plan' UNION ALL

    SELECT '16','Athelstan Primary 4-11 2340 Floor Plan' UNION ALL

    SELECT '17','Audio Visual Enterprise Centre (Brown Str) CYP0075 Floor Plan' UNION ALL

    SELECT '18','Audio Visual Enterprise Centre (Sidney Str) CYP0076 Floor Plan' UNION ALL

    SELECT '19','Ballifield Primary 3-11 2281 Floor Plan' UNION ALL

    SELECT '20','Bankwood Primary 4-11 2322 Floor Plan'

    SET IDENTITY_INSERT CADlist OFF

    I'm not sure how best to go about it. To remove the 'Floor Plan' part it's simply a case of:

    UPDATE CADlist

    SET FileNM = Replace(FileNM, ' Floor Plan', '')

    Is it worth splitting the data into it's various types and then working on each set individually:

    SELECT FileNM

    FROM CADlist

    WHERE patindex('% [0-9][0-9][0-9][0-9][0-9] %', FileNM) > 0

    ----

    SELECT FileNM

    FROM CADlist

    WHERE patindex('% [0-9][0-9][0-9][0-9] %', FileNM) > 0

    ----

    SELECT FileNM

    FROM CADlist

    WHERE patindex('% CYP[0-9][0-9][0-9][0-9] %', FileNM) > 0

    But, I'm not sure how best to alter each record, do I need to creat a cursor and iterate through each record?

    I can use PATINDEX and STUFF to update strings but not sure how to use this for each record

    DECLARE @strTest varchar(8000)

    DECLARE @pos1 int --find position of 5 numerics

    SELECT @strTest = 'Abbey Lane Cemetery 03022 Floor Plan'

    SELECT @strTest as [BEFORE]

    -----

    SET @pos1 =PATINDEX('% [0-9][0-9][0-9][0-9][0-9] %', @strTest ) --find position of 5 numerics

    SET @strTest= STUFF(@strTest,@pos1,6,'') --set string null for 5 numerics and blank space

    SELECT @strTest as [AFTER]

    I'd really appreciate some pointers in the right direction!

    many thanks,

    Dom Horton

  • There is no need to do a cursor. All you have to do is specify the operation(s) you want to perform to the entire set of rows. For example:

    -- removes 'Floor Plan' from all rows

    update cadlist

    set fileNm = replace(fileNm,'Floor Plan','')

    To perform multiple operations, you nest the functions or expression that are doing the string manipulation.

    update cadlist

    set fileNm = replace(replace(fileNm,'Floor Plan',''),'CYP','')

    Looking at the pattern of your data, after using the above update, all your rows should end with a string of 4-5 digits. Instead of doing multiple passes with different WHERE conditions, you can use nested CASE expressions with your PATINDEX searches, or perhaps you could REVERSE() the string and look for the first blank then use LEFT, RIGHT, or SUBSTRING to get only the characters you want to keep.

    It helps if you can generalize your observations, seeing the pattern as a string of 4 or more digits or seeing that they will always fall at the end. That should be enough to get you started. Let us know if you have any questions.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks for the response

    ...I'll have a play...

  • Hi

    Is that...

    space - [word] - space - "floor" - space - "plan"

    where [word] = 4 numerics '####', 5 numerics '#####' or 'CYP####'

    Or to put it another way, you want to truncate the string at the FIRST space of

    space - [word] - space - "floor" - space - "plan"

    is this correct?

    Reading from RIGHT to LEFT, the third space in the string?

    If so, then here's how:

    RTRIM the string

    REVERSE it

    Find the position of the third space using CHARINDEX repeatedly

    Pluck the piece you want.

    Would you like some help with this?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • SELECT FileNM FROM CADlist

    -----

    UPDATE CADlist

    SET FileNm = REPLACE(REPLACE(fileNm,'Floor Plan',''),'CYP','')

    -----

    SELECT FileNM FROM CADlist

    -----

    UPDATE CADlist

    SET FileNM = REVERSE(FileNM)

    SELECT FileNM FROM CADlist

    -----

    If I remove 'CYP' and 'Floor Plan' as suggested, reverse the string so each entry will start with either 4 or 5 digits. How do I apply the below logic to all the records:

    DECLARE @strTest varchar(1000)

    SELECT @strTest = '22030 yretemeC enaL yebbA'

    SELECT @strtest

    SELECT @strTest=SUBSTRING ( @strtest ,(charindex(' ',@strtest)),1000 )

    SELECT @strTest=reverse(@strtest)

    SELECT @strtest

    thanks

  • Here you go...

    -- This is a crude numbers table and is not part of the solution

    ;WITH Numbers AS (SELECT (N10 + N1) + 1 AS n

    FROM (SELECT CAST(0 AS INT) AS N10 UNION ALL SELECT 10 UNION ALL SELECT 20 UNION ALL SELECT 30 UNION ALL SELECT 40 UNION ALL

    SELECT 50 UNION ALL SELECT 60) Tens

    CROSS JOIN

    (SELECT CAST(0 AS INT) AS N1 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL

    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) Units )

    SELECT c.*, d.[Output]

    FROM #CADlist c

    CROSS APPLY (SELECT LEFT(c.FileNM, LEN(c.FileNM) - MAX(x.n)) AS [Output]

    FROM (SELECT TOP 3 n

    FROM Numbers n

    WHERE SUBSTRING(LTRIM(REVERSE(c.FileNM)), n, 1) = ' '

    AND n < LEN(c.FileNM)

    ORDER BY n) x

    ) d

    You will need to change the table name.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I'm sorry Chris but I don't follow how your code works.

    regards,

    Dom

  • DECLARE @FileNM VARCHAR(70)

    SET @FileNM = 'Audio Visual Enterprise Centre (Sidney Str) CYP0076 Floor Plan'

    ;WITH Numbers AS (SELECT (N10 + N1) + 1 AS n

    FROM (SELECT CAST(0 AS INT) AS N10 UNION ALL SELECT 10 UNION ALL SELECT 20 UNION ALL SELECT 30 UNION ALL SELECT 40 UNION ALL

    SELECT 50 UNION ALL SELECT 60) Tens

    CROSS JOIN

    (SELECT CAST(0 AS INT) AS N1 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL

    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) Units )

    -- look at the result of this, then

    SELECT n, SUBSTRING(LTRIM(REVERSE(@FileNM)), n, 1)

    FROM Numbers n

    ORDER BY n

    -- comment it out and look at the result of this

    SELECT TOP(3) n

    FROM Numbers n

    WHERE SUBSTRING(LTRIM(REVERSE(@FileNM)), n, 1) = ' '

    AND n < LEN(@FileNM)

    ORDER BY n

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris,

    thanks for the help, I've fathomed it out now! I'm not that familar with CTE's, so it'll be something I'll read up on in the next day or two.

    regards,

    Dom

  • Top work, Dom.

    A CTE is like a derived table defined outside the main body of the query, with the advantage that you can refer to it multiple times in the query, including as an aggregated derived table. They're probably best for making complex queries more readable and testable but have some specialised uses too - resolving hierarchical structures springs to mind. There's a recent[/url] SSC article to get you started.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for the info Chris, it's much appreciated,

    cheers,

    dom

  • Here's a short article on CTE's you could read when you want to start investigating them.

    http://qa.sqlservercentral.com/articles/T-SQL/68651/

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply