Slow Stored Procedure

  • Hello all,

    With some previous help from the group I have arrived at a SP that is doing what I want but very slowly. I would appreciate any advice on how to speed this up. Here is the situation:

    1. I have a mailing list called AbcsMstr. The user provides a code called @abcs_cd to pull a subset of names from AbcsMstr and a cursor is opened with the result.

    2. Next I look for matches in our prospect table where state = state, city = city and SOUNDEX matches for first and last name. The matches are inserted into table abcs.

    3. Typically, the AbcsMstr subset will have about 10,000 records and our prospect table has about 200,000 records. This SP code is below.

    4. Is there anything I can do with the logic to speed things up? It was suggested to open the cursor as FORWARD ONLY but I can't seem to find anything about it in BOL. Is this supported in SQL7?

    Thanks very much for any suggestions!!!

    CREATE PROCEDURE procSOUND_EX @abcs_cd char(10)

    --Jonathan Matt 6/26/02

    --Input Paramters: abcs_cd - example: Code=30-2

    --Purpose: User supplies the abcs_cd code to populate a cursor, then each record in the cursor

    --is tested for a match in the Person table. The results are placed in a table named abcs.

    AS

    set nocount on

    DECLARE @AbcsId int,

    @last_name varchar(50),

    @first_name varchar(50),

    @city varchar(50),

    @st varchar(50),

    @zip varchar(50),

    @prspct_id int

    DELETE FROM abcs

    DECLARE name_cursor CURSOR FOR

    SELECT AbcsId, Last_Name, First_Name, City, St, Zip

    FROM AbcsMstr

    WHERE AbcsMstr.COUP = @abcs_cd

    OPEN name_cursor

    WHILE @@fetch_status <> -1

    BEGIN

    FETCH name_cursor INTO @AbcsId, @last_name, @first_name, @city, @st, @zip

    INSERT INTO abcs

    SELECT @AbcsId, Person.person_last_name, Person.person_first_name, Person.prspct_id, Prspct.hsehold_addr1_name, Prspct.hsehold_city_name, Prspct.hsehold_state_cd, Prspct.hsehold_postal_zone_cd

    FROM Person JOIN Prspct ON Person.prspct_id = Prspct.prspct_id

    WHERE

    @st = Prspct.hsehold_state_cd AND

    @city = Prspct.hsehold_city_name AND

    (SOUNDEX(person_last_name) = SOUNDEX (@last_name)) AND

    (SOUNDEX(person_first_name) = SOUNDEX (@first_name))

    END

    CLOSE name_cursor

    DEALLOCATE name_cursor

  • Hi, what you could do is move the insert statement in a separate stored procedure. And make the cursor read only ( i don't have sql 7 bol at hand so i don't know the exact syntacts for a read only cursor)

  • Definately upgrade to SQL Server 2000. Don't wait, do it immediately.

    Several features will make your routine run instantly. You would be able to have a computed column in the table for the SOUNDEX fuction. Then create an index on this computed column.

    Take care.

  • I agree that an indexed computed column is a good idea. Even without it though the key is to try to do a set operation instead of a cursor operation. Have you tried writing it as a single query? If performance suffers too much (I dont imagine this runs that often or is extremely time critical?) you could add two columns for the soundex values your table and maintain them with a trigger index them. This would simulate what you'd get from a computed column with an index on it.

    I'd look at those options before tinkering with the cursor.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • I want to thank everyone who offered suggestions. I rewrote the procedure and got rid of the cursor. The cursor method took many hours (I never let it complete). It now runs in under 5 minutes for 10,000 input records against 250,000 prospect records! I am happy with this! My code is below. Any suggestions for further improvement gladly accepted.

    Thanks again to all.

    CREATE PROCEDURE procAbcs_hit @abcs_cd varchar (20)

    /*Created by Jonathan Matt on 7/2/02.

    Input parameter @abcs_cd identifies the particular subset of names to analyze.

    Table abcs_hit holds all prospects that have same ciy and state as names being analyzed.

    Table abcs holds all those prospects in abcs_hit whose SOUNDEX matches for first and last names*/

    AS

    SET NOCOUNT ON

    DELETE FROM abcs_hit

    INSERT abcs_hit

    SELECT AbcsMstr.AbcsId,

    Prspct.prspct_id,

    AbcsMstr.Last_Name,

    AbcsMstr.First_Name,

    Person.person_last_name,

    Person.person_first_name

    FROM (AbcsMstr INNER JOIN Prspct ON (AbcsMstr.St = Prspct.hsehold_state_cd) AND (AbcsMstr.City = Prspct.hsehold_city_name)

    INNER JOIN Person ON Prspct.prspct_id = Person.prspct_id)

    WHERE (((AbcsMstr.Coup)='@abcs_cd') AND ((Prspct.prspct_type_cd)='HOUSEHOLD'))

    ORDER BY AbcsMstr.AbcsId, Prspct.prspct_id;

    DELETE FROM abcs

    INSERT abcs

    SELECT abcs_id,

    prspct_id

    FROM abcs_hit

    WHERE (SOUNDEX(ab_last_name)= SOUNDEX(per_last_name)) AND

    (SOUNDEX(ab_first_name)=SOUNDEX(per_first_name))

    ORDER BY abcs_id, prspct_id;

  • I have a similar process but was disatisfied with soundex for phonetic matching, so here is an alternate algorithm:

    CREATE FUNCTION dbo.DoubleMetaPhone (@str varchar(70))

    RETURNS char(10)

    AS

    BEGIN

    /*#########################################################################

    Double Metaphone Phonetic Matching Function

    This reduces word to approximate phonetic string. This is deliberately

    not a direct phonetic

    Based off original C++ code and algorithm by

    Lawrence Philips (lphilips@verity.com)

    Published in the C/C++ Users Journal:

    http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles

    Original Metaphone presented in article in "Computer Language" in 1990.

    Reduces alphabet to

    The 14 constonant sounds:

    "sh" "p"or"b" "th"

    | | |

    X S K J T F H L M N P R 0 W

    Drop vowels except at the beginning

    Produces a char(10) string. The left(@result,5) gives the most common

    pronouciation, right(@result,5) gives the commonest alternate.

    Translated into t-SQL by Keith Henry (keithh@lbm-solutions.com)

    #########################################################################*/

    Declare @original varchar(70),

    @primary varchar(70),

    @secondary varchar(70),

    @length int,

    @Last int,

    @current int,

    @strcur1 char(1) ,

    @strnext1 char(1) ,

    @strprev1 char(1),

    @SlavoGermanic bit

    set @SlavoGermanic = 0

    set @primary = ''

    set @secondary = ''

    set @current = 1

    set @length = len(@str)

    set @Last = @length

    set @original = lTrim(isnull(@str,'')) + ' '

    set @original = upper(@original)

    if patindex('%[WK]%',@str) + charindex('CZ',@str) + charindex('WITZ',@str) <> 0

    set @SlavoGermanic = 1

    -- skip this at beginning of word

    if substring(@original, 1, 2) in ('GN', 'KN', 'PN', 'WR', 'PS')

    set @current = @current + 1

    -- Initial 'X' is pronounced 'Z' e.g. 'Xavier'

    if substring(@original, 1, 1) = 'X'

    begin

    set @primary = @primary + 'S' -- 'Z' maps to 'S'

    set @secondary = @secondary + 'S'

    set @current = @current + 1

    end

    if substring(@original, 1, 1) in ('A', 'E', 'I', 'O', 'U', 'Y')

    begin

    set @primary = @primary + 'A' -- all init vowels now map to 'A'

    set @secondary = @secondary + 'A'

    set @current = @current + 1

    end

    while @current <= @length

    begin

    if len(@primary) >= 5 break

    set @strcur1 = substring(@original, @current, 1)

    set @strnext1 = substring(@original, (@current + 1), 1)

    set @strprev1 = substring(@original, (@current - 1), 1)

    if @strcur1 in ('A', 'E', 'I', 'O', 'U', 'Y', ' ', '''', '-')

    set @current = @current + 1

    else

    if @strcur1 = 'B' -- '-mb', e.g. 'dumb', already skipped over ...

    begin

    set @primary = @primary + 'P'

    set @secondary = @secondary + 'P'

    if @strnext1 = 'B'

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'Ç'

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    set @current = @current + 1

    end

    else

    if @strcur1 = 'C'

    begin

    if @strnext1 = 'H'

    begin

    if substring(@original, @current, 4) = 'CHIA' -- italian 'chianti'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    if @current > 1 -- find 'michael'

    and substring(@original, @current, 4) = 'CHAE'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'X'

    end

    else

    begin

    if @current = 1 -- greek roots e.g. 'chemistry', 'chorus'

    and (substring(@original, @current + 1, 5) in ('HARAC', 'HARIS')

    or substring(@original, @current + 1, 3) in ('HOR', 'HYM', 'HIA', 'HEM')

    )

    and substring(@original, 1, 5) <> 'CHORE'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    if ( substring(@original, 0, 4) in ('VAN ', 'VON ') -- germanic, greek, or otherwise 'ch' for 'kh' sound

    or substring(@original, 0, 3) = 'SCH'

    )

    or substring(@original, @current - 2, 6) in ('ORCHES', 'ARCHIT', 'ORCHID') -- 'architect' but not 'arch', orchestra', 'orchid'

    or substring(@original, @current + 2, 1) in ('T', 'S')

    or ( ( @strprev1 in ('A','O','U','E')

    or @current = 1

    )

    and substring(@original, @current + 2, 1) in ('L','R','N','M','B','H','F','V','W',' ') -- e.g. 'wachtler', 'weschsler', but not 'tichner'

    )

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    if (@current > 1)

    begin

    if substring(@original, 1, 2) = 'MC' -- e.g. 'McHugh'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'K'

    end

    end

    else

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    end

    end

    end

    end

    end

    set @current = @current + 2

    end --ch logic

    else

    begin

    if @strnext1 = 'C' -- double 'C', but not McClellan'

    and not(@current = 1

    and substring(@original, 1, 1) = 'M'

    )

    begin

    if substring(@original, @current + 2, 1) in ('I','E','H') -- 'bellocchio' but not 'bacchus'

    and substring(@original, @current + 2, 2) <> 'HU'

    begin

    if ( @current = 2 -- 'accident', 'accede', 'succeed'

    and @strprev1 = 'A'

    )

    or substring(@original, @current - 1, 5) in ('UCCEE', 'UCCES')

    begin

    set @primary = @primary + 'KS'

    set @secondary = @secondary + 'KS'

    end

    else

    begin -- 'bacci', 'bertucci', other italian

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    -- e.g. 'focaccia' if substring(@original, @current, 4) = 'CCIA'

    end

    set @current = @current + 3

    end

    else

    begin

    set @primary = @primary + 'K' -- Pierce's rule

    set @secondary = @secondary + 'K'

    set @current = @current + 2

    end

    end

    else

    begin

    if @strnext1 in ('K','G','Q')

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    set @current = @current + 2

    end

    else

    begin

    if @strnext1 in ('I','E','Y')

    begin

    if substring(@original, @current, 3) in ('CIO','CIE','CIA') -- italian vs. english

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'X'

    end

    else

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    set @current = @current + 2

    end

    else

    begin

    if @strnext1 = 'Z' -- e.g. 'czerny'

    and substring(@original, @current -2, 4) <> 'WICZ'

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'X'

    set @current = @current + 2

    end

    else

    begin

    if @current > 2 -- various gremanic

    and substring(@original, @current - 2,1) not in ('A', 'E', 'I', 'O', 'U', 'Y')

    and substring(@original, @current - 1, 3) = 'ACH'

    and ((substring(@original, @current + 2, 1) <> 'I')

    and ((substring(@original, @current + 2, 1) <> 'E')

    or substring(@original, @current - 2, 6) in ('BACHER', 'MACHER')

    )

    )

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    set @current = @current + 2

    end

    else

    begin

    if @current = 1 -- special case 'caesar'

    and substring(@original, @current, 6) = 'CAESAR'

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    set @current = @current + 2

    end

    else

    begin -- final else

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    if substring(@original, @current + 1, 2) in (' C',' Q',' G') -- name sent in 'mac caffrey', 'mac gregor'

    set @current = @current + 3

    else

    set @current = @current + 1

    end

    end

    end

    end

    end

    end

    end

    end

    else

    if @strcur1 = 'D'

    begin

    if @strnext1 = 'G'

    begin

    if substring(@original, @current + 2, 1) in ('I','E','Y')

    begin

    set @primary = @primary + 'J' -- e.g. 'edge'

    set @secondary = @secondary + 'J'

    set @current = @current + 3

    end

    else

    begin

    set @primary = @primary + 'TK' -- e.g. 'edgar'

    set @secondary = @secondary + 'TK'

    set @current = @current + 2

    end

    end

    else

    begin

    if substring(@original, @current, 2) in ('DT','DD')

    begin

    set @primary = @primary + 'T'

    set @secondary = @secondary + 'T'

    set @current = @current + 2

    end

    else

    begin

    set @primary = @primary + 'T'

    set @secondary = @secondary + 'T'

    set @current = @current + 1

    end

    end

    end

    else

    if @strcur1 = 'F'

    begin

    set @primary = @primary + 'F'

    set @secondary = @secondary + 'F'

    if (@strnext1 = 'F')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'G'

    begin

    if (@strnext1 = 'H')

    begin

    if @current > 1

    and @strprev1 not in ('A', 'E', 'I', 'O', 'U', 'Y')

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    if not( (@current > 2 -- Parker's rule (with some further refinements) - e.g. 'hugh'

    and substring(@original, @current - 2, 1) in ('B','H','D')

    ) -- e.g. 'bough'

    or (@current > 3

    and substring(@original, @current - 3, 1) in ('B','H','D')

    ) -- e.g. 'broughton'

    or (@current > 4

    and substring(@original, @current - 4, 1) in ('B','H')

    ) )

    begin

    if @current > 3 -- e.g. 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'

    and @strprev1 = 'U'

    and substring(@original, @current - 3, 1) in ('C','G','L','R','T')

    begin

    set @primary = @primary + 'F'

    set @secondary = @secondary + 'F'

    end

    else

    begin

    if @current > 1

    and @strprev1 <> 'I'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    if (@current < 4)

    begin

    if (@current = 1) -- 'ghislane', 'ghiradelli'

    begin

    if (substring(@original, @current + 2, 1) = 'I')

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'J'

    end

    else

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    end

    end

    end

    end

    end

    end

    set @current = @current + 2

    end

    else

    begin

    if (@strnext1 = 'N')

    begin

    if @current = 1

    and substring(@original, 0,1) in ('A', 'E', 'I', 'O', 'U', 'Y')

    and @SlavoGermanic = 0

    begin

    set @primary = @primary + 'KN'

    set @secondary = @secondary + 'N'

    end

    else

    begin

    -- not e.g. 'cagney'

    if substring(@original, @current + 2, 2) = 'EY'

    and (@strnext1 <> 'Y')

    and @SlavoGermanic = 0

    begin

    set @primary = @primary + 'N'

    set @secondary = @secondary + 'KN'

    end

    else

    begin

    set @primary = @primary + 'KN'

    set @secondary = @secondary + 'KN'

    end

    end

    set @current = @current + 2

    end

    else

    begin

    if substring(@original, @current + 1, 2) = 'LI' -- 'tagliaro'

    and @SlavoGermanic = 0

    begin

    set @primary = @primary + 'KL'

    set @secondary = @secondary + 'L'

    set @current = @current + 2

    end

    else

    begin

    if @current = 1 -- -ges-, -gep-, -gel- at beginning

    and (@strnext1 = 'Y'

    or substring(@original, @current + 1, 2) in ('ES','EP','EB','EL','EY','IB','IL','IN','IE', 'EI','ER')

    )

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'J'

    set @current = @current + 2

    end

    else

    begin

    if (substring(@original, @current + 1, 2) = 'ER' -- -ger-, -gy-

    or @strnext1 = 'Y'

    )

    and substring(@original, 1, 6) not in ('DANGER','RANGER','MANGER')

    and @strprev1 not in ('E', 'I')

    and substring(@original, @current - 1, 3) not in ('RGY','OGY')

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'J'

    set @current = @current + 2

    end

    else

    begin

    if @strnext1 in ('E','I','Y') -- italian e.g. 'biaggi'

    or substring(@original, @current -1, 4) in ('AGGI','OGGI')

    begin

    if (substring(@original, 1, 4) in ('VAN ', 'VON ') -- obvious germanic

    or substring(@original, 1, 3) = 'SCH'

    )

    or substring(@original, @current + 1, 2) = 'ET'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    end

    else

    begin

    -- always soft if french ending

    if substring(@original, @current + 1, 4) = 'IER '

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'J'

    end

    else

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'K'

    end

    end

    set @current = @current + 2

    end

    else

    begin -- other options exausted call it k sound

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    if (@strnext1 = 'G')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    end

    end

    end

    end

    end

    end

    else

    if @strcur1 = 'H'

    begin

    if (@current = 1 -- only keep if first & before vowel or btw. 2 vowels

    or @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y')

    )

    and @strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y')

    begin

    set @primary = @primary + 'H'

    set @secondary = @secondary + 'H'

    set @current = @current + 2

    end

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'J'

    begin

    if substring(@original, @current, 4) = 'JOSE' -- obvious spanish, 'jose', 'san jacinto'

    or substring(@original, 1, 4) = 'SAN '

    begin

    if (@current = 1

    and substring(@original, @current + 4, 1) = ' '

    )

    or substring(@original, 1, 4) = 'SAN '

    begin

    set @primary = @primary + 'H'

    set @secondary = @secondary + 'H'

    end

    else

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'H'

    end

    set @current = @current + 1

    end

    else

    begin

    if @current = 1

    begin

    set @primary = @primary + 'J' -- Yankelovich/Jankelowicz

    set @secondary = @secondary + 'A'

    set @current = @current + 1

    end

    else

    begin

    if @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y') -- spanish pron. of .e.g. 'bajador'

    and @SlavoGermanic = 0

    and @strnext1 in ('A','O')

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'H'

    set @current = @current + 1

    end

    else

    begin

    if (@current = @Last)

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + ''

    set @current = @current + 1

    end

    else

    begin

    if @strnext1 in ('L','T','K','S','N','M','B','Z')

    and @strprev1 not in ('S','K','L')

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'J'

    set @current = @current + 1

    end

    else

    begin

    if (@strnext1 = 'J') -- it could happen

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    end

    end

    end

    end

    end

    else

    if @strcur1 = 'K'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    if (@strnext1 = 'K')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'L'

    begin

    if (@strnext1 = 'L')

    begin

    if (@current = (@length - 3) -- spanish e.g. 'cabrillo', 'gallegos'

    and substring(@original, @current - 1, 4) in ('ILLO','ILLA','ALLE')

    )

    or ((substring(@original, @Last - 1, 2) in ('AS','OS')

    or substring(@original, @Last, 1) in ('A','O')

    )

    and substring(@original, @current - 1, 4) = 'ALLE'

    )

    set @primary = @primary + 'L' --Alternate is silent

    else

    begin

    set @primary = @primary + 'L'

    set @secondary = @secondary + 'L'

    end

    set @current = @current + 2

    end

    else

    begin

    set @current = @current + 1

    set @primary = @primary + 'L'

    set @secondary = @secondary + 'L'

    end

    end

    else

    if @strcur1 = 'M'

    begin

    set @primary = @primary + 'M'

    set @secondary = @secondary + 'M'

    if substring(@original, @current - 1, 3) = 'UMB'

    and (@current + 1 = @Last

    or substring(@original, @current + 2, 2) = 'ER'

    ) -- 'dumb', 'thumb'

    or @strnext1 = 'M'

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 in ('N','Ñ')

    begin

    set @primary = @primary + 'N'

    set @secondary = @secondary + 'N'

    if @strnext1 in ('N','Ñ')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'P'

    begin

    if (@strnext1 = 'H')

    begin

    set @current = @current + 2

    set @primary = @primary + 'F'

    set @secondary = @secondary + 'F'

    end

    else

    begin

    -- also account for 'campbell' and 'raspberry'

    if @strnext1 in ('P','B')

    set @current = @current + 2

    else

    begin

    set @current = @current + 1

    set @primary = @primary + 'P'

    set @secondary = @secondary + 'P'

    end

    end

    end

    else

    if @strcur1 = 'Q'

    begin

    set @primary = @primary + 'K'

    set @secondary = @secondary + 'K'

    if (@strnext1 = 'Q')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'R'

    begin

    if @current = @Last -- french e.g. 'rogier', but exclude 'hochmeier'

    and @SlavoGermanic = 0

    and substring(@original, @current - 2, 2) = 'IE'

    and substring(@original, @current - 4, 2) not in ('ME','MA')

    set @secondary = @secondary + 'R' --set @primary = @primary + ''

    else

    begin

    set @primary = @primary + 'R'

    set @secondary = @secondary + 'R'

    end

    if (@strnext1 = 'R')

    begin

    if substring(@original, @current, 3) = 'RRI' --alternate Kerrigan, Corrigan

    set @secondary = @secondary + 'R'

    set @current = @current + 2

    end

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'S'

    begin

    if substring(@original, @current - 1, 3) in ('ISL','YSL') -- special cases 'island', 'isle', 'carlisle', 'carlysle'

    set @current = @current + 1 --silent s

    else

    begin

    if substring(@original, @current, 2) = 'SH'

    begin

    -- germanic

    if substring(@original, @current + 1, 4) in ('HEIM','HOEK','HOLM','HOLZ')

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    else

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    end

    set @current = @current + 2

    end

    else

    begin

    -- italian & armenian

    if substring(@original, @current, 3) in ('SIO','SIA')

    or substring(@original, @current, 4) in ('SIAN')

    begin

    if @SlavoGermanic = 0

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'X'

    end

    else

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    set @current = @current + 3

    end

    else

    begin

    if (@current = 1 -- german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'

    and @strnext1 in ('M','N','L','W') -- also, -sz- in slavic language altho in hungarian it is pronounced 's'

    )

    or @strnext1 = 'Z'

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'X'

    if @strnext1 = 'Z'

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    begin

    if substring(@original, @current, 2) = 'SC'

    begin

    if substring(@original, @current + 2, 1) = 'H' -- Schlesinger's rule

    begin

    if substring(@original, @current + 3, 2) in ('OO','ER','EN','UY','ED','EM') -- dutch origin, e.g. 'school', 'schooner'

    begin

    if substring(@original, @current + 3, 2) in ('ER','EN') -- 'schermerhorn', 'schenker'

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'SK'

    end

    else

    begin

    set @primary = @primary + 'SK'

    set @secondary = @secondary + 'SK'

    end

    set @current = @current + 3

    end

    else

    begin

    if @current = 1

    and substring(@original, 3,1) not in ('A', 'E', 'I', 'O', 'U', 'Y')

    and substring(@original, @current + 3, 1) <> 'W'

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'S'

    end

    else

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    end

    set @current = @current + 3

    end

    end

    else

    begin

    if substring(@original, @current + 2, 1) in ('I','E','Y')

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    else

    begin

    set @primary = @primary + 'SK'

    set @secondary = @secondary + 'SK'

    end

    set @current = @current + 3

    end

    end

    else

    begin

    if @current = 1 -- special case 'sugar-'

    and substring(@original, @current, 5) = 'SUGAR'

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'S'

    set @current = @current + 1

    end

    else

    begin

    if @current = @Last -- french e.g. 'resnais', 'artois'

    and substring(@original, @current - 2, 2) in ('AI','OI')

    set @secondary = @secondary + 'S' --set @primary = @primary + ''

    else

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    if @strnext1 in ('S','Z')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    end

    end

    end

    end

    end

    end

    else

    if @strcur1 = 'T'

    begin

    if substring(@original, @current, 4) = 'TION'

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    set @current = @current + 3

    end

    else

    if substring(@original, @current, 3) in ('TIA','TCH')

    begin

    set @primary = @primary + 'X'

    set @secondary = @secondary + 'X'

    set @current = @current + 3

    end

    else

    if substring(@original, @current, 2) = 'TH'

    or substring(@original, @current, 3) = 'TTH'

    begin

    if substring(@original, @current + 2, 2) in ('OM','AM') -- special case 'thomas', 'thames' or germanic

    or substring(@original, 0, 4) in ('VAN ','VON ')

    or substring(@original, 0, 3) = 'SCH'

    begin

    set @primary = @primary + 'T'

    set @secondary = @secondary + 'T'

    end

    else

    begin

    set @primary = @primary + '0'

    set @secondary = @secondary + 'T'

    end

    set @current = @current + 2

    end

    else

    begin

    if @strnext1 in ('T','D')

    begin

    set @current = @current + 2

    set @primary = @primary + 'T'

    set @secondary = @secondary + 'T'

    end

    else

    begin

    set @current = @current + 1

    set @primary = @primary + 'T'

    set @secondary = @secondary + 'T'

    end

    end

    end

    else

    if @strcur1 = 'V'

    if (@strnext1 = 'V')

    set @current = @current + 2

    else

    begin

    set @current = @current + 1

    set @primary = @primary + 'F'

    set @secondary = @secondary + 'F'

    end

    else

    if @strcur1 = 'W'

    begin

    -- can also be in middle of word

    if substring(@original, @current, 2) = 'WR'

    begin

    set @primary = @primary + 'R'

    set @secondary = @secondary + 'R'

    set @current = @current + 2

    end

    else

    if @current = 1

    and (@strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y')

    or substring(@original, @current, 2) = 'WH'

    )

    begin

    if @strnext1 in ('A', 'E', 'I', 'O', 'U', 'Y') -- Wasserman should match Vasserman

    begin

    set @primary = @primary + 'A'

    set @secondary = @secondary + 'F'

    set @current = @current + 1

    end

    else

    begin

    set @primary = @primary + 'A' -- need Uomo to match Womo

    set @secondary = @secondary + 'A'

    set @current = @current + 1

    end

    end

    else

    if (@current = @Last -- Arnow should match Arnoff

    and @strprev1 in ('A', 'E', 'I', 'O', 'U', 'Y')

    )

    or substring(@original, @current - 1, 5) in ('EWSKI','EWSKY','OWSKI','OWSKY')

    or substring(@original, 0, 3) = 'SCH'

    begin

    set @secondary = @secondary + 'F' --set @primary = @primary + ''

    set @current = @current + 1

    end

    else

    if substring(@original, @current, 4) in ('WICZ','WITZ') -- polish e.g. 'filipowicz'

    begin

    set @primary = @primary + 'TS'

    set @secondary = @secondary + 'FX'

    set @current = @current + 4

    end

    else

    set @current = @current + 1 -- else skip it

    end

    else

    if @strcur1 = 'X'

    begin

    if not (@current = @Last -- french e.g. breaux

    and (substring(@original, @current - 3, 3) in ('IAU', 'EAU')

    or substring(@original, @current - 2, 2) in ('AU', 'OU')

    )

    )

    begin

    set @primary = @primary + 'KS'

    set @secondary = @secondary + 'KS'

    end --else skip it

    if @strnext1 in ('C','X')

    set @current = @current + 2

    else

    set @current = @current + 1

    end

    else

    if @strcur1 = 'Z'

    begin

    if (@strnext1 = 'Z')

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    set @current = @current + 2

    end

    else

    begin

    if (@strnext1 = 'H') -- chinese pinyin e.g. 'zhao'

    begin

    set @primary = @primary + 'J'

    set @secondary = @secondary + 'J'

    set @current = @current + 2

    end

    else

    begin

    if (substring(@original, @current + 1, 2) in ('ZO', 'ZI', 'ZA'))

    or (@SlavoGermanic = 1

    and (@current > 1

    and @strprev1 <> 'T'

    )

    )

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'TS'

    end

    else

    begin

    set @primary = @primary + 'S'

    set @secondary = @secondary + 'S'

    end

    end

    set @current = @current + 1

    end

    end

    else

    set @current = @current + 1

    end

    return cast(@primary as char(5)) + cast(@secondary as char(5))

    end

    Then I added a Matchscore algorithm to give me an adjustable match (this bit is still being developed).

    CREATE FUNCTION dbo.MetaphoneMatch(@str1 varchar(70),@str2 varchar(70))

    returns tinyint

    as

    begin

    declare @Res tinyint

    set @Res = 0

    if @str1 = @str2 set @Res = 10

    else

    begin

    declare @tmpStr char(10),

    @p1 varchar(5),

    @s1 varchar(5),

    @p2 varchar(5),

    @s2 varchar(5)

    set @tmpStr = dbo.DoubleMetaPhone(@str1)

    set @p1 = rtrim(left(@tmpStr,5))

    set @s1 = ltrim(right(@tmpStr,5))

    set @tmpStr = dbo.DoubleMetaPhone(@str2)

    set @p2 = rtrim(left(@tmpStr,5))

    set @s2 = ltrim(right(@tmpStr,5))

    if @p1 = '' set @p1 = null

    if @p2 = '' set @p2 = null

    if @s1 = '' set @s1 = null

    if @s2 = '' set @s2 = null

    if (@p1 = @s1) and (@p2 = @s2) and (@p1 = @p2) set @Res = 9 --consistent match between consistent keys

    else

    if ((@p1 = @s1) and (@p2 <> @s2)) or

    ((@p1 <> @s1) and (@p2 = @s2)) -- one Umatched key

    if (@p1 = @p2) set @Res = 8 --matched primary key

    else

    if (@p1 = @s2) or

    (@p2 = @s1) set @Res = 7 --matched Secondary to primary key

    else

    if (@p1 <> @p2) and

    (@s1 = @s2) set @Res = 6 --matched Secondary key with unmatched primary

    else

    set @Res = 0

    else

    if ((@p1 <> @s1) and (@p2 <> @s2)) -- both Umatched keys

    if (@p1 = @p2) and

    (@s1 = @s2) set @Res = 5 --matched primary and matched Secondary keys

    else

    if (@p1 = @p2) and

    (@s1 <> @s2) set @Res = 4 --matched primary key with unmatched Secondary

    else

    if (@p1 = @s2) and

    (@s1 = @p2) set @Res = 3 --cross matched Secondary to primary key

    else

    if (@p1 = @s2) or

    (@p2 = @s1) set @Res = 2 --matched Secondary to primary key

    else

    if (@p1 <> @p2) and

    (@s1 = @s2) set @Res = 1 --matched Secondary key with unmatched primary

    else

    set @Res = 0

    end

    return @Res

    end

    This works better than soundex for surnames, but I need something else for firstnames. I have a dataset where a lot of firstnames are missing or initals.

    I need a keyname list to match unphonetic names (like Robert and Bob, Anthony and Tony etc) does anyone know of a good one?

  • Sorry about the delay responding. Thanks a ton for the code. It will take me awhile to absorb what you sent. In the meantime, my soundex procedure is now running in about 20 seconds to compare 10,000 names with 400,000!

    Sure has come along way from the first version with the cursor.

    CREATE PROCEDURE procAbcs_hit_New @abcs_cd varchar (20)

    --Created on 7/15/02 by Jonathan Matt. Input pararmeter @abcd_cd is the code supplied by calling app abcs_hit.mdb

    AS

    SET NOCOUNT ON

    DELETE WORKING_abcs

    INSERT WORKING_abcs (AbcsId, prspct_id)

    SELECT AbcsMstr.AbcsID, Prspct.prspct_id

    FROM AbcsMstr JOIN (Person JOIN Prspct ON Person.prspct_id = Prspct.prspct_id)

    ON (AbcsMstr.Zip = Prspct.hsehold_postal_zone_cd)

    WHERE AbcsMstr.Coup=@abcs_cd AND

    Prspct.prspct_type_cd ='Household' AND

    SUBSTRING(AbcsMstr.Zip, 1, 5)=SUBSTRING(Prspct.hsehold_postal_zone_cd, 1, 5) AND

    SOUNDEX (AbcsMstr.Last_Name)=SOUNDEX(person.person_last_name) AND

    SOUNDEX(AbcsMstr.First_Name)=SOUNDEX(person.person_first_name);

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply