Text Field Search

  • Does anyone know how to return the number of occurances of a string from a TEXT field. Thanks in advance.

  • An easy way to do this with a varchar(8000) is to do something like:

    declare @instring varchar(8000)

    set @instring = 'abcabcabc'

    declare @teststring varchar(100)

    set @teststring = 'abc'

    SELECT (LEN(@instring) - LEN(REPLACE(@instring, @teststring, ''))) / LEN(@teststring)

    Unfortunately, REPLACE won't work on a TEXT column... If your data is less than 8001 characters, you can use SUBSTRING to get around that:

    create table #a(blah text)

    insert #a values ('abcabcabc')

    declare @teststring varchar(100)

    set @teststring = 'abc'

    SELECT (LEN(SUBSTRING(blah, 1, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 1, 8000), @teststring, ''))) / LEN(@teststring)

    FROM #a

    If your data is greater than 8000 characters but you know an approximate max, one solution might be to just add these together, changing the range on the SUBSTRING:

    SELECT ((LEN(SUBSTRING(blah, 1, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 1, 8000), @teststring, ''))) / LEN(@teststring))

    + ((LEN(SUBSTRING(blah, 8001, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 8001, 8000), @teststring, ''))) / LEN(@teststring))

    + ((LEN(SUBSTRING(blah, 16001, 8000)) - DATALENGTH(REPLACE(SUBSTRING(blah, 16001, 8000), @teststring, ''))) / LEN(@teststring))

    ...

    FROM #a

    I don't know if that's the most efficient way to handle this, but it is at least somewhat workable

    --
    Adam Machanic
    whoisactive

  • If you are using SQL 2000 fortunately both charindex and substring can work with a text datatype value and text can be used as a data for inputs in a function. So you can do this and get what you want.

    /*-////////////////////////////////////////////////////////////////////////////////////////////////////////

     * Created By: James Travis

     * Created On: 11/19/2004

     * Inputs: 2 (text, varchar(8000)) Output: Integer

     * Purpose: Returns the number of occurances of a user defined string in the text data

     * based on user input in @varFind

    ///////////////////////////////////////////////////////////////////////////////////////////////////////-*/

    CREATE FUNCTION dbo.udf_cntintxt (@varText text, @varFind varchar(8000))

    RETURNS int

    AS

    BEGIN

     declare @cnt int,

      @pos int,

      @pre int

     set @cnt = 0

     set @pos = 0

     set @pre = 0

     while @pos < datalength(@varText)

     begin

      set @pre = IsNull(NullIf(@pos,0),1)

      set @pos = charindex(@varFind, Substring(@varText, @pos, (datalength(@varText) - @pos) + 1))

      if @pos > 0

      begin

       set @cnt = @cnt + 1

       set @pos = (@pos - 1) + @pre + len(@varFind)

      end

      else

       set @pos = datalength(@varText)

     end

     return(@cnt)

    END

  • CHARINDEX will not search beyond 8000 characters in SQL Server 2000... So that function doesn't quite work as expected:

    create table testsearch(textcol text)

    go

    declare @bigstring varchar(8000)

    set @bigstring = replicate('0', 8000)

    declare @bigstring2 varchar(8000)

    set @bigstring2 = replicate('1', 8000)

    exec ('insert testsearch values(''' + @bigstring + @bigstring2 + ''')')

    go

    select dbo.udf_cntintxt(textcol, '1'), substring(textcol, 8000, 2)

    from testsearch

    go

    drop table testsearch

    go

    --
    Adam Machanic
    whoisactive

  • I thought about this some more and realized that the solution I posted won't work if the search string starts, e.g. at character 7999 and ends at 8001... I believe the following solves that problem. This relies upon a sequence table, Numbers, that has a column Number populated with every number between 1 and the length of the longest string you'd like to search:

    create table testsearch(textcol text)

    go

    declare @bigstring varchar(8000)

    set @bigstring = replicate('0', 8000)

    declare @bigstring2 varchar(8000)

    set @bigstring2 = replicate('1', 8000)

    exec ('insert testsearch values(''' + @bigstring + @bigstring2 + ''')')

    go

    declare @searchstring varchar(20)

    set @searchstring = '1'

    select count(*)

    from testsearch x

    join Numbers nStart on nStart.Number <= datalength(x.textcol)

    where substring(x.textcol, nstart.number, len(@searchstring)) = @searchstring

    go

    drop table testsearch

    go

    However, this has another, more subtle issue: What should be returned when searching for 'bab' in the string 'babab'? Regular expressions engines will generally return 1; this method will return 2.

    --
    Adam Machanic
    whoisactive

  • Thanks Adam. Losts of food for thought here. In the end I decided to split my text field into 3 varchar(8000) fields using substring.

    Like your last post I used a bit of overlap:

     search_blockA = substring(@newXML,0,7000),

     search_blockB = substring(@newXML,6980,7020),

     search_blockC = substring(@newXML,13980,7020)

    Then I searched each search_block individually using simple "search" functions (combination of replace / len / divide ).

    Of course as you say this isn't perfect, if the overlapping puts a word in both blocks, but I can live with this.

    Thanks for your input.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply