Does full-text search could count words occurrence number

  • Hello,

    I use the full-text search utility in SQL Server 2008 to find word in PDFs document.

    This is my 'Documents' table:

    id (PK), data (VarBinary(max)), extension (nvarchar(4))

    My full-text catalog on 'data' column works fine because when I search 'Microsoft', my document containing this word is returned as result.

    SELECT * FROM Documents WHERE freetext([data], 'Microsoft');

    1 | 0x255044.... | .pdf

    But I need to know how many times 'Microsoft' word appears in this document.

    Do you have any idea how can I retrieve this information?

    The problem is that full-text catalog use Acrobat IFilter to index the PDF; the data stored in VarBinary column is PDF source code. I can't use string function on it; I have to recover this information from the full-text catalog (if it exists) or use another language.

    This is the result when I do 'convert' on data column:

    SELECT convert(varchar(3000),data) FROM Documents WHERE pkId = 1;

    %PDF-1.3 %âãÏÓ 3084 0 obj << /Linearized 1 /O 3089 /H [ 3228 1955 ] /L 3178374 /E 95233 /N 83 /T 3116574 >> endobj xref 3084 126 0000000016 00000 n 0000002876 00000 n 0000003093 00000 n 0000003126 00000 n 0000003185 00000 n 0000005183 00000 n 0000005373 00000 n 0000005474 00000 n 0000005575 00000 n 0000005676 00000 n 0000005774 00000 n 0000005872 00000 n 0000005971 00000 n 0000006070 00000 n 0000006170 00000 n 0000006270 00000 n 0000006370 00000 n 0000006470 00000 n 0000006570 00000 n 0000006670 00000 n 0000006770 00000 n 0000006870 00000 n 0000006970 00000 n 0000007070 00000 n 0000007170 00000 n 0000007270 00000 n 0000007370 00000 n 0000007470 00000 n 0000007570 00000 n 0000007670 00000 n 0000007770 00000 n 0000007870 00000 n 0000007970 00000 n 0000008070 00000 n 0000008170 00000 n 0000008270 00000 n 0000008370 00000 n 0000008470 00000 n 0000008570 00000 n 0000008670 00000 n 0000008770 00000 n 0000008870 00000 n 0000008970 00000 n 0000009070 00000 n 0000009170 00000 n 0000009270 00000 n 0000009370 00000 n 0000009470 00000 n 0000009570 00000 n 0000009670 00000 n 0000009770 00000 n 0000009870 00000 n 0000009970 00000 n 0000010070 00000 n 0000010170 00000 n 0000010270 00000 n 0000010370 00000 n 0000010470 00000 n 0000010570 00000 n 0000010670 00000 n 0000010770 00000 n 0000010870 00000 n 0000010970 00000 n 0000011070 00000 n 0000011170 00000 n 0000011270 00000 n 0000011370 00000 n 0000011470 00000 n 0000011570 00000 n 0000011670 00000 n 0000011770 00000 n 0000011870 00000 n 0000011970 00000 n 0000012070 00000 n 0000012170 00000 n 0000012270 00000 n 0000012370 00000 n 0000012470 00000 n 0000012570 00000 n 0000012670 00000 n 0000012770 00000 n 0000012870 00000 n 0000012970 00000 n 0000013070 00000 n 0000013170 00000 n 0000013270 00000 n 0000013370 00000 n 0000013470 00000 n 0000013570 00000 n 0000013670 00000 n 0000013770 00000 n 0000013870 00000 n 0000013970 00000 n 0000014070 00000 n 0000014170 00000 n 0000014270 00000 n 0000014370 00000 n 0000014470 00000 n 0000014570 00000 n 0000014670 00000 n 0000014770 00000 n 0000014870 00000 n 0000014970 00000 n 0000015070 00000 n 0000015170 00000 n 0000015270 00000 n 0000015370 00000 n 0000015471 00000 n 0000015571 00000 n 0000015823 00000 n 0000015866 00000 n 0000015921 00000 n 0000016008 00000 n 0000016763 00000 n 0000017616 00000 n 0000018236 00000 n 0000019141 00000 n 0000020064 00000 n 0000020205 00000 n 0000025943 00000 n 0000030708 00000 n 0000033387 00000 n 0000034176 00000 n 0000047461 00000 n 0000003228 00000 n 0000005159 00000 n trailer << /Size 3210 /Info 3062 0 R /Root 3085 0 R /Prev 3116562 /ID[<fa39a35cd65b461be8a9c636ce62aa9e><2ed6938fec89c9510f112542eda60d28>] >> startxref 0 %%EOF 3085 0 obj << /Type /Catalog /Pages 3065 0 R /Metadata 3063 0 R /Threads 3086 0 R /Names 3088 0 R /OpenAction [ 3089 0

    Thanks in advance!

  • I'd have to dig in, but I believe that the FTS index is a system table, and you can find out the number of times a word appears in a document from there. I don't have the code or VM handy right now, but I know it's in 2012. Not sure about 2008, but I think so.

    Look for the FTS internal tables.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply