Counting Instances in TEXT

  • Hello Community,

    I have been presented with the following SQL challenge

    Find the number of times the words 'bull' and 'bear' occur in the contents. We're counting the number of times the words occur so words like 'bullish' should not be included in our count. Output the word 'bull' and 'bear' along with the corresponding number of occurrences.

    The dataset used for the query is as follows:

    CREATE TABLE google_file_store (
    filename varchar(50),
    contents varchar(350))

    INSERT google_file_store VALUES
    ('draft1.txt','The stock exchange predicts a bull market which would make many investors happy.'),
    ('draft2.txt','The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market.'),
    ('final.txt','The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices.')

    The expected result should look as follows:

    casewhen

     

    I really would like to achieve the result using case when statements. Therefore, can someone let me know if it would be possible to tweak my case when statements to achieve the desired result?

    SELECT
    CASE
    WHEN google_file_store.contents LIKE '%bull%' THEN 'bull'
    END AS expr1
    ,CASE
    WHEN google_file_store.contents LIKE '%bear%' THEN 'bear'
    END AS expr2
    FROM dbo.google_file_store

    The above gives me

    casewhenme

     

  • INSERT google_file_store VALUES
    ('alt1.txt','The bullet points of this are:'),
    ('alt2.txt','- sometimes bull trading means selling bulls on the market;')
    ('alt3.txt','- bear in mind that ball bearings might not be relevant here, as well as bullying in schools')
    ('alt4.txt','- still need to count mentions of bear/bull exchange tradings')

    What outcome do you expect from this data?

    _____________
    Code for TallyGenerator

  • Hi, the expected output should like the followingcasewhen

  • The key to your problem is that you want two rows to be generated from a single rows.    My favorite technique for accomplishing this is to use CROSS APPLY in conjunction with a VALUES clause.    Unlike a CROSS JOIN of a table, the VALUES clause lets you specify expressions, such as the CASE expressions used here.   Note that two separate CASE expressions were required, not a single CASE that would return either bull or bear.

    Once multiple rows are returned from the CROSS APPLY,  a simple summary query gives the desired result.

    select Word, count(*) as Nentry
    from #google_file_store
    cross apply
    (values (case when contents like '%bull%' then 'Bull' end)
    ,(case when contents like '%bear%' then 'Bear' end)
    ) v(Word)
    where Word is not null
    group by Word
    order by Nentry desc

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  •  

    SELECT ca.name, ca.nentry
    FROM (
    SELECT
    SUM(CASE WHEN '.' + contents + '.' LIKE '%[^A-Za-z]bull[^A-Za-z]%' THEN 1 ELSE 0 END) AS bull,
    SUM(CASE WHEN '.' + contents + '.' LIKE '%[^A-Za-z]bear[^A-Za-z]%' THEN 1 ELSE 0 END) AS bear
    FROM google_file_store
    WHERE '.' + contents + '.' LIKE '%[^A-Za-z]bull[^A-Za-z]%' OR
    '.' + contents + '.' LIKE '%[^A-Za-z]bear[^A-Za-z]%'
    ) AS combined
    CROSS APPLY ( VALUES('bull', bull), ('bear', bear) ) AS ca(name, nentry)

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Perfect.

     

    Thanks

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply