Advice on Bitwise Comparison

  • Hi,

    I have a task to complete that involves running a series of tests (mostly text comparisons) against a table, then summarising the results at two grouping levels. The set of tests to be run is dynamic. I want to be able to easily select rows where a combination of tests is failed. Because of this, I'm using a bitwise comparison.

    Very simplified setup is like this:

    -- create test tables

    create table #base_data (

    PrimaryId int

    , FirstGroupingId int

    , SecondGroupingId int

    , TextData varchar(100)

    , BitwiseValue int

    )

    create table #check_data (

    CheckId int

    , CheckValue varchar(10)

    , CheckBitwiseValue int

    , FailLevel int

    )

    create table #group_2_summary (

    FirstGroupingId int

    , SecondGroupingId int

    , CheckId int

    , FailCount int

    )

    create table #group_1_summary (

    FirstGroupingId int

    , IsFailure bit

    )

    -- add test data

    insert #base_data

    select 1, 1, 1, 'Bob', 0 union all

    select 2, 1, 2, 'Dave', 0 union all

    select 3, 1, 2, 'Alan', 0 union all

    select 4, 2, 1, 'Mary', 0 union all

    select 5, 2, 1, 'Sarah', 0 union all

    select 6, 2, 1, 'Charlie', 0 union all

    select 7, 2, 2, 'Caroline', 0

    insert #check_data

    select 1, 'A', 1, 3 union all

    select 2, 'B', 2, 3 union all

    select 3, 'C', 4, 3 union all

    select 4, 'D', 8, 3

    Initially, I just populate the bitwise column in the base data using a case statement:

    -- populate the individual bitwise values

    -- edit: note that in real life this query is built dynamically and the case statements can be looking at any number of different fields within #base_data

    -- but this is omitted for simplicity

    ; with cte as (

    select PrimaryId

    , BitwiseValue

    , case when charindex('A',TextData) > 0 then 1 else 0 end as Check1

    , case when charindex('B',TextData) > 0 then 2 else 0 end as Check2

    , case when charindex('C',TextData) > 0 then 4 else 0 end as Check3

    , case when charindex('D',TextData) > 0 then 8 else 0 end as Check4

    from #base_data

    )

    update cte

    set BitwiseValue = Check1 + Check2 + Check3 + Check4

    select * from #base_data

    go

    So far so good. Now I need to create a summary of fail counts for each test at SecondGroupingId level. I'm struggling to come up with a set based solution for this, so have deployed a cursor. I would really like to avoid the use of the cursor if at all possible:

    -- populate the second grouped summary first

    -- this needs to have an individual row for each combination of

    -- FirstGroupingId, SecondGroupingId and CheckId

    -- hence use of cursor/loop

    declare @fetch int

    , @CheckId int

    , @CheckBitwiseValue int

    declare curChecks cursor for

    select CheckId

    , CheckBitwiseValue

    from #check_data

    open curChecks

    fetch next from curChecks

    into @CheckId, @CheckBitwiseValue

    set @fetch = @@fetch_status

    while @fetch = 0

    begin

    -- use cte to get round outer reference problem

    ; with cte as (

    select FirstGroupingId

    , SecondGroupingId

    , count(*) as FailCount

    from #base_data

    where @CheckBitwiseValue & BitwiseValue = @CheckBitwiseValue

    group by FirstGroupingId

    , SecondGroupingId

    )

    insert #group_2_summary

    select FirstGroupingId

    , SecondGroupingId

    , @CheckId

    , FailCount

    from cte

    fetch next from curChecks

    into @CheckId, @CheckBitwiseValue

    set @fetch = @@fetch_status

    end

    close curChecks

    deallocate curChecks

    go

    There are no crieria for failure at SecondGroupingId level individually. But, if the sum of failures for any check at FirstGroupingId level goes over the FailLevel for the check as defined in #check_data, then the FirstGroupingId set of data is deemed to have failed. This can be calculated relatively simply from the #group_2_summary table:

    -- use the second group summary to populate the first group summary

    ; with cte as (

    select FirstGroupingId

    , cd.CheckId

    , sum(FailCount) as TotalFailCount

    , FailLevel

    from #group_2_summary g2s

    join #check_data cd

    on g2s.CheckId = cd.CheckId

    group by FirstGroupingId

    , cd.CheckId

    , FailLevel

    )

    , cte2 as (

    select FirstGroupingId

    , CheckId

    , case when TotalFailCount >= FailLevel then 1 else 0 end as IsFailure

    from cte

    insert #group_1_summary

    select FirstGroupingId, max(IsFailure)

    from cte2

    group by FirstGroupingId

    So for this test data, I expect the results to be as follows:

    select * from #group_1_summary

    /*

    FirstGroupingId IsFailure

    --------------- ---------

    1 0

    2 1

    */

    select * from #group_2_summary

    order by FirstGroupingId, SecondGroupingId

    /*

    FirstGroupingId SecondGroupingId CheckId FailCount

    --------------- ---------------- ----------- -----------

    1 1 2 1

    1 2 1 2

    1 2 4 1

    2 1 1 3

    2 1 3 1

    2 2 3 1

    2 2 1 1

    */

    Cleanup:

    drop table #base_data

    drop table #check_data

    drop table #group_1_summary

    drop table #group_2_summary

    It all just feels a bit lumpy and inelegant. Perhaps someone can suggest a better way?

    Thanks, Iain

    Edit: clarification

  • Incidentally, I'm eliminating any rows where BitwiseValue = 0.

    I've also thought that I might be able to use a combination of case statements in the cte with an unpivot in the insert to do the SecondGroupingId summary. Maybe this is a better way?

    I've been looking at this for far too long, my head hurts... :unsure:

  • I'm getting lost in your requirements, but you definitely don't need the cursor.

    ; with basecte as (

    select FirstGroupingId,CheckID,faillevel

    , sum(case when charindex(CheckValue,TextData) > 0 then 1 end) status

    from #base_data cross join #check_data

    group by FirstGroupingId,CheckID,faillevel

    )

    select FirstGroupingId, max(case when status>=faillevel then 1 else 0 end) from baseCTE group by FirstGroupingId

    ; with basecte as (

    select FirstGroupingId,SecondGroupingId,checkID

    ,sum( case when charindex(CheckValue,TextData) > 0 then 1 end) status

    from #base_data cross join #check_data group by

    FirstGroupingId,SecondGroupingId,checkID

    )

    select * from basecte where status is not null

    Note: I am bypassing the bitwise operation, because it's not helping you. You've already got what you need to check a variable number of tests.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi Matt,

    Thanks for the reply. My example is heavily simplified. The procedure is to be used with an application that allows a user to select and execute a set of tests of their choice against a sample of data. These checks can be run against any one of a number of different columns. The tests are not limited to simple string comparisons. Perhaps I should have made this clearer in my original post, I was trying simplify what is a bit of a complex problem!

    I have used the bitwise field because I have a requirement to allow a user to select rows from the master data (i.e. #base_data) that have failed any combination of tests, e.g. return all rows that failed tests 2,3 & 7. So it's either add lots of bit flags or a bitwise column and I prefer the bitwise approach.

    The bit I really want to get rid of is the loop in the middle section. I'm going to have a look at the case/unpivot approach and see if I can make something work.

    I'm open to suggestions as to other approaches.

    Thanks Iain

  • As I thought. The case with unpivot option does the job much better:

    -- unpivot

    -- again, note that this will be built dynamically, but not shown here for clarity

    select FirstGroupingId

    , SecondGroupingId

    , CheckId

    , FailCount

    from (

    select FirstGroupingId

    , SecondGroupingId

    , sum(case when 1 & BitwiseValue = 1 then 1 else 0 end) as [1]

    , sum(case when 2 & BitwiseValue = 2 then 1 else 0 end) as [2]

    , sum(case when 4 & BitwiseValue = 4 then 1 else 0 end) as [4]

    , sum(case when 8 & BitwiseValue = 8 then 1 else 0 end) as [8]

    from #base_data

    group by FirstGroupingId

    , SecondGroupingId

    ) src

    unpivot (

    FailCount for CheckId in (

    [1],[2],[4],[8]

    )

    ) unpvt

    where FailCount > 0

    order by FirstGroupingId, SecondGroupingId, CheckId

    A clear demonstration of how it's often much better to step away from something than to attempt to plough through it. Well, that and the power of thinking about your problem in such a way as to be able to explain it to someone else.

    I'm still open to suggestions as to approach though. If anyone would like to comment or improve on this feel free.

    Thanks, Iain

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply