Advice on Bitwise Comparison

Question

Post reply

Advice on Bitwise Comparison

irobertson

Hall of Fame

Points: 3695
More actions
July 10, 2012 at 2:21 pm

#272303

Hi,

I have a task to complete that involves running a series of tests (mostly text comparisons) against a table, then summarising the results at two grouping levels. The set of tests to be run is dynamic. I want to be able to easily select rows where a combination of tests is failed. Because of this, I'm using a bitwise comparison.

Very simplified setup is like this:

-- create test tables
create table #base_data ( PrimaryId int , FirstGroupingId int , SecondGroupingId int , TextData varchar(100) , BitwiseValue int ) create table #check_data ( CheckId int , CheckValue varchar(10) , CheckBitwiseValue int , FailLevel int ) create table #group_2_summary ( FirstGroupingId int , SecondGroupingId int , CheckId int , FailCount int ) create table #group_1_summary ( FirstGroupingId int , IsFailure bit ) -- add test data insert #base_data select 1, 1, 1, 'Bob', 0 union all select 2, 1, 2, 'Dave', 0 union all select 3, 1, 2, 'Alan', 0 union all select 4, 2, 1, 'Mary', 0 union all select 5, 2, 1, 'Sarah', 0 union all select 6, 2, 1, 'Charlie', 0 union all select 7, 2, 2, 'Caroline', 0 insert #check_data select 1, 'A', 1, 3 union all select 2, 'B', 2, 3 union all select 3, 'C', 4, 3 union all
select 4, 'D', 8, 3

Initially, I just populate the bitwise column in the base data using a case statement:

-- populate the individual bitwise values
-- edit: note that in real life this query is built dynamically and the case statements can be looking at any number of different fields within #base_data -- but this is omitted for simplicity ; with cte as ( select PrimaryId , BitwiseValue , case when charindex('A',TextData) > 0 then 1 else 0 end as Check1 , case when charindex('B',TextData) > 0 then 2 else 0 end as Check2 , case when charindex('C',TextData) > 0 then 4 else 0 end as Check3 , case when charindex('D',TextData) > 0 then 8 else 0 end as Check4 from #base_data ) update cte set BitwiseValue = Check1 + Check2 + Check3 + Check4 select * from #base_data
go

So far so good. Now I need to create a summary of fail counts for each test at SecondGroupingId level. I'm struggling to come up with a set based solution for this, so have deployed a cursor. I would really like to avoid the use of the cursor if at all possible:

-- populate the second grouped summary first
-- this needs to have an individual row for each combination of -- FirstGroupingId, SecondGroupingId and CheckId -- hence use of cursor/loop declare @fetch int , @CheckId int , @CheckBitwiseValue int declare curChecks cursor for select CheckId , CheckBitwiseValue from #check_data open curChecks fetch next from curChecks into @CheckId, @CheckBitwiseValue set @fetch = @@fetch_status while @fetch = 0 begin -- use cte to get round outer reference problem ; with cte as ( select FirstGroupingId , SecondGroupingId , count(*) as FailCount from #base_data where @CheckBitwiseValue & BitwiseValue = @CheckBitwiseValue group by FirstGroupingId , SecondGroupingId ) insert #group_2_summary select FirstGroupingId , SecondGroupingId , @CheckId , FailCount from cte fetch next from curChecks into @CheckId, @CheckBitwiseValue set @fetch = @@fetch_status end close curChecks deallocate curChecks
go

There are no crieria for failure at SecondGroupingId level individually. But, if the sum of failures for any check at FirstGroupingId level goes over the FailLevel for the check as defined in #check_data, then the FirstGroupingId set of data is deemed to have failed. This can be calculated relatively simply from the #group_2_summary table:

-- use the second group summary to populate the first group summary ; with cte as ( select FirstGroupingId , cd.CheckId , sum(FailCount) as TotalFailCount , FailLevel from #group_2_summary g2s join #check_data cd on g2s.CheckId = cd.CheckId group by FirstGroupingId , cd.CheckId , FailLevel ) , cte2 as ( select FirstGroupingId , CheckId , case when TotalFailCount >= FailLevel then 1 else 0 end as IsFailure from cte insert #group_1_summary select FirstGroupingId, max(IsFailure) from cte2
group by FirstGroupingId

So for this test data, I expect the results to be as follows:

select * from #group_1_summary /* FirstGroupingId IsFailure --------------- --------- 1 0 2 1 */ select * from #group_2_summary order by FirstGroupingId, SecondGroupingId /* FirstGroupingId SecondGroupingId CheckId FailCount --------------- ---------------- ----------- ----------- 1 1 2 1 1 2 1 2 1 2 4 1 2 1 1 3 2 1 3 1 2 2 3 1 2 2 1 1 */

Cleanup:

drop table #base_data
drop table #check_data drop table #group_1_summary
drop table #group_2_summary

It all just feels a bit lumpy and inelegant. Perhaps someone can suggest a better way?

Thanks, Iain

Edit: clarification

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

irobertson Hall of Fame Points: 3695 More actions · Answer 1

irobertson

Hall of Fame

Points: 3695

July 10, 2012 at 2:36 pm

#1511214

Incidentally, I'm eliminating any rows where BitwiseValue = 0.

I've also thought that I might be able to use a combination of case statements in the cte with an unpivot in the insert to do the SecondGroupingId summary. Maybe this is a better way?

I've been looking at this for far too long, my head hurts... :unsure:

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

I'm getting lost in your requirements, but you definitely don't need the cursor.



; with basecte as (

	select FirstGroupingId,CheckID,faillevel

		, sum(case when charindex(CheckValue,TextData) > 0 then 1 end) status

	from #base_data cross join #check_data

	group by FirstGroupingId,CheckID,faillevel

	)

select FirstGroupingId, max(case when status>=faillevel then 1 else 0 end) from baseCTE group by FirstGroupingId

; with basecte as (

	select FirstGroupingId,SecondGroupingId,checkID

		,sum( case when charindex(CheckValue,TextData) > 0 then 1 end) status

	from #base_data cross join #check_data group by

	FirstGroupingId,SecondGroupingId,checkID

	)

select * from basecte where status is not null

Note: I am bypassing the bitwise operation, because it's not helping you. You've already got what you need to check a variable number of tests.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

irobertson Hall of Fame Points: 3695 More actions · Answer 3

Hi Matt,

Thanks for the reply. My example is heavily simplified. The procedure is to be used with an application that allows a user to select and execute a set of tests of their choice against a sample of data. These checks can be run against any one of a number of different columns. The tests are not limited to simple string comparisons. Perhaps I should have made this clearer in my original post, I was trying simplify what is a bit of a complex problem!

I have used the bitwise field because I have a requirement to allow a user to select rows from the master data (i.e. #base_data) that have failed any combination of tests, e.g. return all rows that failed tests 2,3 & 7. So it's either add lots of bit flags or a bitwise column and I prefer the bitwise approach.

The bit I really want to get rid of is the loop in the middle section. I'm going to have a look at the case/unpivot approach and see if I can make something work.

I'm open to suggestions as to other approaches.

Thanks Iain

irobertson Hall of Fame Points: 3695 More actions · Answer 4

As I thought. The case with unpivot option does the job much better:

-- unpivot



-- again, note that this will be built dynamically, but not shown here for clarity

select FirstGroupingId

		, SecondGroupingId

		, CheckId

		, FailCount

from (

	select FirstGroupingId

		, SecondGroupingId

		, sum(case when 1 & BitwiseValue = 1 then 1 else 0 end) as [1]

		, sum(case when 2 & BitwiseValue = 2 then 1 else 0 end) as [2]

		, sum(case when 4 & BitwiseValue = 4 then 1 else 0 end) as [4]

		, sum(case when 8 & BitwiseValue = 8 then 1 else 0 end) as [8]

	from #base_data

	group by FirstGroupingId

		, SecondGroupingId

	) src

unpivot (

	FailCount for CheckId in (

		[1],[2],[4],[8]

		)

	) unpvt

where FailCount > 0

order by FirstGroupingId, SecondGroupingId, CheckId

A clear demonstration of how it's often much better to step away from something than to attempt to plough through it. Well, that and the power of thinking about your problem in such a way as to be able to explain it to someone else.

I'm still open to suggestions as to approach though. If anyone would like to comment or improve on this feel free.

Thanks, Iain