Integer bigger than bigint?

  • QUESTION:

    Is there any way to use an integer data type larger than a (signed) bigint in SQL Server 2008?

    PROBLEM:

    I have an "item" that can belong to N number of "categories". To implement this, I've created a "category" bitmask that's assigned to each "item". I then perform bitwise operations to determine which "categories" an "item" belongs to. The "category" list must be dynamic (iow, you can add "categories" at runtime). With a signed bigint, this only allows me 63 "categories" (2^63-1)...I need more.

    NOTES:

    1. From what I've seen, you must use an integer data type to perform bitwise operations (which makes perfect sense). e.g., i couldn't use a numeric(38,0).

    2. I don't want to just create a bigint rule to make it unsigned. That would still leave me with a 63-bit integer.

    3. I'm open to suggestions on alternate ways to implement this, although everything I've thought of so far seems like it would cause a massive performance hit (like introducing a table with each item/category combination).

    Thanks so much for your input!

  • We use three tables for this:

    an Item table with pk ItemID

    a Category table with pk CategoryID

    an ItemCategory table with pkItemCategory and fk's ItemID and CategoryID

    There are no limits (to us) on how many categories per item and vice versa.

    There's no developer overhead of dicking around trying to figure out what the hell is associated with what, you can see it all unfold with a simple query.

    It uses bogstandard INNER and LEFT JOINS instead of iBitmap purple blogsquabs or whatever.

    I won't have to worry about leaving my house on a dark night in three years' time and being whacked over the head with something heavy and blunt by a developer driven over the edge by a design idea which seemed fancy schmancy at the time but which sucked in practice.

    Of course you can do it, if you really really want to.

    Edit: How about giving this <<a massive performance hit (like introducing a table with each item/category combination)>> a wee try with some sample data?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Bitwise operators can operate on varbinary as the left operand; only the mask ( the right operand) has to be one of the integer types. To quote SQLS 2008 Books Online, "The operands for bitwise operators can be any one of the data types of the integer or binary string data type categories (except for the image data type), except that both operands cannot be any one of the data types of the binary string data type category."

  • I'm trying to make it so that you can select N number of categories in the UI and the program will return all the items that match either [any] or [all] of those categories (depending on what the user specifies). When searching with the [all] option, the only way I see to do this with the ItemCategory table is to first pivot the table and then search explicitly for the desired categories, or, to dynamically create a statement with the necessary joins. Seems to me that either option would be a significant performance drag as well as significantly more complicated that a single bitwise operation. Unless I'm missing something?

    Thanks again!

  • i have to go with lobbymunchers suggestion; a standard parent child relationship would be much more appropriate; i think you are mentally locked into bitwise operations because it's neat to do, but since you have more objects to track than bits, you don't want to let go of an inappropriate model.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • @lowell:

    I'm certainly not locked into any single way of doing it. As I stated in my previous post, the only implementations I can think of with that approach are less efficient and more complicated. So that's why I wanted to see if I could get around this roadblock. If a bigint is as large of an integer as I can use in SQL Server, then so be it, I'll have to do it the "normal" way. That's what I'm trying to find out with this post.

  • One more nod in agreement with Lowell and Lobbymuncher

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • If you do have to use a categories table (and I agree that you do), how about something like this,

    which uses a categories table, but with a sort of mask to select the result....It is certainly not complicated,

    but possibly not super speedy - I am sure someone could provide a quicker solution -

    this was intended as an indication that it need not be complicated to work with a categories table.

    You would generate your list of selectable categories in the application from the "categories" table and create a comma separated list of selected categories (their ids) to pass to SQL.

    --= declare a variable to hold the list of selected categories (the id column from the categories table)

    --= and set a value corresponding to the user's choice of categories (category1 and category4)

    declare @category_list varchar(1000) -- big enough for quite a few categories ?

    set @category_list = '1,4,5'

    --= declare a variable to hold the user's choice of ALL or ANY matches

    declare @all bit

    set @all = 0

    --= CTE to generate some tables

    ;with categories(id,category)

    as (

    select 1,'category1' union all

    select 2,'category2' union all

    select 3,'category3' union all

    select 4,'category4' union all

    select 5,'category5'

    )

    , item_categories(id,itemid,categoryid)

    as (

    select 1,1,1 union all

    select 2,1,4 union all

    select 3,1,5 union all

    select 4,2,1

    )

    , items(id,description)

    as (

    select 1,'Item 1' union all

    select 2,'Item 2'

    )

    --= Now the actual query

    select

    [id],

    [description],

    case match.list

    when '' then 'No match'

    when @category_list then 'Exact Match'

    else 'Partial Match'

    end as MatchStatus

    from items

    cross apply (

    select STUFF(

    (

    select ','+convert(varchar,categoryid)

    from item_categories

    where itemid = items.id and charindex(','+convert(varchar,categoryid)+',',','+@category_list+',')>0

    for xml path('')

    ),1,1,'')

    ) as match(list)

    where (@all = 0 AND match.list<>'') OR (@all = 1 AND match.list = @category_list)

    As long as you don't have millions of categories, and you index correctly, something like this should be workable.

    Try setting different values for @category_list and @all (1=match all categories, 0=match any)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • @mister.magoo:

    Thanks for the effort, much appreciated!

    What you have though is precisely my point...while I agree that it's not overly complicated, it's certainly "more" complicated, and i would think significantly less efficient than a single bitwise operation. Everyone seems to think the bitwise approach is too "fancy", but I think what you have is a lot "fancier" than a single condition of "@ItemBitmask & @SelectedBitmask = @SelectedBitmask".

    I'm not opposed to an approach such as yours, and it looks like I'll have to go with something along those lines, but I was hoping I could go with what I still perceive to be the simpler and quicker way. So if anyone is aware of how to get at an integer larger than bigint, I'd still love to know about it 🙂

    Thanks again for all the input.

  • You're welcome

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Revenant (1/21/2010)


    Bitwise operators can operate on varbinary as the left operand; only the mask ( the right operand) has to be one of the integer types. To quote SQLS 2008 Books Online, "The operands for bitwise operators can be any one of the data types of the integer or binary string data type categories (except for the image data type), except that both operands cannot be any one of the data types of the binary string data type category."

    That still means a limit on the mask, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Stephen Fry (1/21/2010)


    @mister.magoo:

    Thanks for the effort, much appreciated!

    What you have though is precisely my point...while I agree that it's not overly complicated, it's certainly "more" complicated, and i would think significantly less efficient than a single bitwise operation. Everyone seems to think the bitwise approach is too "fancy", but I think what you have is a lot "fancier" than a single condition of "@ItemBitmask & @SelectedBitmask = @SelectedBitmask".

    I'm not opposed to an approach such as yours, and it looks like I'll have to go with something along those lines, but I was hoping I could go with what I still perceive to be the simpler and quicker way. So if anyone is aware of how to get at an integer larger than bigint, I'd still love to know about it 🙂

    Thanks again for all the input.

    Bit masking will always have the problem of being non-sargeable for table lookups which also means it will also be comparatively slow. A normalize associative table with one column for the item and one column to hold one category (you'd need one row for each item/category) would be highly performant even in the presence of "ORs".

    If you still like the idea of denormalizing (for performance reasons, you really shouldn't) the table by having a single column with multiple categories, you could use a binary column and use character conversions to do "byte slicing" in a fashion similar (but not identical to) the byte slicing they explain in Books Online to determine which columns have been updated in triggers.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Viewing 12 posts - 1 through 11 (of 11 total)

    You must be logged in to reply to this topic. Login to reply