Binary_Checksum

  • Hi all

    Does anyone have an explanation as to why Binary_checksum(98000.00) would be the same as Binary_Checksum(980000.00). It is not the same if it is an Integer. I know there are disclaimers for case sensitive strings, but I would expect numbers to be processed correctly...

    Thanks

    Michael..

  • Here's what the BOL says:

    "BINARY_CHECKSUM(*), computed on any row of a table, returns the same value as long the row is not subsequently modified. BINARY_CHECKSUM(*) will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications."

    So, I would say that the function works as advertised.   The checksum is not guaranteed to be unique across any possible values that it is supplied, but that's true of any checksum. 

    It should probably work better, but it doesn't.  In any case, where you would use a hash value to compare two pieces of data, you still have to check the actual data after a checksum match.

    The short answer is that the checksum over numeric values appears to only uses the mantissa, and not the exponent in its calculations.

     

     

     

  • That is what I would call 'False Advertising'.....I am planning to change my process to use a Hashing algorithm via HASHBYTES.

    Thanks for the info....

  • By the way, the explanation does not stand for string data.

    The following also produces the same checksum...

    SELECT CHECKSUM('aa'),

    CHECKSUM('bQ')

    IN SHORT......DO NOT USE for the purpose intended......

    It would be almost funny if it was not so sad....

  • I'll add that I experimented with the checksum functions and am disappointed. Something like CRC would be much better, and even a simple shift-and-add would work much better.

    SELECT

    BINARY_CHECKSUM(0x0101)

    17

    SELECT BINARY_CHECKSUM(0x010101)

    273

    As you can see, the formula seems to be multiply by 16 and add. This works poorly for many strings, I've noticed, resulting in many collisions. If they used multiply by 17 it would work much better as a hash function.

    However, I too find it strange that the checksum is the same for the two values mentioned, considering that their binary representations are substantially different. I guess the function must scale the argument to eliminate the mantissa before hashing the bytes.

    SELECT

    CAST(98000.00 AS VARBINARY)

    0x0702000140899500

    SELECT

    CAST(980000.00 AS VARBINARY)

    0x08020001805CD705

    SELECT

    BINARY_CHECKSUM(98000.00)

    -1810540209

    SELECT

    BINARY_CHECKSUM(980000.00)

    -1810540209

    SELECT

    BINARY_CHECKSUM(CAST(98000.00 AS VARBINARY))

    119560197

    SELECT

    BINARY_CHECKSUM(CAST(980000.00 AS VARBINARY))

    -2113318539

     

    Hashing seems to proceed from least-significant byte to most-significant byte....I can't quite seem to decode the algorithm this moment, though.

    SELECT

    BINARY_CHECKSUM(CAST(4294967296*4294967296*4294967296 AS DECIMAL(36,2)))

    1000000000

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply