Hashing a row in SQL.

  • Every month I receive a large file from a vendor, and the first step in the ETL process is to determine if any record in the file is identical to a record that we've been sent in the past. Solution is to import it into a working table in the database and then perform an exact duplicate check (all columns identical in both tables) to the final target table.

    Although I can script out a huge column-to-column test for a WHERE clause or a CASE expression, it would seem more performant to generate a hash value to either store in a separate table or to index on. However, the target table has dozens of columns and the row size exceeds 8000, so using hashbytes() is right out. I've thought about hashing subsets of the columns and then hashing the resulting hash values, but i fear producing false positives.

    Can anyone tell me how they handle this kind of problem? Any constructive suggestions or alternatives would be welcome.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Unless I'm missing something, I think using EXISTS would be a very approachable solution.

    http://technet.microsoft.com/en-us/library/ms188336.aspx

  • SELECT CHECKSUM(*) FROM StagingTable

    Collisions are possible, so you'll have to do a column-by-column check over any matches, but should be faster. The main table can have a computed column with the checksum, with an index on it for faster matching.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks 🙂

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply