scramble data

  • Anyone has Procedure to scramble data like addresses, but the new data should not be any characters... address1 should look like address1 field and so on? Thanks!

  • Ghanta (7/27/2011)


    Anyone has Procedure to scramble data like addresses

    Do you mean column encryption? http://msdn.microsoft.com/en-us/library/ms179331(v=SQL.100).aspx

    Ghanta (7/27/2011)


    but the new data should not be any characters... address1 should look like address1 field and so on?

    Or do you mean replacing a real address with a dummy address, in order to use a real database for testing, so the Developers don't see real addresses?

    BrainDonor

    Steve Hall
    Linkedin
    Blog Site

  • Ghanta (7/27/2011)


    Anyone has Procedure to scramble data like addresses, but the new data should not be any characters... address1 should look like address1 field and so on? Thanks!

    Will using one customer's address in place of another's do? In other words, row-to-row scrambling?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • a lot of scrambling of data is specific to the tables and lookups invloved;

    I've got something that replaces all existing data with something like this (street names are a random html color) :

    addr1 addr2 addr3

    7312 RosyBrown Street Building 384 Unit 1029

    1363 DarkSlateBlue Street Building 208 Unit 1746

    1094 DarkGoldenrod Street Building 468 Unit 1651

    3770 Navy Street Building 172 Unit 2049

    2443 MintCream Street Building 201 Unit 7170

    but the code assumes a lot of stuff about specific columns and foreign keys, since it's designed around my data.let me see if i can't dump it into a proc to make it more universal,

    but here's a prototype:

    UPDATE cmcontact

    set busemail = 'notreal@email.com',

    editedby = 'Administrator',

    busfax = '9542179598',

    busphone = '8883267680',

    buszipcode = '40000',

    title = ''

    --random first land last names, based on colors and 6 first names:

    ;With MyColorCTE AS

    ( SELECT 'AliceBlue' AS TheColor UNION ALL

    SELECT 'AntiqueWhite' UNION ALL SELECT 'Aqua' UNION ALL SELECT 'Aquamarine' UNION ALL

    SELECT 'Azure' UNION ALL SELECT 'Beige' UNION ALL SELECT 'Bisque' UNION ALL SELECT 'Black' UNION ALL

    SELECT 'BlanchedAlmond' UNION ALL SELECT 'Blue' UNION ALL SELECT 'BlueViolet' UNION ALL SELECT 'Brown' UNION ALL

    SELECT 'BurlyWood' UNION ALL SELECT 'CadetBlue' UNION ALL SELECT 'Chartreuse' UNION ALL SELECT 'Chocolate' UNION ALL

    SELECT 'Coral' UNION ALL SELECT 'CornFlowerBlue' UNION ALL SELECT 'Cornsilk' UNION ALL SELECT 'Crimson' UNION ALL

    SELECT 'Cyan' UNION ALL SELECT 'DarkBlue' UNION ALL SELECT 'DarkCyan' UNION ALL SELECT 'DarkGoldenrod' UNION ALL

    SELECT 'DarkGray' UNION ALL SELECT 'DarkGreen' UNION ALL SELECT 'DarkKhaki' UNION ALL SELECT 'DarkMagenta' UNION ALL

    SELECT 'DarkOliveGreen' UNION ALL SELECT 'DarkOrange' UNION ALL SELECT 'DarkOrchid' UNION ALL SELECT 'DarkRed' UNION ALL

    SELECT 'DarkSalmon' UNION ALL SELECT 'DarkSeaGreen' UNION ALL SELECT 'DarkSlateBlue' UNION ALL SELECT 'DarkSlateGray' UNION ALL

    SELECT 'DarkTurquoise' UNION ALL SELECT 'DarkViolet' UNION ALL SELECT 'DeepPink' UNION ALL SELECT 'DeepSkyBlue' UNION ALL

    SELECT 'DimGray' UNION ALL SELECT 'DodgerBlue' UNION ALL SELECT 'FireBrick' UNION ALL SELECT 'FloralWhite' UNION ALL

    SELECT 'ForestGreen' UNION ALL SELECT 'Fuchsia' UNION ALL SELECT 'Gainsboro' UNION ALL SELECT 'GhostWhite' UNION ALL

    SELECT 'Gold' UNION ALL SELECT 'Goldenrod' UNION ALL SELECT 'Gray' UNION ALL SELECT 'Green' UNION ALL

    SELECT 'GreenYellow' UNION ALL SELECT 'Honeydew' UNION ALL SELECT 'HotPink' UNION ALL SELECT 'IndianRed' UNION ALL

    SELECT 'Indigo' UNION ALL SELECT 'Ivory' UNION ALL SELECT 'Khaki' UNION ALL SELECT 'Lavender' UNION ALL

    SELECT 'LavenderBlush' UNION ALL SELECT 'LawnGreen' UNION ALL SELECT 'LemonChiffon' UNION ALL SELECT 'LightBlue' UNION ALL

    SELECT 'LightCoral' UNION ALL SELECT 'LightCyan' UNION ALL SELECT 'LightGray' UNION ALL SELECT 'LightGreen' UNION ALL

    SELECT 'LightPink' UNION ALL SELECT 'LightSalmon' UNION ALL SELECT 'LightSeaGreen' UNION ALL SELECT 'LightSkyBlue' UNION ALL

    SELECT 'LightSlateGray' UNION ALL SELECT 'LightSteelBlue' UNION ALL SELECT 'LightYellow' UNION ALL SELECT 'Lime' UNION ALL

    SELECT 'LimeGreen' UNION ALL SELECT 'Linen' UNION ALL SELECT 'Magenta' UNION ALL SELECT 'MidnightBlue' UNION ALL

    SELECT 'Maroon' UNION ALL SELECT 'MediumAquamarine' UNION ALL SELECT 'MediumBlue' UNION ALL SELECT 'MediumVioletRed' UNION ALL

    SELECT 'MediumOrchid' UNION ALL SELECT 'MediumTurquoise' UNION ALL SELECT 'MediumPurple' UNION ALL SELECT 'MediumSpringGreen' UNION ALL

    SELECT 'MediumSeaGreen' UNION ALL SELECT 'MediumSlateBlue' UNION ALL SELECT 'MintCream' UNION ALL SELECT 'MistyRose' UNION ALL

    SELECT 'Moccasin' UNION ALL SELECT 'NavajoWhite' UNION ALL SELECT 'Navy' UNION ALL SELECT 'OldLace' UNION ALL

    SELECT 'Olive' UNION ALL SELECT 'OliveDrab' UNION ALL SELECT 'Orange' UNION ALL SELECT 'OrangeRed' UNION ALL

    SELECT 'Orchid' UNION ALL SELECT 'PaleGoldenrod' UNION ALL SELECT 'PaleGreen' UNION ALL SELECT 'PaleTurquoise' UNION ALL

    SELECT 'PaleVioletRed' UNION ALL SELECT 'PapayaWhip' UNION ALL SELECT 'PeachPuff' UNION ALL SELECT 'Peru' UNION ALL

    SELECT 'Pink' UNION ALL SELECT 'Plum' UNION ALL SELECT 'PowderBlue' UNION ALL SELECT 'Purple' UNION ALL

    SELECT 'Red' UNION ALL SELECT 'RosyBrown' UNION ALL SELECT 'RoyalBlue' UNION ALL SELECT 'SaddleBrown' UNION ALL

    SELECT 'Salmon' UNION ALL SELECT 'SandyBrown' UNION ALL SELECT 'SeaGreen' UNION ALL SELECT 'Seashell' UNION ALL

    SELECT 'Sienna' UNION ALL SELECT 'Silver' UNION ALL SELECT 'SkyBlue' UNION ALL SELECT 'SlateBlue' UNION ALL

    SELECT 'SlateGray' UNION ALL SELECT 'Snow' UNION ALL SELECT 'SpringGreen' UNION ALL SELECT 'SteelBlue' UNION ALL

    SELECT 'Tan' UNION ALL SELECT 'Teal' UNION ALL SELECT 'Thistle' UNION ALL SELECT 'Tomato' UNION ALL

    SELECT 'Turquoise' UNION ALL SELECT 'Violet' UNION ALL SELECT 'Wheat' UNION ALL SELECT 'White' UNION ALL

    SELECT 'WhiteSmoke' UNION ALL SELECT 'Yellow' UNION ALL SELECT 'YellowGreen' )

    ,CTERANDOMLYNUMBER AS

    (

    SELECT CONTACTTBLKEY,

    TheColor,

    ROWNUM = ROW_NUMBER() OVER (PARTITION BY CONTACTTBLKEY ORDER BY NEWID())

    FROM CMCONTACT --THE TABLE WITH THE KEY

    CROSS JOIN MyColorCTE --THE TABLE WITH THE RANDOM VALUE

    )

    UPDATE CMCONTACT --THE JOINING TABLE

    SET CMCONTACT.LASTNAME = CTERANDOMLYNUMBER.TheColor --THE RANDOM VALUE

    FROM CTERANDOMLYNUMBER

    WHERE CMCONTACT.CONTACTTBLKEY=CTERANDOMLYNUMBER.CONTACTTBLKEY --FOR EACH KEY

    AND ROWNUM = 1 --LIMITS TO ONE ARBITRARY VALUE DUE TO THE ORDER BY NEWID

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Jeff Moden (7/28/2011)


    Ghanta (7/27/2011)


    Anyone has Procedure to scramble data like addresses, but the new data should not be any characters... address1 should look like address1 field and so on? Thanks!

    Will using one customer's address in place of another's do? In other words, row-to-row scrambling?

    Thanks Jeff. Yes that will work and I am thinking I will change the numbers on that addresses...

  • Yeah Lowell thanks. I was checking if someone has something generic for this.

  • BrainDonor (7/28/2011)


    Ghanta (7/27/2011)


    Anyone has Procedure to scramble data like addresses

    Do you mean column encryption? http://msdn.microsoft.com/en-us/library/ms179331(v=SQL.100).aspx

    Ghanta (7/27/2011)


    but the new data should not be any characters... address1 should look like address1 field and so on?

    Or do you mean replacing a real address with a dummy address, in order to use a real database for testing, so the Developers don't see real addresses?

    BrainDonor

    BrainDonor I meant replacing a real address with a dummy address like you said.

  • Ghanta (7/28/2011)


    Jeff Moden (7/28/2011)


    Ghanta (7/27/2011)


    Anyone has Procedure to scramble data like addresses, but the new data should not be any characters... address1 should look like address1 field and so on? Thanks!

    Will using one customer's address in place of another's do? In other words, row-to-row scrambling?

    Thanks Jeff. Yes that will work and I am thinking I will change the numbers on that addresses...

    That works for me. This is about as generic a method as you can get. As always, most of the details are in the code.

    --=====================================================================================================================

    -- Setup some "burnable" test data to demonstrate on.

    -- Nothing in this section is a part of the solution. We're just building test data.

    --=====================================================================================================================

    --===== Work in a nice safe place that everyone has

    USE TempDB

    ;

    --===== Conditionally drop the test table to make reruns in SSMS easier.

    IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL

    DROP TABLE #TestTable

    ;

    --===== Create and populate the test table on-the-fly.

    -- We use a well know source for this demonstration.

    SELECT AddressID, AddressLine1

    INTO #TestTable

    FROM AdventureWorks.Person.Address

    ;

    --===== Add the likely Primary Key to the test table.

    -- Don't use named constraints on a temp table because of concurrency issues.

    -- No problem on tests like this but will cause problems in Production.

    ALTER TABLE #TestTable

    ADD PRIMARY KEY CLUSTERED (AddressID)

    ;

    --=====================================================================================================================

    -- The section solves the problem by randomly changing all the addresses to some other address in the table.

    --=====================================================================================================================

    --===== Randomly "Scramble" the addresses to another ID

    WITH

    cteEnumerateStraight AS

    (--==== This creates the original non-randomized "sort" of AddressLine1

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY AddressID),

    AddressID,

    AddressLine1

    FROM #TestTable

    ),

    cteEnumerateRandom AS

    (--==== This creates a random "sort" of AddressLine1

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY NEWID()),

    AddressLine1

    FROM #TestTable

    )--==== This changes the "straight" address to one of the "random" ones.

    -- All addresses are affected because of the uniqueness of ROW_NUMBER()

    -- and the fact both RowNum columns were built from the same table.

    -- I say "all addresses are affected" but there is the random chance

    -- that the random address will be the original. You could check for

    -- such a thing by comparing the original table against the Dev table.

    UPDATE tgt

    SET AddressLine1 = rnd.AddressLine1

    FROM cteEnumerateStraight tgt

    INNER JOIN cteEnumerateRandom rnd

    ON tgt.RowNum = rnd.RowNum

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • p.s. Please post back if you have any difficulty in randomizing the numerics in each address.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ah... almost forgot. I got your email. I know you were probably "under the gun" for this task but I don't normally have the time to respond with code while I'm at work. Ya gotta be patient with me. 😉 Sometimes, it's even worth the wait. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks a lot Jeff. This will help me save a lot of time. Thanks to Lovell as well as I can use his approach for names and other fields. I will be able to generate random numbers for numeric data. Thanks again.

  • Jeff Moden (7/28/2011)


    Ah... almost forgot. I got your email. I know you were probably "under the gun" for this task but I don't normally have the time to respond with code while I'm at work. Ya gotta be patient with me. 😉 Sometimes, it's even worth the wait. 😛

    Yes it is worth a wait. Thanks! I had worked on a script that would place the addresses into a temporary table and then I would randomly assign the addresses, but that did not guarantee new addresses for everyone. Your script helps me out rite away. Thanks again.

  • Ghanta (7/28/2011)


    ... but that did not guarantee new addresses for everyone. Your script helps me out rite away. Thanks again.

    You're welcome and thanks for the feedback but, like I said in the code, my method doesn't "guarantee" new addresses for everyone. There's the random chance that an address or two could randomly be assigned back to its original spot. You need to write a check against the original data to ensure that hasn't happened. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sure Jeff I will do that. Thanks.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply