Randomise sensitive data when copying database prod-dev or prod - QA

  • Hi,

    Our production database holds people's names, addresses and credit card numbers. When we copy the production data back to dev and QA, I would like to run a datawash script that re-arranges all this info, so that there is no real, complete data. For compliance we have put a simple change all member names, addresses to this single value, but this makes it hard to develop and troubleshoot because they are all the same.

    What I was thinking that the script/s would replace a frst name with a random first name from the same field, replace surname with random surname, replace address line 1 etc etc

    Has anyone done something like this before (I'm thinking this would take a few days to nut out, develop and test)? Does someone have a better datawash script?

    Thanks in advance.

    Steve

     

     

     

  • there is surprisingly little available. See:

    http://qa.sqlservercentral.com/forums/shwmessage.aspx?forumid=23&messageid=143718

  • I've done this for several different reasons... but it is basically the same...munge existing data to be unreleated to real client data.

    i've got a database of animal names, colors, and city names with views and functions to be used to update client data with meaningless stuff; the trick is you can't use a random or newid function directly from another function...you have to use a view to bypass the non-deterministic rule:

    try this:

    create view rand_CityName as

    select top 1 poname from master.dbo.zipcodes order by newid()

    CREATE VIEW Rand_Number

     AS SELECT RAND() AS r

    create function fn__randname()

    returns varchar(128)

            begin

              declare @return varchar(128)

              select top 1 @return= poname from rand_Cityname

              return @return

            end

    create function fn__randnumber

    (

    @MaxVal int

    )

    returns int

            begin

              declare @return int

              select top 1 @return = r * @MaxVal from Rand_Number

              return @return

            end

    select dbo.fn__randname()

    select dbo.fn__randnumber(4)

    select dbo.fn__randnumber(98989)

    update gmhome5 set name=dbo.fn__randname()

    munging an address would be something like

    update sometable set addr1= convert(varchar,dbo.fn__randnumber(9999)) + ' ' + dbo.fn__randcolor() + ' Street'

    so you get something like '4593 Aquamarine Street'

    the real work is determining which columns in your database need to be masked with fake data.

    HTH.

     

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • See this for a scrambler utility:

    http://qa.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=123939#bm177681

    You can probably get an idea of how to modify it to do what you need.

    Maybe you'll get inspired to add a randomization option to it. ( GUID() works well for simulating random #s)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply