Maintaining a live copy of data in a test database

  • Hi All,

    I'm currently looking into a way of having a development database that contains live (day-old is fine) data. The caveat to the normal replication and backup/restore techniques is that I need to be able to make schema changes to this database without those schema changes being made in the live database. The data coming from live has a lot of foreign key references, so I need to keep the related data intact when I bring it over. I can't just assign the rows completely new identity keys (well, I could, but I would have to change all the references to those IDs in all the other tables).

    The only way I can think of achieving this at the moment is to change all the identity seeds in the development database to start at -1 and increment by -1, then write a custom script to copy all the data from live once a night. That way the script is a fairly simple bcp process, as I don't have to do anything with the keys as long as I enable identity insert on the dev database. I can happily make schema changes on the dev database without affecting live, and likewise data changes.
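    Roughly, what I have in mind looks like this (the table and linked server names are just examples, and the linked-server copy is only one way of doing it):

    ```sql
    -- Dev copy of a live table: the identity runs downwards, so rows created
    -- in dev get negative keys and can never collide with keys copied from live.
    CREATE TABLE dbo.Invoices
    (
        InvoiceID   int IDENTITY(-1, -1) PRIMARY KEY,
        CustomerID  int NOT NULL,
        InvoiceDate datetime NOT NULL
    );

    -- Nightly refresh: pull rows across with their original (positive) keys.
    -- A linked server called LIVESRV stands in for the live box here;
    -- bcp with the -E switch would do the same identity-preserving load.
    SET IDENTITY_INSERT dbo.Invoices ON;

    INSERT INTO dbo.Invoices (InvoiceID, CustomerID, InvoiceDate)
    SELECT l.InvoiceID, l.CustomerID, l.InvoiceDate
    FROM   LIVESRV.LiveDB.dbo.Invoices AS l
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Invoices d WHERE d.InvoiceID = l.InvoiceID);

    SET IDENTITY_INSERT dbo.Invoices OFF;
    ```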

    Is there a simpler way of doing it?

    Of all the replication techniques, I don't think any let you make schema changes to the subscriber database, so they're out of the question. I also don't think a table that is being published can itself be a subscriber, so I can't run the dev database as the publisher.

    Log shipping won't allow schema changes on the secondary either.

    Mirroring is one I haven't looked into and probably should. Would it give me the flexibility I'm after?!

    John

  • no one? 🙁

    I've just started on the technique I described in my OP, using manual scripts to copy the data across. Time consuming, but I think it will work.

  • Honestly, I think you're approaching your problem the wrong way. You shouldn't need live data *every* day to have a Dev database. And whether or not you script it or use one of the other techniques, you're wasting a lot of time each day to get this data updated in Dev.

    Do you have an SDLC (software development life cycle) at your workplace? If so, how often do you push changes up to QC / Production?

    If the answer is "every day," then you're not so much developing as fighting fires, and being reactive in that way is not a good thing for your database.

    At my workplace, we have a development window that lasts about 4 weeks for coding, then we push up to Test/QA for a couple of weeks, then we push up to Prod. As one release moves out of Dev, we refresh Dev by running a restore database job which grabs the latest backup from Prod. Then we run the most recently approved Dev changes, and then we start developing the next release.
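    The refresh job itself is nothing fancy. A cut-down sketch, with the paths and logical file names invented for the example:

    ```sql
    -- Refresh Dev from the latest Prod backup (all names here are examples).
    USE master;

    ALTER DATABASE DevDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    RESTORE DATABASE DevDB
    FROM DISK = N'\\backupshare\Prod\ProdDB_latest.bak'
    WITH MOVE 'ProdDB_Data' TO N'D:\Data\DevDB.mdf',
         MOVE 'ProdDB_Log'  TO N'L:\Log\DevDB.ldf',
         REPLACE;

    ALTER DATABASE DevDB SET MULTI_USER;
    ```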

    I highly recommend this type of approach. It's better for your sanity.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (8/4/2010)


    Honestly, I think you're approaching your problem the wrong way. You shouldn't need live data *every* day to have a Dev database. And whether or not you script it or use one of the other techniques, you're wasting a lot of time each day to get this data updated in Dev.

    Do you have an SDLC (software development life cycle) at your workplace? If so, how often do you push changes up to QC / Production?

    If the answer is "every day," then you're not so much developing as fighting fires, and being reactive in that way is not a good thing for your database.

    At my workplace, we have a development window that lasts about 4 weeks for coding, then we push up to Test/QA for a couple of weeks, then we push up to Prod. As one release moves out of Dev, we refresh Dev by running a restore database job which grabs the latest backup from Prod. Then we run the most recently approved Dev changes, and then we start developing the next release.

    I highly recommend this type of approach. It's better for your sanity.

    Hi Brandie,

    I would love to be able to do that, and it's a very similar approach to what we did at my last place, which worked really well. However, the place I work now is much smaller and there's nowhere near as much red tape or as many release procedures as before, so releases happen all the time: sometimes daily, but more realistically once a week, twice at most, on average.

    Another problem is that multiple applications run off one database, so one load of schema changes may get applied to production while a whole load more are held back, meaning a quick backup/restore is going to result in lost development changes.

    The reason I need live data refreshed on a daily basis is that, when testing, the users often want to use the same data that was imported into live the day before, be it invoices created from imports, bank statement information coming in through Windows services, and so on. To get that data the same, I either have to replicate the data itself, or replicate the imports that put it into live so that they also put it into dev.

    It's a pretty weird scenario. I would love to be able to carry out the approach you use, but I don't think I'll get business buy-in, as it just won't work for us unfortunately 🙁

  • Animal Magic (8/4/2010)


    The reason I need live data refreshed on a daily basis is that, when testing, the users often want to use the same data that was imported into live the day before, be it invoices created from imports, bank statement information coming in through Windows services, and so on. To get that data the same, I either have to replicate the data itself, or replicate the imports that put it into live so that they also put it into dev.

    It's a pretty weird scenario. I would love to be able to carry out the approach you use, but I don't think I'll get business buy-in, as it just won't work for us unfortunately 🙁

    You never know if you never ask them. Try to find something in this scenario that would get them to buy in. Maybe ask them for a few days' or a week's latency instead of one day. Make the case that if you can have that window, it's better for them in the long run, because you're spending more time working for them than pulling down data.

    If you absolutely have to do this, set up Transactional Replication down to your Dev environment to a non-Dev database. Back up THAT database every morning and restore it as your actual Dev database. That way, schema changes are flowing down from the Publisher, not being altered on the subscriber side. And you don't have to script things.

    EDIT: Run any held-back Dev changes against Dev db once you've restored it. It shouldn't take too long.
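    In script terms, the morning job is just a backup of the replication target laid down over Dev; something along these lines (all the names here are invented):

    ```sql
    -- Morning job: snapshot the replication target, then lay it down as Dev.
    BACKUP DATABASE ReplCopyDB
    TO DISK = N'D:\Backup\ReplCopyDB.bak'
    WITH INIT;  -- overwrite yesterday's backup file

    ALTER DATABASE DevDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    RESTORE DATABASE DevDB
    FROM DISK = N'D:\Backup\ReplCopyDB.bak'
    WITH MOVE 'ReplCopyDB_Data' TO N'D:\Data\DevDB.mdf',
         MOVE 'ReplCopyDB_Log'  TO N'L:\Log\DevDB.ldf',
         REPLACE;

    ALTER DATABASE DevDB SET MULTI_USER;
    -- ...then run any held-back Dev change scripts against DevDB.
    ```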

    It's a little ham-handed, but I think it's your best solution. And I really think you should ask for user buy-in on delaying the live data. The worst they can do is say "no" and if you give them the right motivations, they may surprise you and say "yes."


  • Brandie Tarvin (8/4/2010)

    If you absolutely have to do this, set up Transactional Replication down to your Dev environment to a non-Dev database. Back up THAT database every morning and restore it as your actual Dev database. That way, schema changes are flowing down from the Publisher, not being altered on the subscriber side. And you don't have to script things.

    EDIT: Run any held-back Dev changes against Dev db once you've restored it. It shouldn't take too long.

    Thanks again for the reply, Brandie. The approach above was the one I first thought of, and the only thing I'd have to do then is make sure that the developers are strict about scripting any schema changes into a folder of unreleased changes. Currently they are not, and they just put things into dev/live without anyone checking; it's a right mess. At any point in time they wouldn't know which schema changes were not yet live without doing a database compare.

    I have to work on someone, be it the developers or the users 🙂 I think the developers may be the easier option!

    John

    Honestly, I think you'll end up wasting tons of time moving this data. I think only custom SSIS-type scripts will work here, and since you are changing the schema on the destination, it's likely that those scripts will break regularly. So you'll potentially be developing scripts more than code.

    No one does this, and the main reason is that you don't test in Dev, because it's not a testing environment. IMHO, you ought to have a third QA/test environment where you restore production data AND then apply the schema/code changes from development that will roll out. This way you can make sure your changes don't break production, and you can then test against live data.

  • Steve Jones - Editor (8/4/2010)


    Honestly, I think you'll end up wasting tons of time moving this data. I think only custom SSIS-type scripts will work here, and since you are changing the schema on the destination, it's likely that those scripts will break regularly. So you'll potentially be developing scripts more than code.

    No one does this, and the main reason is that you don't test in Dev, because it's not a testing environment. IMHO, you ought to have a third QA/test environment where you restore production data AND then apply the schema/code changes from development that will roll out. This way you can make sure your changes don't break production, and you can then test against live data.

    Hi Steve,

    I completely agree; I think a third environment will work best here. I'll put forward a proposal to the business whereby I restore the live data once a week, and we keep a folder of unreleased schema changes which are applied each Monday morning.

    Once schema changes are released, they are moved from the "unreleased schema changes" folder to a "released schema changes" folder. That way, each Monday I only apply the necessary changes. I don't think that will be too hard to implement, and although it's not a completely automated solution, I agree that long term I'll spend less time faffing about with custom T-SQL scripts.
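    The Monday-morning step could even be a single SQLCMD-mode wrapper script, so the order of the unreleased changes stays explicit (the folder and file names below are made up):

    ```sql
    -- Monday wrapper, run with: sqlcmd -S DEVSRV -d DevDB -i ApplyUnreleased.sql
    -- Each :r pulls in one script from the "unreleased schema changes" folder.
    :setvar ScriptDir "C:\SchemaChanges\Unreleased"
    :r $(ScriptDir)\001_AddInvoiceStatusColumn.sql
    :r $(ScriptDir)\002_CreateBankStatementImportTable.sql
    -- When a script is released, it moves to the Released folder
    -- and its :r line is removed from this wrapper.
    ```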

    Thanks all

    As this site is a community service of Red Gate, I'm curious whether you've tried Red Gate's SQL Compare and SQL Data Compare tools? I use them, and they're great for generating differential schema scripts and for syncing data on a push-button basis (rather than on a schedule), which sounds like how you need to operate.

    I can't imagine going back to doing these ad hoc.

    Red Gate and ApexSQL both offer API-scriptable data and schema diff programs that you can use to automate syncing like you're looking to do. I have to say, though, that any form of automated every-day syncing like this will be WAY more trouble than it's worth, I would bet. I strongly recommend you evaluate your 'need' for this level of freshness from prod back to dev...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Thanks, guys. I've used Red Gate SQL Compare before and trialled Data Compare, but I've never used it in production. I'll have a look into that, but I'll probably go down the route of a once-a-week restore and then manually applying the change scripts.

    We are looking at getting a licence for SQL Compare though, and I'm also trialling the SQL Source Control software, so it might be worth splashing out on the Toolbelt if I can use Data Compare for this task 🙂
