Reverse Engineering Disasters

  • Many years ago, I was in a meeting with a CIO and his AS/400 expert. The AS/400 guy went on about how the 400 never goes down...

    Two days later, the AS/400 crashed.

    On the SQL Server side, I had already set up fully automated backups with testing.

    The more you are prepared, the less you need it.

  • Well, when I walked into the server room at my last company for the first time, I looked around. Then I looked up. They had a water sprinkler as the fire suppression system. I told them that it was a bad idea to mix servers and water. So about four months later they had the sprinklers removed. But they didn't replace them with a chemical suppression system for six years. I was just stunned at the stupidity of management.



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • I like the editorial; it presents a quick view of an important issue. But I don't like the referenced article "What would it take..." because it doesn't look at cost effectiveness. Nor does it look at how risk averse the stakeholders are or are not.

    One of the important things in designing most systems is cost effectiveness. If the value of my data is 5 million dollars per year I can't afford to spend a hundred thousand dollars per week on protecting it - the net value would then be less than 0 per year. So absolute values and costs have to be considered. But that doesn't go far enough, absolutes may not be adequate on their own.

    Often the aim is to maximise probable net value, so expected net value has to be considered rather than only absolutes. When I look at the value of the data and a loss scenario, I need to look at the probability of that scenario as well as the cost of protecting against it. That is a hugely complex thing to do, because loss scenarios may be mutually exclusive, or independent, or somewhere in between, and that all affects the expected value of protecting against combinations of them.

    There's also a question of philosophy here: should one take only protective actions with positive expected value, or also individual actions with zero expected value, or even individual actions with small negative expected value? Is it enough to compute what sets of actions will deliver maximum expected net value over time (over how much time?), always assuming that the computation isn't so complex as to be impractical? Probably it isn't - how risk averse are the people who lose or gain by the decision? If they are very risk averse, some disaster prevention actions with negative expected net worth will be acceptable. If they are very risk accepting, they may want to omit some actions with zero probable net value (because omitting them increases ROCE and, although it increases risk, it doesn't change the expected outcome) and perhaps even omit some actions with positive expected net value.

    Gary's comments above address this point to some extent - a disaster that is unlikely to happen and isn't at all severe comes out at 1 on his scale, so recovery wouldn't be implemented, which is probably reasonable behaviour. But it doesn't go far enough. Back in October 1962 we had a period when the reasonable estimate of the probability of losing every data centre and all staff outside of Polynesia and Central Africa was perhaps 0.4, which is maybe a score of 4 on his probability scale, and losing all data completely must surely be severity 5. That gives a score of 20, which is well within his "do this" zone, so at that time everyone should have been building data centres in those places, arranging for backup copies to be shipped there regularly, and training staff and basing them there. But the cost would have been enormous and the benefit tiny (because there would be no market left to do business in if the 0.4 probability event happened), so the expected net worth was zilch, which wouldn't repay the cost, and there was no point in doing it. Maybe that example comes under a Ragnarok exception to Gary's scoring, but I think it's possible to construct less extreme examples that demonstrate the importance of computing net worth (aka cost effectiveness).

    Tom

  • Tom, there is nothing to stop a company documenting that they cannot find a cost effective solution and are prepared to take the risk. They just can't ignore it if they follow the process. It makes someone responsible which makes it more likely to be planned for.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
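
For anyone who wants to play with the numbers in Tom's post, here is a rough sketch of the two calculations being compared: a probability-times-severity score of the kind attributed to Gary, and an expected net value check. The function names and the 1..5 range check are assumptions made for illustration, and the 1,000,000 cost in the first example is an invented placeholder, not a figure from the thread. The only inputs taken from the posts are the 4 x 5 ratings, the 0.4 probability, and the 5 million per year versus 100 thousand per week comparison.

```python
def risk_score(probability_rating: int, severity_rating: int) -> int:
    """Probability rating times severity rating (assumed 1..5 each)."""
    for rating in (probability_rating, severity_rating):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return probability_rating * severity_rating


def expected_net_value(p_loss: float, value_saved: float, cost: float) -> float:
    """Expected benefit of a protective action minus what it costs.

    p_loss      -- probability that the loss scenario happens in the period
    value_saved -- value the protection actually preserves if it does happen
    cost        -- cost of the protection over the same period
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    return p_loss * value_saved - cost


# Tom's October 1962 example: the score says "do this" ...
print(risk_score(4, 5))  # 20

# ... but if no market survives, the protection preserves nothing of value.
# The cost figure is a hypothetical placeholder.
print(expected_net_value(0.4, 0.0, 1_000_000))  # -1000000.0

# Tom's cost example: data worth 5 million a year, protection costing
# 100 thousand a week, treating the loss as certain without protection.
yearly_cost = 100_000 * 52
print(expected_net_value(1.0, 5_000_000, yearly_cost))  # -200000.0
```

The score and the net value can disagree, which is the point of the 1962 example: a high score only says the scenario is likely and severe, not that protecting against it pays for itself. Risk aversion is deliberately left out; a risk-averse stakeholder might still accept an action that comes out slightly negative here.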
