Halloween Protection

  • L' Eomot Inversé (10/31/2012)


    Meow Now (10/31/2012)


    I stumbled across Freedman's explanation in my search on the subject and was thrown by the options as well. Either way, I selected the UPDATE choice as the most likely candidate.

    UPDATE: Found the patent and the update is referring to an update plan (any data modification).

    http://www.google.com/patents/US6122644

    Isn't life wonderful. Here we have a problem identified by IBM back in the 70s, and solved by them back in the 70s, back in the days when the US sopreme Court was still sufficiently sane to assert that software patents were illegal. Then MS gets a patent on solving it in 2000, with a patent that has no non-obvious invention in it (at least nothing non-obvious to anyone familiar with the state of the art in, I think, the late 70s) after the law has been changed (by magic, or perhaps by the appointment of supreme court judges favoured by presidents whose campaign funds came from industrialists who wanted software patents).

    We all knew the US patent system had become insane some time ago. What none of us outside the US know is why you pretend that your plutocracy is a democracy, claiming that the people rule when in fact only the money rules, as clearly evidenced by the changes in the last few decades to your copyright system and your patent system (amongst many other indicators).

    Gosh. I would hope that people know that it's not a Democracy. It's a Republic (well, it's supposed to be... I think "plutocracy" is probably more accurate in this day and age). Even says so right in the "Pledge of Allegiance".

    All that aside and whatever you want to call it and whatever problems we may have, I'm pretty happy that I live here. 😉 So.... how about you take it just a little easier on my country, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Guys why so serious, and Jeff, I am glad you´re happy living in US, a great country indeed.

    -------------------------------------------------------------
    "It takes 15 minutes to learn the game and a lifetime to master"
    "Share your knowledge. It's a way to achieve immortality."

  • Jeff Moden (11/1/2012)


    Gosh. I would hope that people know that it's not a Democracy. It's a Republic (well, it's supposed to be... I think "plutocracy" is probably more accurate in this day and age). Even says so right in the "Pledge of Allegiance".

    Nothing stops it being both a republic and a democracy except that it has become a plutocracy so it isn't a democracy. Half a century ago it was a republican democracy, a genuine democratic republic (have to include the word "genuine" there, as the phrase "democratic republic" without the word "genuine" has been horribly tainted by all the totalitarian governments who called their countries that despite being neither democratic nor a republic) but now it's fast becoming a plutocratic republic.

    Of course the words "republican" and "democrat" now have a very specific party-political meaning in the USA and their use in that party-political sense no longer has much (if any) connection with the normal meanings of those words, but their use in that sense isn't designed to mislead as was their use by totalitarians.

    All that aside and whatever you want to call it and whatever problems we may have, I'm pretty happy that I live here. 😉 So.... how about you take it just a little easier on my country, please?

    Oh, I rather like the USA myself. It has a lot of good points, a lot going for it. But I don't take it easy on people/things that I like and care about - that would be an abdication of the obligations of a friend - I speak up when there's something wrong.

    So I'll continue to rail against the ongoing drift into plutocracy, the ridiculous extensions of copyright, the contemptible patent system, and the disgraceful excesses of the immigration service - much as I have always railed against the numerous faults in my own country.

    Tom

  • L' Eomot Inversé (11/1/2012)


    Of course the words "republican" and "democrat" now have a very specific party-political meaning in the USA...

    Tom, I'm very sorry, but I must correct you. it's "republican" and "democratic" not "democrat" as the party is the "Democratic Party".

  • Interesting discussion. Freedman does mention inserts and deletes, but he never wrote on the subject. Perhaps he was mistaken, perhaps he hasn't had time. The KB also mentions it, but it could be standard wording and not specific application to inserts/deletes.

    For now, I will stand by Halloween protection being known as applying to updates.

  • L' Eomot Inversé (10/31/2012)


    This probably a pointless argumnent, because it isn't about anything technical, just about historical and terminological accuracy, and it does appear that quite a few people have adopted historically inaccurate terminology (which is a pretty frequent phenomenon - for example we call Heath's normal form, published in 1971, "Boyce-Codd Normal Form" after a paper published several years later, although the earlier paper was certainly known to the senior of the authors of the later paper). But I'm going to make the argument for historical accuracy (especially since several of the people involved are still alive) anyway (if I were in touch with any of them I would ask them for their views, but unfortunately I'm not).

    Well, perhaps there's one thing we can agree on.

    Without getting word "from the horses mouth", it's impossible to know for certain what the IBM team considers to be in the domain of the Halloween problem.

    All your transcript link provides is a recollection of the circumstances in which it was discovered.

    There are a number of unanswered possibilities:

    • When the problem was discovered, did the team consider INSERT / DELETE scenarios at all?
    • Even if they hadn't originally considered the possibility of INSERT / DELETE exhibiting similar unexpected behaviour for the same reason, would they consider those to be a different problem?
    • Did the team consider the Halloween problem as exclusively applicable to the UPDATE statement?
    • Or perhaps even more strictly, did they consider it to be the specific problem of updating Salaries over the Salaries index?
    • When they use the word updates, do they mean changing existing records (UPDATE statement), or any update that changes the state of the database?

    I will however, give you something to consider.

    Languages, words and names are in continuous evolution to aid communication. Believe me, you're not doing yourself any favours if you reject a logical definition on an historical 'point of order' - regardless of the accuracy of your claim.

    Nevertheless, I'm willing to agree to disagree.

    L' Eomot Inversé (10/31/2012)


    craig 81366 (10/31/2012)


    Then why when I run the following by itself, does the plan not spool?

    SELECT AVG(COL) FROM TABLE_NAME

    Because a displayed MS SQL query plan only shows spool operations when they are not blatantly obvious (which I think is eminently sensible); no other reason.

    Whereas I can agree to disagree on the previous point, here I must step in and call a spade a spade.

    I can only assume your insistence on pursuing this is that you don't fully understand what a spool operation does. It effectively reads all relevant data into temporary storage before proceeding to the next step. This isolates the data from changes that could be effected in said step. It can also be a relatively expensive operation, and so is not nearly as trivial as you seem to think.

    Not pulling punches: I consider your claim to be utterly unsubstantiated, unfounded rubbish!

    If "blatantly obvious" operations can simply be ignored, where do you draw the line? Are there some hash joins, or nested lookups that are "blatantly obvious"?

    The fact is you do not need to spool (first read all data into temporary storage) to calculate an average.

    I apologise if you consider the elaborate explanation (below) of how to calculate an average to be beneath you, but please observe that you can perform the calculation as you read data. You do not need to spool it first.

    • Read first record (Val=10)
    • Increment record counter(1), Add Val to cumulative total (0+10=10)
    • Read next record (Val=20)
    • Record Counter=2, Cumulative Total=30
    • Read next record (Val=30)
    • Record Counter=3, Cumulative Total=60
    • Whenever you're done reading records the average is simply Total/Counter

    If you think you can deliver a result for a query for AVG without first reading all the records, then think again, because you are horribly wrong.

    Your sarcasm is not amusing. Perhaps I should point out that in the above example: the AVG after reading the first record is 10, the second 15, and after reading all the data 20.

    Just to be clear, there is a big difference between "calculating the AVG after reading all the data", and "calculating the average as you read the data".

    * Do you need to read all the data before you can get the average of all the data - of course!

    * Do you need to read all the data before you start doing any calculations - not at all!

    You can't deliver the first value of AVG until you have done that. Then, why read the records again?

    I will concede that the most practical implementation of the query to DELETE all records less than the average is to precalculate the average, and treat it as a constant input (i.e. spool a single value).

    Which is exactly why I elected to provide a second DELETE example in which calculating the decision to DELETE each row is independent of the same calculation for other rows.

    Sure, you could track changes in AVG as the table changed, if you wanted to and had an awful lot of compute power to spare,

    Not at all, maintaining the average as you delete rows is a very cheap operation when done correctly.

    If you have a Cumulative Total and Record Counter (as in the earlier example), then each time you delete a row: decrease the Counter and subtract the Value from the Cumulative Total. The average at any point in time can be delivered probably more cheaply than the IO ops need to actually perform the delete.

    In conclusion.

    I've found your contributions to this topic to be informative and interesting.

    However, I find your flippancy and stubbornness quite annoying. If you're not going to be open-minded enough to actually put thought into what I have to say, I can't be bothered to waste any more time explaining it to you.

    I do enjoy a good debate and appreciate logical argument, but presenting unfounded rubbish as fact gets up my nose as much as historical inaccuracies seem to get up yours.

    Regards,

    Craig

  • Steve Jones - SSC Editor (11/1/2012)


    Interesting discussion. Freedman does mention inserts and deletes, but he never wrote on the subject. Perhaps he was mistaken, perhaps he hasn't had time. The KB also mentions it, but it could be standard wording and not specific application to inserts/deletes.

    For now, I will stand by Halloween protection being known as applying to updates.

    Steve, just to clarify.

    When you say updates, do you mean the UPDATE statement, or any operation that changes the state of the database? 🙂

  • I have to agree with Craig on the evolution of terminology.

    When a modern American, Englishman, Aussie, et al, says, "decade", he or she most usually does NOT mean "a unit of approximately 10 infantry soldiers and their supporting officers, plus any logistics personnel assigned to them", but most likely means, "a continuous 10 year period, often one starting with a year ending in 1 and ending in a year ending with 0". Same for thousands of Latin and Greek terms that have survived as words but evolved as concepts.

    If someone tells you a football team was "decimated" by the other team, you almost certainly don't go out and buy flowers for the widows of the 1:10 members of the decimated team.

    Even technical/scientific/mathematical terms get this treatment. If a student at college tells you he's taking a "calculus class", do you assume it's something to do with the substance of rocks?

    If your teenage daughter told you she signed up for a gymnastics class, would you immediately call the DA's office to get her teachers prosecuted as pedophiles? (If you know what "gymnasium" originally meant, this will make sense.)

    So, claiming that the original people who coined a word have some sort of exclusive right to define it in eternium and ad extemis, seems like a false argument to me. They might have trademark rights on it, if applicable, but eternal exclusive definition control for common usage?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I intended UPDATE statement. I know that an update can be a delete/insert at times under the covers, but since I have not found documentation that links this issue to specific places where an INSERT or DELETE command is run, I am leaving this as the UPDATE statement.

    If you can find me a reference that shows this problem in the other statements, I'll change it.

  • 🙂

    Indeed a nice question, but for me it was "The Beginner's Luck" thing... good it used the only word "modifying..." in the BOL, and just went and selected UPDATE. :hehe:

    (Although I am enjoying the discussion worth all of it... thank you all for their individual contribution.)

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Nice interesting question, thanks for sharing. Learn something new...

    Thanks

  • craig 81366 (11/1/2012)


    All your transcript link provides is a recollection of the circumstances in which it was discovered.

    There are a number of unanswered possibilities:

    • When the problem was discovered, did the team consider INSERT / DELETE scenarios at all?
    • Even if they hadn't originally considered the possibility of INSERT / DELETE exhibiting similar unexpected behaviour for the same reason, would they consider those to be a different problem?
    • Did the team consider the Halloween problem as exclusively applicable to the UPDATE statement?
    • Or perhaps even more strictly, did they consider it to be the specific problem of updating Salaries over the Salaries index?
    • When they use the word updates, do they mean changing existing records (UPDATE statement), or any update that changes the state of the database?

    When I read what the team published, something over 30 years ago, if I recall it correctly, it was about an update being repeated several times on a row, instead of once. Yes, they looked for the underlying cause, but the problem they addressed was the repeated update. Unfortunately I can't recall the reference - there were rather a lot of papers being published in those days, and it's not easy to remember which journal and with what title everything appeared in. But the System R tyeam published very freely in those days (although they did later see the beginnings of the later "keep it all secret until we have a patent" frenzy) and they published that information quite quickly. You may have seen some earlier comments of mine on this site about the relationship of the IBM library with NRL in the 1960s (I was at the NRL end, while NRL still existed) and that free exchange of information carried on well into the 70s.

    I will however, give you something to consider.

    Languages, words and names are in continuous evolution to aid communication. Believe me, you're not doing yourself any favours if you reject a logical definition on an historical 'point of order' - regardless of the accuracy of your claim.

    It's not a "point of order". It's about what people mean when they use that term. What makes you believe that you youngsters own the meaning, rather than us boring grey-haired old ****s?

    Nevertheless, I'm willing to agree to disagree.

    OK, so am I. We disagree. That means only one of us can be right, and I know which one it is - while, I believe, you don't.

    Up to now, nothing nasty. I'll have to address the rest of your post in a separate message, because I don't want to mix the civilised part of this exchange with the uncilvilised part (and that, incidentally, illustrates one of the differences between us).

    Tom

  • craig 81366 (11/1/2012)


    L' Eomot Inversé (10/31/2012)


    craig 81366 (10/31/2012)


    Then why when I run the following by itself, does the plan not spool?

    SELECT AVG(COL) FROM TABLE_NAME

    Because a displayed MS SQL query plan only shows spool operations when they are not blatantly obvious (which I think is eminently sensible); no other reason.

    Whereas I can agree to disagree on the previous point, here I must step in and call a spade a spade.

    I can only assume your insistence on pursuing this is that you don't fully understand what a spool operation does.

    Not pulling punches: I consider your claim to be utterly unsubstantiated, unfounded rubbish!

    There were "spool operations" way back in ancient times, many years before I got involved in computing (I started computing back 1963 at Oxford with some of Lucius Fox's numerical methods stuff, then some more odd bits in 1964 at STC, worked seriously on it in 1966 at RHEL, played with it while a research student at Bristol U in 1966/67, but only switched from being a mathematician to being a computer scientist some time after I joined NRL at the end of that academic year). What a spool operation did then - and still does now - is read some data from permanent store or working store, execute some transformations on it (the transformations may be null) and place the result or those transformations in some temporary store which is preserved until all subsequent already planned operations requiring the transformed form of that data are completed, at which point that transformed data (usually called "the spool") is discarded. This is a general computing concept dating from the 1950s, long before the existence of any RDBMS. The idea that term "spool operation" has changed its meaning just to cope with RDBMS is complete insanity, since the term is still used every day in connection with passing data to networked destinations (eg to a networked printer). The purposes of spooling are at least threefold: (i) to avoid unwanted repeat computation; (ii) as a direct consequence of (i), to ensure that a single value (the spooled value) of the (derived) data is used for all dependent tasks; and (iii) to allow use to be made of the spool even though the original data may be (temporarily) inaccessible (probably irrelevant today, but enormously important when I first worked in data communications). It is of course irrelevant whether the only use of the spool before destroying it is to format it and output to a console (particularly when that output is asynchronous). If you have a different understanding of "spool", then you are just plain wrong - in fact if you want to disagree with that definition then you are (to use the style of personal abuse that you have chosen to introduce into this dialogue) an ignoramus, and your insistence that your utter misunderstanding of the term is correct while my accurate understanding is incorrect is a very clear indication that you are not just an ignoramus but an arrogant and self-important ignoramus. Of course all of us, not just MS, normally ignore the trivial spool where output is to a single console which just happens to be active when the spool is created and has a reasonably close to synchronous (albeit asynchronous) connection to the task creating the spool.

    It effectively reads all relevant data into temporary storage before proceeding to the next step. This isolates the data from changes that could be effected in said step. It can also be a relatively expensive operation, and so is not nearly as trivial as you seem to think.

    It doesn't read all relevant data and put it into temporary storage. It reads all relevant data and puts some value computed from it (perhaps as little as 1 bit computed from a terabyte of original data) into temporary storage. It matters not one jot (just to correct something else you got wrong elsewhere, and save me the trouble of quoting it) whether it computes the derived data after reading all the original data or computes it on the fly.

    As for being expensive and not as trivial as I think, I should perhaps give you some of my history: in the late 80s I was chief architect of a declarative system R&D project that involved a relational (but definitely NOT SQL) database. One of the concepts being pushed around was that the DML/DDL should be declarative except at transaction boundaries (ie transactions should never see their own afterlooks); I killed that precisely because I understood the cost of spooling. Single statements we could afford (maybe - but if not, we didn't have a viable project); whole multi-statement transactions, no bl***y way! I think that that suggests I understood the cost a few decades ago. Maybe you think I've forgotten?

    The fact is you do not need to spool (first read all data into temporary storage) to calculate an average.

    Ye, you do. It doesn't all have to be there simultaneously, of course. Nobody but you has sugested that for an average one spools the original data, rather than the single datum the average.

    I could carry on by introducing an attack on your method of calculating averages (which any mathematician knows doesn't work unless you have exact numerics with adequate accuracy throughout) but I won't as MS SqQL seems to do it that way. I could be even nastier with your method of recalculating the average as elements are deleted, but I don't see the point in doing so - after all, it is utterly unimaginable that MS is stupid enough to allow the on-the-fly correction as elements are deleted to be done by the method you suggest (if I discovered that it did I'd be recommending to HMG to instruct its suppliers to ban the use of aggregates in SQL Server in any software it commissions except where the ranges and the data types are such that the calculation is provably safe, and I believe that there are enough mathematically qualified civil servants that my recommendation would stick; and it would probably spread to US Gov too; but it won't happen, because MS is not that stupid; in fact it uses what you want to miscall "Halloween Protection" to ensure that that never happens).

    Tom

  • Steve Jones - SSC Editor (11/1/2012)


    Interesting discussion. Freedman does mention inserts and deletes, but he never wrote on the subject. Perhaps he was mistaken, perhaps he hasn't had time. The KB also mentions it, but it could be standard wording and not specific application to inserts/deletes.

    For now, I will stand by Halloween protection being known as applying to updates.

    Did you read http://support.microsoft.com/kb/810026?

  • L' Eomot Inversé (11/5/2012)


    Up to now, nothing nasty. I'll have to address the rest of your post in a separate message, because I don't want to mix the civilised part of this exchange with the uncilvilised part (and that, incidentally, illustrates one of the differences between us).

    There are many differences between us...

    L' Eomot Inversé (11/5/2012)


    you are (...) an ignoramus

    L' Eomot Inversé (11/5/2012)


    you are not just an ignoramus but an arrogant and self-important ignoramus

    Only one of us resorts to insults, name-calling and personal abuse to 'make his point'.

    L' Eomot Inversé (11/5/2012)


    (to use the style of personal abuse that you have chosen to introduce into this dialogue)

    Only one of us uses outright blatant lies to justify his "uncivilised" behaviour.

    L' Eomot Inversé (11/5/2012)


    I could carry on by introducing an attack on your method of calculating averages

    Only one of us specifically avoided providing argument against the content of the other's post and instead chose to direct attacks at the other person.

    By the way, I would be very interested to learn how you intend improving on the mathematically provable minimum number of required addition operations, and a single division operation.

    L' Eomot Inversé (11/5/2012)


    I could be even nastier with your method of recalculating the average as elements are deleted, ... to be done by the method you suggest

    Only one of us deliberately misrepresents the other's statements in order easily attack them. (See http://en.wikipedia.org/wiki/Straw_man)

    To reiterate, the only reason I presented the technique was to illustrate your error in assuming such a calculation would need "an awful lot of compute power to spare".

    Futhermore:

    Only one of us resorted to quoting his education background, experience and work history in an effort to give credibility to his argument.

    Only one of us has made comments suggesting an ageist prejudice.

    Only one of us has made statements insinuating the other is "stupid".

    You're right: Only one of us chose not to "mix the civilised part of this exchange with the uncilvilised part"; but then again, only one of us chose to write an "uncivilised" post at all.

Viewing 15 posts - 31 through 45 (of 54 total)

You must be logged in to reply to this topic. Login to reply