Your Favorite Feature that Needs Work

  • I'm still a bit miffed that they haven't got round to allowing us to include scheduled tasks inside contained databases...

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • ThomasRushton - Monday, May 8, 2017 6:29 AM

    I'm still a bit miffed that they haven't got round to allowing us to include scheduled tasks inside contained databases...

    ditto. The whole "containment" effort seems to be lacking resources.

  • TomThomson - Sunday, May 7, 2017 3:00 PM

    I'd like to see SQL Server get rid of nonsense like the sum of an empty set being NULL instead of 0, but of course we would need to get the SQL standard corrected before any supplier could afford to make that change,  and if JC's attitude to it is typical of people on the standards committee those people are either too stupid or too pig-headed (or perhaps both) to get that right.

    Not to be pedantic but the sum of an empty set (mathematically speaking) is Null, not 0.

  • I wish someone would fix the "you can't edit the reply part of your post when the system has decided to make it, for reasons unknown, zero length" bug.   Or is it in Firefox rather than in SQLSWERVERCENTRAL's website?  Or better, fix the "make it zero length" bug.  I spent quite a lot of time trying to edit the empty reply the system created.

    Tom

  • roger.plowman - Monday, May 8, 2017 10:50 AM

    TomThomson - Sunday, May 7, 2017 3:00 PM

    I'd like to see SQL Server get rid of nonsense like the sum of an empty set being NULL instead of 0, but of course we would need to get the SQL standard corrected before any supplier could afford to make that change,  and if JC's attitude to it is typical of people on the standards committee those people are either too stupid or too pig-headed (or perhaps both) to get that right.

    Not to be pedantic but the sum of an empty set (mathematically speaking) is Null, not 0.

    [/qu

    <rant>
    Well, maybe I'm being pedantic but I am prepared to assert that your post is absolute drivel.
    It strikes me as pretty almost unbelievably dumb to say that if you start out with a bag with 10 $1 notes in it contains  $10; and when you remove the note with the highest serial number, you now have only 9 notes in it so it contains only $9, and when you repeat that action 8 times, and the amount in the bag goes down by $1 each time; so there is $8, then $7, then in succession $6, $5, $4m, $3, $2 and $1 in the bag, but then a miracle happens: you remove the last dolla bill and suddenly something very strange happens: insted of going down by $1 again, so that there is $0 (which is what you would expect there to be in the bag when there is nothing in it) there is not nothing in it but some unknown quantity.:  hurrah, you aren't broke, you just dont know how much is in the bag.
    To me, as  highly qualified mathematician (Honours degree in maths from Oxford, research degree in maths from Bristol, Chartered Mathemtician,  Fellow of the Institute of Mathemetics and its Applications, plus a handful of postgraduate qualifications in computing and in engineering) that is pure and utter bullshit, and I regard your assertion that such bullshit is "mathematically speaking" as ignorant nonsense and arrant stupidity of the highest order.  
    You are welcome to join my list of people who make random nonsense assertions about mathematics because they are too damned stupid to learn any mathematics, even simple bloody aritmemetic like this, or to avoid membership by admitting that you were talking nonsense and empty sets actually don't contain anything, they contain nothing,  and if you have nothing, you don't have an unknown quantity.
    </rant>

    Tom

  • TomThomson - Wednesday, May 10, 2017 12:33 PM

    roger.plowman - Monday, May 8, 2017 10:50 AM

    TomThomson - Sunday, May 7, 2017 3:00 PM

    I'd like to see SQL Server get rid of nonsense like the sum of an empty set being NULL instead of 0, but of course we would need to get the SQL standard corrected before any supplier could afford to make that change,  and if JC's attitude to it is typical of people on the standards committee those people are either too stupid or too pig-headed (or perhaps both) to get that right.

    Not to be pedantic but the sum of an empty set (mathematically speaking) is Null, not 0.

    [/qu

    <rant>
    Well, maybe I'm being pedantic but I am prepared to assert that your post is absolute <connotation_of_disagreement>

    </rant>

    The reason the sum of an empty set is null is because of the way we talk about sets. You can always create a table called "bags" with a corresponding column that you can call "number_of_dollars" that you can set to 0 all you want. Maybe whats happening here is that you are confusing "bags" containing "dollars" with the entirely different idea of a stand alone collection of "dollars" with no enclosing "bag."

    Once we do away with the bag, interesting things happen. Say we are grouping by American coins, and we have 2 pennies, 5 nickels, and 3 quarters. When we use the aggregate "sum" over that set of currency, we'll probably group each type of coin together, then count how many of each we have or sum the values. Notice the absence of the bag's containing each type of currency, all we have are the currency.

    Now according to your theory, we MUST return 2 pennies, 5 nickels, 0 dimes and 3 quarters. Of course we can know that there are no dimes but what happens if we didn't? Also, why aren't you asking about the zero sum of half-cents I omitted, maybe I work in a museum and I'm reporting on 1800's currency?

    Its all different if I'm listing BAGS of currency because then I notice the bag labelled "half-cents" and can write 0 as the quantity or value it contains. But I MUST KNOW THE MISSING DENOMINATION to get to your 0 versus null nirvana. We need the bags.

    Really, as computer scientists [1], we should realise that a decision had to be made about nulls, an analysis must be done, and given the above example, I gotta go with Roger's analysis as its trivial to satisfy your use case by left joining to the "bag" table and using the coalesce function against the missing count to get your zero.

    also:

    create table anything
    (
    anycolumn anytype
    )
    select sum(id) from anything
    -- null
    select count(*) from anything
    -- this gives you the zero you were wanting.

    [1] I'm not really a computer scientist

  • patrickmcginnis59 10839 - Monday, May 15, 2017 2:24 PM

    TomThomson - Wednesday, May 10, 2017 12:33 PM

    roger.plowman - Monday, May 8, 2017 10:50 AM

    TomThomson - Sunday, May 7, 2017 3:00 PM

    I'd like to see SQL Server get rid of nonsense like the sum of an empty set being NULL instead of 0, but of course we would need to get the SQL standard corrected before any supplier could afford to make that change,  and if JC's attitude to it is typical of people on the standards committee those people are either too stupid or too pig-headed (or perhaps both) to get that right.

    Not to be pedantic but the sum of an empty set (mathematically speaking) is Null, not 0.

    [/qu

    <rant>
    Well, maybe I'm being pedantic but I am prepared to assert that your post is absolute <connotation_of_disagreement>

    </rant>

    The reason the sum of an empty set is null is because of the way we talk about sets. You can always create a table called "bags" with a corresponding column that you can call "number_of_dollars" that you can set to 0 all you want. Maybe whats happening here is that you are confusing "bags" containing "dollars" with the entirely different idea of a stand alone collection of "dollars" with no enclosing "bag."

    Once we do away with the bag, interesting things happen. Say we are grouping by American coins, and we have 2 pennies, 5 nickels, and 3 quarters. When we use the aggregate "sum" over that set of currency, we'll probably group each type of coin together, then count how many of each we have or sum the values. Notice the absence of the bag's containing each type of currency, all we have are the currency.

    Now according to your theory, we MUST return 2 pennies, 5 nickels, 0 dimes and 3 quarters. Of course we can know that there are no dimes but what happens if we didn't? Also, why aren't you asking about the zero sum of half-cents I omitted, maybe I work in a museum and I'm reporting on 1800's currency?

    Its all different if I'm listing BAGS of currency because then I notice the bag labelled "half-cents" and can write 0 as the quantity or value it contains. But I MUST KNOW THE MISSING DENOMINATION to get to your 0 versus null nirvana. We need the bags.

    Really, as computer scientists [1], we should realise that a decision had to be made about nulls, an analysis must be done, and given the above example, I gotta go with Roger's analysis as its trivial to satisfy your use case by left joining to the "bag" table and using the coalesce function against the missing count to get your zero.

    also:

    create table anything
    (
    anycolumn anytype
    )
    select sum(id) from anything
    -- null
    select count(*) from anything
    -- this gives you the zero you were wanting.

    [1] I'm not really a computer scientist

    So, you have this set which doesn't contain anything at all.  Then you want the sum of the values of the things it contains.  And you claim that isn't zero?  Really?

    Your pseudo-sql is amusing; select sum(id) from anything doesn't deliver NULL, it delivers an error since there's no column called "id".    If you have a column called "id" and "id" had an arithmetic type, it represents the sum of the id columns of all  the rows where that column is not null (because SUM is specifically defined in SQL to take into account only rows where the quantity to be summed is not null).  If there are no such rows, there is nothing to be summed, so the answer in that case should be zero.

    The bags are irrelevant; if we start from a collection of things which have value, we can sum the individual values without first splitting the collection up into bags each of which contains only things of one particular value.  Doing that split is usually a pointless (unless its purpose is to reduce performance) excercise bt it is potentially useful when one wants to force an internal operation order within the SUM operation that may help reduce rounding errors or reduce the risk of overflow based on some prior knowledge of the patterns of the data (SQL currently provides no means of forcing order of computation for its sum, average, variance, and standard deviation operators, if it did provide such a method bagging would always be pointless unless actually one wanted separate counts or sums for each distinct value as well as the totals,  but even then it's hard to see any reason for re-inventing the wheel instead of just using GROUP BY).

    Tom

  • TomThomson - Tuesday, May 16, 2017 10:12 AM

    patrickmcginnis59 10839 - Monday, May 15, 2017 2:24 PM

    TomThomson - Wednesday, May 10, 2017 12:33 PM

    roger.plowman - Monday, May 8, 2017 10:50 AM

    TomThomson - Sunday, May 7, 2017 3:00 PM

    I'd like to see SQL Server get rid of nonsense like the sum of an empty set being NULL instead of 0, but of course we would need to get the SQL standard corrected before any supplier could afford to make that change,  and if JC's attitude to it is typical of people on the standards committee those people are either too stupid or too pig-headed (or perhaps both) to get that right.

    Not to be pedantic but the sum of an empty set (mathematically speaking) is Null, not 0.

    [/qu

    <rant>
    Well, maybe I'm being pedantic but I am prepared to assert that your post is absolute <connotation_of_disagreement>

    </rant>

    The reason the sum of an empty set is null is because of the way we talk about sets. You can always create a table called "bags" with a corresponding column that you can call "number_of_dollars" that you can set to 0 all you want. Maybe whats happening here is that you are confusing "bags" containing "dollars" with the entirely different idea of a stand alone collection of "dollars" with no enclosing "bag."

    Once we do away with the bag, interesting things happen. Say we are grouping by American coins, and we have 2 pennies, 5 nickels, and 3 quarters. When we use the aggregate "sum" over that set of currency, we'll probably group each type of coin together, then count how many of each we have or sum the values. Notice the absence of the bag's containing each type of currency, all we have are the currency.

    Now according to your theory, we MUST return 2 pennies, 5 nickels, 0 dimes and 3 quarters. Of course we can know that there are no dimes but what happens if we didn't? Also, why aren't you asking about the zero sum of half-cents I omitted, maybe I work in a museum and I'm reporting on 1800's currency?

    Its all different if I'm listing BAGS of currency because then I notice the bag labelled "half-cents" and can write 0 as the quantity or value it contains. But I MUST KNOW THE MISSING DENOMINATION to get to your 0 versus null nirvana. We need the bags.

    Really, as computer scientists [1], we should realise that a decision had to be made about nulls, an analysis must be done, and given the above example, I gotta go with Roger's analysis as its trivial to satisfy your use case by left joining to the "bag" table and using the coalesce function against the missing count to get your zero.

    also:

    create table anything
    (
    anycolumn anytype
    )
    select sum(id) from anything
    -- null
    select count(*) from anything
    -- this gives you the zero you were wanting.

    [1] I'm not really a computer scientist

    So, you have this set which doesn't contain anything at all.  Then you want the sum of the values of the things it contains.  And you claim that isn't zero?  Really?

    Your pseudo-sql is amusing; select sum(id) from anything doesn't deliver NULL, it delivers an error since there's no column called "id".    If you have a column called "id" and "id" had an arithmetic type, it represents the sum of the id columns of all  the rows where that column is not null (because SUM is specifically defined in SQL to take into account only rows where the quantity to be summed is not null).  If there are no such rows, there is nothing to be summed, so the answer in that case should be zero.

    The bags are irrelevant; if we start from a collection of things which have value, we can sum the individual values without first splitting the collection up into bags each of which contains only things of one particular value.  Doing that split is usually a pointless (unless its purpose is to reduce performance) excercise bt it is potentially useful when one wants to force an internal operation order within the SUM operation that may help reduce rounding errors or reduce the risk of overflow based on some prior knowledge of the patterns of the data (SQL currently provides no means of forcing order of computation for its sum, average, variance, and standard deviation operators, if it did provide such a method bagging would always be pointless unless actually one wanted separate counts or sums for each distinct value as well as the totals,  but even then it's hard to see any reason for re-inventing the wheel instead of just using GROUP BY).

    An empty set (mathematical set) cannot be summed, since a set can contain literally anything the sum of an empty set cannot be known without knowing what it is a set of. In Patrick's coin example you can't provide 0 for types of coins not in the set because you literally don't know which coins are missing. In an empty set of coins you cannot provide a sum at all, since you can't know (from the set) that it is in fact a set of coins and not, say, a set of aardvarks--who don't have a sum either. 🙂

    Since you can't know what the sum of an empty set might be, it is literally unknown--represented by null.

    Should you argue you do know which coins you want counts of, you would have to supply that set in a seperate table, left-joined to the results of your summed set, using COALLESCE just as Patrick suggested. But in that case you are supplying information external to the summed set itself, which means the summed set still can't give you 0 for missing coins. Only the joined/coallesced set can.

  • Your pseudo-sql is amusing; select sum(id) from anything doesn't deliver NULL, it delivers an error since there's no column called "id".

    Yes that was an error.

    If there are no such rows, there is nothing to be summed, so the answer in that case should be zero.

    I found another explanation that I found appealing, sum is an aggregate that works on the column being summed using compatible operators, just like avg, min, max, etc... and given that avg of an empty set makes sense to be null, as well as min, max, then its not that big a leap for sum to be as well. Count on the other hand, works on rows so in this case zero rows makes sense although once we go back to grouping, its best to just drop the group with no rows like I was talking about in my previous post.

    I'm sure its perfectly possible to return zero in the cases you've described, but given how the other aggregates work, would it be expected? Would it seem consistent?

  • Somehow, the explanations defending the sum of an empty set as null don't make as much sense as the arithmetic explanation from Tom.
    SUM is essentially a mathematical function, the same way that COUNT. Why would 2 mathematical functions have different behaviors when dealing with empty sets? Zero is a number which quantifies a count or an amount of null size. There's also the concept of Empty sum which by convention is the additive identity: zero.
    There's no such thing as empty average or standard deviation or any of the others statistical aggregate functions. MIN and MAX are not mathematical. In those cases, the result should be null as there isn't an acceptable value for them.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • roger.plowman - Tuesday, May 16, 2017 10:32 AM

    An empty set (mathematical set) cannot be summed, since a set can contain literally anything the sum of an empty set cannot be known without knowing what it is a set of. In Patrick's coin example you can't provide 0 for types of coins not in the set because you literally don't know which coins are missing. In an empty set of coins you cannot provide a sum at all, since you can't know (from the set) that it is in fact a set of coins and not, say, a set of aardvarks--who don't have a sum either. 🙂

    Since you can't know what the sum of an empty set might be, it is literally unknown--represented by null.

    Should you argue you do know which coins you want counts of, you would have to supply that set in a seperate table, left-joined to the results of your summed set, using COALLESCE just as Patrick suggested. But in that case you are supplying information external to the summed set itself, which means the summed set still can't give you 0 for missing coins. Only the joined/coallesced set can.

    Oddly enough sets and bags can have this strange property known as type.  I might have a set of integers, and a set of real numbers.  When theses sets are empty, they both have the same content (none); but they are not the same set, because their types are different.  The fact that bags and sets are typed (in T-SQL and in mathematics) means that I can know what the sum of an empty set is (and I will get an error If I attempt to sum a collection of things whose type is not summable).
    The things we mostly deal with in SQL are "relations" (which should be strongly typed sets, but can be bags instead of sets if the data design is not in at least 1NF) and "derived relations" (which are strongly typed bags).  At the atomic level, we have values (typed as members of domains) and NULLs (typed as if they were members of domains).  Functions tend to be somewhat typed, but aren't necessarily functions in the mathematical sense of "function"; stored procedures are hardly typed at all (they do have typed arguments).  But when we come to "SUM" in SQL, everything relevant to it is strongly typed.

    As I already said, prebagging into bags of similar objects all with the same value and then getting a bag value for each bag by counting and multiplying that one value by the count and then getting the final sum by adding all the bag values is a waste of processing power, just adding is faster on most hardware.  Sometimes it's only a small waste of power, but in other cases it can require a sort of all the data to do the prebagging.  The cost of counting is the cost of adding 1 as often as needed, so you are not effectively reducing the number of additions in the cost, you then have the cost of multiplications and the cost of the final additions after the mulitplications which look like extra costs.  Messing about with unneccessary joins and pointless coalescing is usually a sign of something wrong - perhaps bad data design, perhaps a preference for complex code over simple code, perhaps both. 

    Why on earth would I want to provide values for missing coins (which I take to mean coins not in the collection whose values I am summing)?  They can have no effect on the sum of the coins that are not missing.  

    If you look at the syntax of the SQL SUM aggregate function, you will see that it requires an argument which is an expression which delivers a value of arithmetic type from each row in some relation or derived relation; that's its only argument.   So what is being summed isn't a set of coins or anything like that, it's the collection (mathematically not a set but a bag, since it can contain duplicates) of values delivered by that expression when applied to the rows presented to it.  The SUM function can't be summing anything other than those values of arithmetic type, because it doesn't have any arguments indicating anything other than those values.   And of course both in SQL and in mathematics the only bags that have sums are bags of things to which a binary sum operation applies (mostly we think in terms of a reduce functional operator so that "sum" is just a name for reduce(+,0), so in SQL we can only talk about sums of summable things, not sums of anything else.

    The only thing ever summed by the SQL SUM aggregate function is the bag of arithmetic values delivered by its argument (excluding, by the definition in SQL of that aggregate function, any NULL values that the argument delivered).  So the SUM aggregate function is a pure mathematical sum, not anything invovling coins or gold bars or whatever.

    Tom

  • patrickmcginnis59 10839 - Tuesday, May 16, 2017 11:35 AM

    I found another explanation that I found appealing, sum is an aggregate that works on the column being summed using compatible operators, just like avg, min, max, etc... and given that avg of an empty set makes sense to be null, as well as min, max, then its not that big a leap for sum to be as well. Count on the other hand, works on rows so in this case zero rows makes sense although once we go back to grouping, its best to just drop the group with no rows like I was talking about in my previous post.

    I'm sure its perfectly possible to return zero in the cases you've described, but given how the other aggregates work, would it be expected? Would it seem consistent?

    That is certainly a better argument than the earlier ones, but I don't think it's right. 

    In maths, the sum of an empty collection is either a type violation (when the collection is typed to have values that are not possible arguments for addition) or 0 (all other cases).  The min and max of an empty collection are bottom ( or maybe top, but that would also be NULL as SQL has only one NULL so "unknown" and "impossible" both use it) which is the same as SQL (except SQL calls it NULL instead of bottom or top) because if it doesn't contain anything it can't contain a largest value or a smallest value. Its average is either an error (the collection is typed as having unsummable content) or  bottom (because both the count and the sum are 0, so the average could be any number at all as 0*x = 0 for all x, so we have no information at all apart from its type about x) so that too is the same as SQL.
    There are solid mathematical(logical) arguments for MIN, MAX, and AVG to return NULL, and equally solid mathematical(logical) aruments for SUM not to return NULL, and mathematicians have lived with that for a century or more without noticin any consistency problem.  If the people who decided to make the SUM of the empty set NULL way back in the 1970s when Codd and Date were both fighting to get IBM to make their new database based on Codd's work a proper relational one and losing that battle, the database community would probably have noticed nothing at all odd about it, and it is certainly consistent so it probably would seem consistent to most people.  (Thought: I wonder if this empty means unlknown sum nonsense was what set CJD so strongly anti-NULL - if the language didn't have NULL it couldn't misuse NULL, could it?)

    Tom

Viewing 12 posts - 46 through 56 (of 56 total)

You must be logged in to reply to this topic. Login to reply