GROUP BY vs DISTINCT -- SQL SERVER BUG ?

  • Karsten's explanation sounds plausible and MOSTLY fits the situation.  Thanks.

    However, when I use a temporary table, the values are computed properly.  Maybe that failure only appears in subqueries combined with GROUP BY.

    In any case, I can't believe that SQL just ignores the second function call and repeats the value under any circumstance.   Using arguments to control function output is what functions are all about!  If I had to write a different function for each distinct set of arguments, I'd go crazy.

    Anyway, is the scenario documented by Microsoft?  I like to show my management a public web page or similar when explaining why I'm behind a day or two chasing a bug.

  • Why should SQL Server care if I duplicate a column in the ORDER BY clause anyway?

    Anyway, when aliases are used the error goes away, but the column value is duplicated anyway.

    In my opinion there are some bugs here.

  • maybe expressing the functions with an explicit alias (AS X, AS Y) might remove the problem?

    as for count(0)

    i don't know - never used it before , i always find a nice bit field or integer to do it on, you are right though... count(*) is bad practice

    MVDBA

  • Well... 

    I tried using AS and that didn't change a thing.  Now that a few others have looked at this closely and don't see anything that I missed, I think that there is a bug and I need to find out how to tell Microsoft without incurring a charge.

    Re: COUNT(0)

    I re-checked my assumptions about COUNT(0).  I now feel that I was wrong about it and will now resume using COUNT(*) for both conceptual and actual reasons, at least on Oracle.  I don't know if SQL treats them differently, but I'll use COUNT(*) there too. 

    Based on a trusted Oracle expert's experience, the two forms perform identically in the latest releases and COUNT(*) is explicitly about counting rows, rather than non-null values of some expression (constant 0, a column, etc.).

  • Hi Peter,

    I think you are correct about sense and use of functions. I suggested to use different functions just to check this out or give a suitable workaround but I agree with you this can't be it.

    So what is wrong in doing this query in two steps using a temporary table? You have shown already that this works. It is just no use to put function calls in group by clauses even without subqueries; at least this is my experience.

    I never read something official about this behaviour but I can imagine that MS will agree it is not as it should work, not really. They will not claim "it's not a bug it's a feature". It is not a feature it is at least something like a bug I think.

    Maybe you can get a statement of MS when you relate this to MS support center and use this statement to relate to your folks?

    Good luck

    Karsten

     

  • This is a SQL Server bug that was supposed to have been fixed.

    Through an MSDN news group posting, Steve Kass from Drew University pointed out the KB article and confirmed that the bug has not been fixed. See kb article 288957.  It is a very slightly different scenario.

    You can see the microsoft newsgroup postings by searching the MSDN Newsgroups group MS Enterprise Development  for "sql bug kryszak".   I put full DDL, etc. up there.

    Thanks to everyone for the feedback.

  • There is another outstanding SQL Server problem report that I just came across that is very similar and probably explains this.

    The published problem doesn't list subqueries as affected, but I'm sure that the query parser just un-nests the query in some way.

    See kb article 323504.

  • I recently found out that Microsoft has added a new KB article for this issue:

    http://support.microsoft.com/?kbid=883415

    The good news is that there is a hotfix, the bad news is that it isn't publicly available. To get it, you have to open a paid support incident with Microsoft and if they find out that the hotfix is the solution in your case, you will get your money back.

    Razvan

  • Razvan,

    I really appreciate your bringing this to me attention. Thanks.

    In one other case, I have found that if I place a call and explain it in a reasonable way, including references to the KB article and explaining the steps that I've taken to confirm it, that they don't even begin to charge. I am hopeful that the same will occur on this issue.

    It's funny that the latest date on the DLLs is July 2004. Their QA seems to be slow. I hope that it is careful too.

    Also, what is most striking is that in several cases it appears that the SQL parser _assumes_ that each function call is exactly the same if it matches in the first few (10?) characters and doesn't bother to evaluate the other references. That's a fundamental flaw that indicates poor underpinnings for this database.

  • I am probably way off base, but might the problem be that the UDF OUTPUT is what is causing SQL Server to recognize these as the same column?  The output from a UDF is the same even if the input to the UDF is different.   

    (Sorry, I did not see you had already solved this...)

    I wasn't born stupid - I had to study.

  • I don't understand what you mean by "UDF OUTPUT" here, but I'm interested in your idea.

    Function calls return different data depending on the input parameters. SQL doesn't recognize that the input parameter lists are different if the first few parameters are the same in each call.

    SQL does make assumptions about UDFs always returning the same output when the parameters are the same (deterministic) and that imposes a lot of restrictions on UDFs but it allows SQL to skip evaluating the same UDF over and over again. I was wondering if the restrictions are there because the SQL processing is simplistic, and if SQL Server didn't restrict UDFs to be deterministic and always evaluated them regardless of the parameters, then this wouldn't have been a bug.

  • I think you and I are the same wave length.  UDFs do not always have to be deterministic, but SQL makes that assumption so it does not have to evaluate them over and over, as you stated.  This may be a good 'general' assumption as it would take time to run these evaluations.  Unfortunately, this looks like a case in which that assumption is goofing up the overall select. 

    I can GROUP BY the same column multiple times without error; even though it does not benefit anything in my selection. 

    I am wondering if the use of the UDF is causing a similar circumstance in your "Very first attempt".  It looks to me like that is the only time the column is not specified directly.  Instead, it is the output from an UDF.  But that only deals with your "Very first attempt", not the "Group By" which does directly specify the columns [X.COUNCIL and X.OLDCOUNCIL).  Unless, SQL is not accepting the alias names and when it backtracks to find the actual columns to Group upon, it hits the UDF and makes the assumption that it is always returning the same output. 

    I wasn't born stupid - I had to study.

  • There are three knowledge base articles that deal with this assumption that the SQL processor makes. One of these indicates that the problem is an unfixed bug but maybe the latest hot fix will correct it.

    I've been busy lately to re-review these with respect to my particular example, but here they are for your reference. If you find other similar topics let me know.

    See

    http://support.microsoft.com/Default.aspx?kbid=883415

    http://support.microsoft.com/default.aspx?kbid=288957

    http://support.microsoft.com/default.aspx?kbid=323504

  • Also, it is never a good idea to make a design decision like this (allow only deterministic functions and assume that all similar invocations are the same) purely for performance.

    I had to substantially redesign when I ported some Oracle procedures to account for this.

    I consider the poor parsing a first class bug and the limitation to deterministic functions in a column list an unnecessary dumbing down, if not a major fault of SQL Server.

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply