The T-SQL Paradigm

Question

Post reply

The T-SQL Paradigm

Viewing 15 posts - 91 through 105 (of 266 total)

You must be logged in to reply to this topic. Login to reply

Aaron N. Cutshall SSCrazy Eights Points: 8904 More actions · Answer 1

David Walker (4/3/2009)

Although this is off the original topic, I still haven't seen a SQL statement that was easier to read with the aliases than without them. Please show me some, if you have any.

I don't have any short examples to give you because that rather defeats the purpose. Some of our queries, if printed, are several pages long due to extensive use of CTEs, inner joins and conditional clauses. They are quite difficult enough to follow as it is without adding large table names repeatedly through the statement.

The problem we found with using full table names in our queries is that it quickly becomes full of table names that, because of the number of fields being referenced, get repeated throughout the query and make it difficult to follow the logic because the eye tends to focus on the table names due to their repetition and not on the field names. By using a known set of aliases, we've reduce the problem dramatically. It becomes one of those situations where it's difficult to see the forest due to the trees.

I don't mean to impinge upon David's experience and from a certain point of view I can understand his position. As a rule of thumb, it is a good idea to use as full of a declaration as possible for unambiguity. Unfortunately, as most everyone can attest to, the development world is not black and white and not all rules can be applied in all situations. All we can do is to balance all of these good ideas with each situation and make the best of it that we can. At the moment, I can't think of any "must be followed" rules that should be used in all situations regardless of circumstances. The true measure of a developer is in the application of good judgment in each given situation. That, more than anything else, will determine the success or failure of a project.

LinkedIn: http://www.linkedin.com/in/sqlrv
Twitter: http://www.twitter.com/sqlrv
Website: http://sqlrv.com

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

Manie Verster (4/6/2009)

To get SQL to perform is what might make it look difficult. It's easy to write a query but to get the optimal performance out of that query is what is making it difficult.

Heh... you've just gotta know that I strongly disagree with that, Manie. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

Aaron N. Cutshall (4/6/2009)

David Walker (4/3/2009)

Although this is off the original topic, I still haven't seen a SQL statement that was easier to read with the aliases than without them. Please show me some, if you have any.

I don't have any short examples to give you because that rather defeats the purpose. Some of our queries, if printed, are several pages long due to extensive use of CTEs, inner joins and conditional clauses. They are quite difficult enough to follow as it is without adding large table names repeatedly through the statement.

The problem we found with using full table names in our queries is that it quickly becomes full of table names that, because of the number of fields being referenced, get repeated throughout the query and make it difficult to follow the logic because the eye tends to focus on the table names due to their repetition and not on the field names. By using a known set of aliases, we've reduce the problem dramatically. It becomes one of those situations where it's difficult to see the forest due to the trees.

I don't mean to impinge upon David's experience and from a certain point of view I can understand his position. As a rule of thumb, it is a good idea to use as full of a declaration as possible for unambiguity. Unfortunately, as most everyone can attest to, the development world is not black and white and not all rules can be applied in all situations. All we can do is to balance all of these good ideas with each situation and make the best of it that we can. At the moment, I can't think of any "must be followed" rules that should be used in all situations regardless of circumstances. The true measure of a developer is in the application of good judgment in each given situation. That, more than anything else, will determine the success or failure of a project.

Spot on. I normally use table aliases to add simple clarity to the code. For example and to make a much longer story shorter, we had a rather hairy query that was supposed to return a report of failures but only if the failure hadn't been followed by a success. The failures could happen many times as the code being logged could do up to 10 retries based on whether or not a particular download failed or succeeded.

Obviously the log table had to be queried not once, but twice. Once for failures and once for success and then joined for exclusion. The ProcLog table was given two aliases... "fail" and "pass". It made life a whole lot easier to follow. And, obviously, you couldn't use the full table name for the two instances because it would have given errors in ambiguity.

But, like you said, to each their own.

As for "rules that must be followed", I can think of a few. For example, in a joined update, the object of the update must be included in the FROM clause or bad things can sometimes happen depending on a bazillion different factors such as scale, indexes available (or not), and whether or not parallelism is spawned. We had such an UPDATE slam 4 CPU's into the wall for over 2 hours. When we made the correction, the code ran in about 20 seconds.

Of course, if you end up in my shop, the "rules that must be followed" increase quite a bit. 😉 The first rule that many developers seem to omit is that the code actually has to work. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jason Miller-476791 SSCommitted Points: 1510 More actions · Answer 4

GSquared (4/3/2009)

For example, why doesn't this work:

select Col1, count(*)
from dbo.MyTable;

And the follow-up where,

select Col1, Col2, count(*)



from dbo.MyTable

GROUP BY Col1, Col2;

select Col1, Col2, count(*)



from dbo.MyTable

GROUP BY Col2, Col1;

Are not equal (in the sense they don't return equal sets).

Perhaps I'm missing some small facet, but, speaking strictly from a mathematical standpoint, shouldn't they results be identical (provided you sort them)?

I mean if you take all of the items of group A and then subdivide them by all of the items from group B, shouldn't that be the same end result as taking all the items in group B and then subdividing them by the items in group A?

Again, I'm not specifically pointing to a MSSQL implementation, just a generic math issue. I touched upon set theory in college, but my focus was numerical analysis.

I recall the saying "Better to keep your mouth shut and look like a fool, than to open it and prove it."

But I also remember a bit of advice from one of the true geniuses I've ever met: "Ask the question. Even if you think the question or answer is foolish. Better to be right...":unsure:

Honor Super Omnia-
Jason Miller

GSquared SSC Guru Points: 260824 More actions · Answer 5

There are plenty of rules that "must be followed" every time. Most of them are the ones built into the language.

There are a few that I follow with tremendous consistency, even if not necessary, in order to make the code as comprehensible as possible.

On the subject of aliases, I use them, but I seriously dislike the 1-letter versions. They're required by shop standards at the place I currently work, and I really, really dislike that. I prefer to either use the whole table name, or an alias that's easy to read and identify at a glance.

The idea that "you can just look at the From clause" is only valid if the query is small enough to fit on the screen, and even then, it's disruptive to have to constantly be re-reading that instead of just reading the darn code.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Aaron N. Cutshall SSCrazy Eights Points: 8904 More actions · Answer 6

Jeff Moden (4/6/2009) The first rule that many developers seem to omit is that the code actually has to work. 😉

OK, you got me there!:-P However, I was referring to the "rules of thumb" which are beyond what is dictated by code execution. There are many such "rules of thumb" that are a really good idea and should be implemented where appropriate, but good judgment and experience should dictate what is appropriate. For anyone to state that a given "rule of thumb" should be applied in all situations as a hard-and-fast rule is following it blindly and not taking individual situations into consideration. While it may apply in 99.99% of situations, there's always a few times where it should not apply.

LinkedIn: http://www.linkedin.com/in/sqlrv
Twitter: http://www.twitter.com/sqlrv
Website: http://sqlrv.com

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

Aaron N. Cutshall (4/6/2009)

Jeff Moden (4/6/2009) The first rule that many developers seem to omit is that the code actually has to work. 😉

OK, you got me there!:-P However, I was referring to the "rules of thumb" which are beyond what is dictated by code execution. There are many such "rules of thumb" that are a really good idea and should be implemented where appropriate, but good judgment and experience should dictate what is appropriate. For anyone to state that a given "rule of thumb" should be applied in all situations as a hard-and-fast rule is following it blindly and not taking individual situations into consideration. While it may apply in 99.99% of situations, there's always a few times where it should not apply.

Heh... no, I get it. But there are still rules of thumb that absolutely must be followed. The example I gave about the UPDATE is one such rule. On most of the other stuff though, I absolutely agree. Although very rare, I've seen it where a well written While Loop will, in fact, beat any form of set based code including (hate to admit this) some applications of the Tally table (I've only seen one, so far). The bad part is that the rule of thumb to avoid things like While Loops and other forms of RBAR is subject to the ability of or sometimes just the attitude of the developer. The "good enough" attitude in the face of schedule pressures is the frequent cause of unneccessary rule of thumb violations. My answer to such violations is "prove to me that it cannot be done" using the rule of thumb methods. With the single exception I mentioned, and that wasn't at work, the rules of thumb I've invoked in my shop have served very well 100% of the time.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Aaron N. Cutshall SSCrazy Eights Points: 8904 More actions · Answer 8

Jeff Moden (4/6/2009)

The "good enough" attitude in the face of schedule pressures is the frequent cause of unneccessary rule of thumb violations. My answer to such violations is "prove to me that it cannot be done" using the rule of thumb methods. With the single exception I mentioned, and that wasn't at work, the rules of thumb I've invoked in my shop have served very well 100% of the time.

Obviously, the code MUST work. Most of the "rules of thumb" that I have seen have been aesthetic or to improve performance and there is merit in each and every one under the majority of circumstances. Some are personal preferences that don't affect the end result. I'm glad that you've been fortunate in your experiences. I have, on occasion, been forced to forgo some of my preferred "rules of thumb" either in favor of locally established conventions or the circumstances dictated another approach (such as using the dreaded CURSOR!).

While we must all keep good practices in mind and apply them as much as possible, we must also be flexible and use good judgment. That is what makes good developers great.

LinkedIn: http://www.linkedin.com/in/sqlrv
Twitter: http://www.twitter.com/sqlrv
Website: http://sqlrv.com

davidandrews13 SSCertifiable Points: 6689 More actions · Answer 9

i code in VB.net and i'm relatively new to using SQL, only doing it for about 1 and a half years, so i have to admit that i dont have anything to compare it to.

having said that i dont find anything particularly difficult about writing SQL. in terms of its syntax or trying to work out what i want to do. there's nothing that i find unintuitive about it.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

davidandrews13 (4/6/2009)

i code in VB.net and i'm relatively new to using SQL, only doing it for about 1 and a half years, so i have to admit that i dont have anything to compare it to.

having said that i dont find anything particularly difficult about writing SQL. in terms of its syntax or trying to work out what i want to do. there's nothing that i find unintuitive about it.

I couldn't have said it better. I wish there were more like you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 11

davidandrews13 (4/6/2009)

there's nothing that i find unintuitive about it.

I think you're ready to be a DBA some day! It's unintuitive for many people, but I tend to agree with you. From my first exposures using PL/SQL from Foxpro to access an Oracle db, the set based solutions just made sense to me.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 12

Jeff Moden (4/6/2009)

Heh... no, I get it. But there are still rules of thumb that absolutely must be followed.

There are, but unfortunately we don't have enough good examples, and people spreading the word. That's why we need the Mr Moden's RBAR Avoidance Manual written!

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

YSLGuru SSC-Insane Points: 21569 More actions · Answer 13

Jason Miller (4/6/2009)

GSquared (4/3/2009)

For example, why doesn't this work:

select Col1, count(*)
from dbo.MyTable;

And the follow-up where,

select Col1, Col2, count(*)
from dbo.MyTable
GROUP BY Col1, Col2;

select Col1, Col2, count(*)
from dbo.MyTable
GROUP BY Col2, Col1;

Are not equal (in the sense they don't return equal sets).

Perhaps I'm missing some small facet, but, speaking strictly from a mathematical standpoint, shouldn't they results be identical (provided you sort them)?

I mean if you take all of the items of group A and then subdivide them by all of the items from group B, shouldn't that be the same end result as taking all the items in group B and then subdividing them by the items in group A?

Again, I'm not specifically pointing to a MSSQL implementation, just a generic math issue. I touched upon set theory in college, but my focus was numerical analysis.

....

Actually the 2 Queries may not prodcue the same results. If one of the 2 columns is a subset or superset of the other then chanaging the order of the Group By will change the both the number of rows and the values listed in each row even though the total for all rows in each column (with an aggregate) will be the same for both queries. For example in an apartment complex you may have several types of units. If you wanted a count of these and you grouped by Unit Type then Unit you would get a different row level result then if you grouped by Unit then Unit type. This is a simple and obvious example but not every case where the order of the grouping will change the results is as easy to see as this example.

Kindest Regards,

Just say No to Facebook!

Jason Miller-476791 SSCommitted Points: 1510 More actions · Answer 14

YSLGuru (4/6/2009)

For example in an apartment complex you may have several types of units. If you wanted a count of these and you grouped by Unit Type then Unit you would get a different row level result then if you grouped by Unit then Unit type. This is a simple and obvious example but not every case where the order of the grouping will change the results is as easy to see as this example.

So if we have a complex with Basic, standard, and Deluxe apartments, with a smattering of 1, 2, and 3 bedroom units in each level...

You're proposing that if I group them into "Class" and then group that into #bdrms, I get one set of rows, and alternatively, if I take the same complex, and divide the apartments into a groups of 1,2,3 bdrms, and then group by class of apartment, I'll get different rows/counts?

I don't see that.

Honor Super Omnia-
Jason Miller

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 15

YSLGuru (4/6/2009)

Jason Miller (4/6/2009)

GSquared (4/3/2009)

For example, why doesn't this work:

select Col1, count(*)
from dbo.MyTable;

And the follow-up where,

select Col1, Col2, count(*)
from dbo.MyTable
GROUP BY Col1, Col2;

select Col1, Col2, count(*)
from dbo.MyTable
GROUP BY Col2, Col1;

Are not equal (in the sense they don't return equal sets).

Perhaps I'm missing some small facet, but, speaking strictly from a mathematical standpoint, shouldn't they results be identical (provided you sort them)?

I mean if you take all of the items of group A and then subdivide them by all of the items from group B, shouldn't that be the same end result as taking all the items in group B and then subdividing them by the items in group A?

Again, I'm not specifically pointing to a MSSQL implementation, just a generic math issue. I touched upon set theory in college, but my focus was numerical analysis.

....

Actually the 2 Queries may not prodcue the same results. If one of the 2 columns is a subset or superset of the other then chanaging the order of the Group By will change the both the number of rows and the values listed in each row even though the total for all rows in each column (with an aggregate) will be the same for both queries. For example in an apartment complex you may have several types of units. If you wanted a count of these and you grouped by Unit Type then Unit you would get a different row level result then if you grouped by Unit then Unit type. This is a simple and obvious example but not every case where the order of the grouping will change the results is as easy to see as this example.

I don't see how the example you gave would give you different rows.

Other than the order of the result set, and assuming you are not using CUBE, ROLLUP, or outer joins, can you provide an example with data where they would be different?



print 'Load random integers'

select

	a=abs(checksum(newid()))%3, b=abs(checksum(newid()))%2

into

	#x1

from

	(select top 100 * from syscolumns)a

	cross join

	(select top 100 * from syscolumns) b

print 'Group by a,b'

select

	a,b,count(*)

from

	#x1

group by

	a,b

order by

	a,b

print 'Group by b,a'

select

	a,b,count(*)

from

	#x1

group by

	b,a

order by

	a,b

drop table #x1

Results:



Load random integers

(10000 row(s) affected)

Group by a,b

a           b

----------- ----------- -----------

0           0           1691

0           1           1624

1           0           1605

1           1           1615

2           0           1686

2           1           1779

(6 row(s) affected)

Group by b,a

a           b

----------- ----------- -----------

0           0           1691

0           1           1624

1           0           1605

1           1           1615

2           0           1686

2           1           1779

(6 row(s) affected)