The SQL * Wildcard

  • Always select just the columns that you need.

    That goes without saying for the most part. But, again, I want to remind everyone that the OP said that Microsoft said that even if you need to select ALL the columns, you should use discreet column names instead of SELECT *... that's what we're really trying to figure out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I agree with everyone that you should only return what is needed and SELECT * might be a problem with future changes, but that wasn't the question.

    The question is how is SELECT * different than SELECT (all columns). Indexing doesn't matter. You're going to read all columns, so you have a bookmark lookup occurring no matter what.

    I suppose it could require a system table lookup, but I'm not sure that the first doesn't also. Doesn't it have to read sysxcolumns to figure out the "names" of the columns being returned? After all, on the data page they're just offsets. The headers aren't stored there, so the system would need to query sysxcolumns to find the offsets for each column referenced.

    If anything, I'd think SELECT * was faster because it doesn't have to check where ColA is, then where COLB is, it just reads the columns in the order they're stored (moving variable data back to the pointer position), and grab the column names in order.

  • Dunno. I'm going to do a bit of testing tomorrow, see if I can dig up any differences in how the queries run.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • My 2-cents on this is that it is probably some unique "quirk" in SQL Server (version unspecified). What version? 7, 2000, 2005, 2008?

    I've seen stuff like this before (in my 23 years of SQL database work covering DB2 (IBM), Oracle, and SQL Server). What often happens is that the unique, vendor and/or version specific method of doing something, for whatever reason, is often negated in a future release (or bugfix). Sometimes these specific methodologies get published as "gospel" and/or ingrained in one's mind and we never let go -- or revist the manuals for the latest techniques.

    Luckily, in this specific case, the methodology suggested, for whatever miniscule performance improvement, is a proper coding "best practice" and not the other way around.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • my guess is that you incur the sys table lookup to find out the name of the columns

    also by not prefixing with dbo., you incur a sys table lookup for a schema specific table (i.e. joe.Orders as opposed to dbo.Orders) (this is true with executing stored procedures too!)

    also, execution plan reuse may be an issue because between executions, the optimizer may not know if the table layout has changed from an alter table command so it may have to redetermine the current list of column names every time??? if you specify the list (even all of the column names), it is an exact match and can reuse the plan??? (this is just speculation.)

    or, the MS interviewer was just messing with your head.

    Joe

  • I am hoping that once enough time passes, you intend to follow up with this manager, show your results, get his answer, and most importantly, share with the rest of us, right!?

    Patiently waiting...

    -- Cory

  • 1) There has to be a system lookup no matter whether it is SELECT * or SELECT (all explicit columns) because the data type of each column also has to be determined in order to build out the resultset information, whether the name is 'needed' or not. Permissions would need to be resolved too.

    2) My answer to the OP is simply because of the possibility of breaking something in the future with column addition/removal.

    3) Why doesn't the OP just ASK the manager who made the statement what HIS answer is?? He did get the job after all and should be able to fire him an email with such a simple question. :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I did a quick check earlier. I couldn't see anything different with profiler, and the times were the same. I just don't know of any way to check system table lookups.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have not checked this myself, but one possibility for using named columns instead of SELECT * could be included columns in an index.

    I agree that SELECT * FROM table should have the same performance as SELECT col1, col2, ... FROM table, but what about SELECT ... FROM table WHERE ... The WHERE clause means the optimiser has to consider index usage. If you do not tell the optimiser explicitly which columns you want, it has no way to check if they are all in the index it chooses. Therfore you get the index lookup followed by a table lookup, when all required data may be in the index...

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • I agree with most of you, especially Jeff, Steve, Gila. Out of curiosity I sent this question to Itzik Ben-Gan to see what thinks about it. Here is a summary of his comments.

    When asked, why SELCT FIELDLIST is better than SELECT *, his response was

    "Think about future schema changes.

    For example, if a column is added to table1, and the application doesn’t anticipate it, it could break.

    Another example is if the schema of the table is reorganized, and is created with different ordinal positions for the columns. Again, the application might break.

    All these issues are avoided when you simply list the column names explicitly."

    When asked about the performance benefits, his comments were the following:

    "Frankly, I think that the question is moot, since as I mentioned already there are some very important logical reasons (not performance) to justify using an explicit column list and not *.".

    "From my experience, the difference in terms of performance is negligible. Both * and the explicit column list need to be verified against metadata.

    Expanding * should involve some extra resolution work, so in theory, it should take longer. But the extra work involved is usually so negligible compared to other costs involved in the query that I wouldn’t consider the performance aspect a compelling argument (assuming you compare * to an explicit list of ALL columns)."

    I asked some specific questions based on the discussion we had in this forum. His answers are the same as what we have seen from other experts in this forum.

    1. Execution plan re-use: it happens with both SELECT FIELDLIST as well as SELECT *

    2. Meta data lookup: happens with both approaches

    His conclusion is :

    "I’ll repeat myself and say that if there is a performance difference (though minor) it’s in the favor of the explicit column list; and since there are many important benefits unrelated to performance in the favor of the explicit column list, it’s a win win situation. "

    This does not really answer the problem we are discussing. But I thought sharing this in the forum will he helpful as his comments closely matches with what is already discussed by jeff, gila and a few others.

    I think we need to wait, just like Cory suggested. 🙂

    .

  • Okay, this one intrigued me a lot. I often use * when beginning anything because "I'll worry about it later" but the idea that this was a performance issue was interesting enough. After digging around for a while I actually found an answer on Microsoft of all places :w00t: Either way here's the link http://msdn2.microsoft.com/en-us/library/ms998577.aspx

  • GilaMonster (12/27/2007)


    Other than protecting against future changes, I can't think of a good reason.

    Possibly SELECT * requires an extra lookup to the system tables to get the column names. Kinda like the sp_ double lookup.

    Would be interesting to check, but I'm not quite sure how. Maybe profiler and the object Access events

    The reason I had heard is that it requires an extra query be run against things like syscolumns to FIND the column names. Although a trivial query - if the piece of code is to be executed 100,000 times a day, that's that many more chunks of code that could have been avoided.

    And it's extra, because this step would happen before the step that goes in and verifying type appropriateness....(Meaning - during the PARSE phase, not the RESOLVE phase)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hmm, I wonder about that. I would think that the optimizer would 'flesh out' the full column list for SELECT * queries at the time the plan was created, instead of doing additional work at execution time. Could easily be wrong on this but it makes a bit of sense.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Okay, now I'm curious.

    I'm running a test on a table that has 69 columns and between 6-7 million rows of data. No WHERE clause and I'll include the "DropClearBuffers" statement to make sure I get a good reading.

    I'll let you know what I get.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • The particular issue I was speaking to would take just the opposite kind of testing to nail down. Think 1M iterations of a select from a single tinyint table with just one row. use select * for one run and select fieldname for the other. I would think the (secondary?) system lookup would take only 0.1 to 1 millisecond to perform thus the very large number of iterations to test.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 16 through 30 (of 60 total)

You must be logged in to reply to this topic. Login to reply