Can anyone think of a good reason for NOT having a PK?

  • In one application we have a bunch of small domain tables.  Things like state codes and names, transaction type, payment types, user types, and on and on.  None with more than 100 rows.  Seldom, if ever, changed.

    We added indexes to them and performance dropped. Especially on the mobile.  Dropped the indexes and SQL went back to using table scans.  Almost all querries were designed to return a single row.

    There was some talk about moving these to flat files but since our server to device SQL synchonizer works so well we just kept it with the table scans.  Shh. (Don't tell anybody but it works so well that we use it for software updates too.)

    These special cases are few and far between.  We did extensive testing to prove that any index at all made performance worse.  It's a reason.  I don't believe a GOOD reason.  It's a performance workaround.  On the other hand I've done some testing in 2005 an found that there are times that multi-column indexes boost performance by having all the data in the index and my querries don't touch the real table at all.

    ATBCharles Kincaid

  • some example data/query?

  • USE [Production]

    GO

    /****** Object:  Table [dbo].[State] ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[State](

     [Code] [nvarchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

     [Description] [nvarchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

     [SourceSystemId] [bigint] NULL,

     [LastUpdateTime] [datetime] NULL,

     [UpdateStatus] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    GO

    SELECT [Description] FROM [State] WHERE [Code] = 'MO'

    ATBCharles Kincaid

  • Yes, if you want to add an index to a small table, and you should for many of them, always make it a unique, clustered index.  For example, for the sample table shown above:

    CREATE TABLE [dbo].[State](

     ...,

     CONSTRAINT State_CI

         UNIQUE CLUSTERED ([Code])

    This should speed joins on the table, since SQL can do an index seek and, even for a scan, will "know" it can stop after the first match.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • I raised such a stink about it that they finally agreed to go back and put identity columns on all tables.  Well almost all.  There are a couple that don't have identity columns and the reasons give away too much of our design.

    The indexes made the joins slower on the mobile and speed on these small tables is not an issue on the desktop.  Our sync tool keeps the schemas in sync so any change on the schema gets pushed down.

    The real key is to have, and take, the time to test every situation that you can.

    ATBCharles Kincaid

  • if it's a static table, I'd recommend NOLOCK hints, so that no other resource needs to wait for the read to finnish. If you only expect one row, then TOP 1 will help the performance from what I understand. but I digress... are you doing searches on different fields anywhere else?

  • The index should be on the search column(s), not on an id column.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • Well. In a case when you have limited amount of data that fits into single page creating an index will decremet the performace because you will ask the server to go thru two pages of data (index page and data page) instead of one (just data page) to get you information. In a case like this it is better to go for the table scan which in effect is a single page scan.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • Thank you.  This is exactly what the testing showed.

    ATBCharles Kincaid

  • Are there any articles on this? What is meant by a table scan? You've peeked my interests.

  • I don't know about any specific articles I could point you to. I think I kind of deducted this myself based on what I know about the way the server works (which might not be 100% accurate ). I think some of this is explained in BOL.

    Table scan is a search process the server performs when you have no index on a table. It goes thru all the records in your table until it finds all that match your criteria. It does it by reading page after page. If you have an index (and you search on the indexed field) then the server goes thru the index pages to find out the page your data is in and then just goes thru this page. The advantage of the index is that it is guaranteed that the index data is sorted and more 'records' can fit into single page. So the server uses less pages to find your data.

    This description is probably very simplistic but it should explain the idea behind the indexes.  

     

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • I've read this with great interest and agree with a lot of what some people have said, and disagree with a number of others.

    I tend to be in the camp that on almost all occassions ther should be a PK, which may or may not be a clustered index, but is almost always the 'business key' (see Kimberly Tripp Bolg link later for explanation). Times when I might not use this approach may include small tables where a table scan may be more efficient than going through the index (usually decided from testing, and re-testing as necessary to make sure it is still the right approach) and bulk-inserting data into staging tables (though there have been setups where we build indexes on these tables to help in the population/transformation to the data's final tables). Also remember that, unless you know for definite that the datetime will be unique, if you use this datatype for your clustered index and you have duplicates SQL Server will stick a 4-byte uniquifier on the end (as it will for any data type where you allow duplicates in the clustered index).

    My belief is that it is better to control the uniquness yourself, so I lean towards Kimberly Tripp's view on clustered indexes. Where I use datetime (or even smalldatetime), I will put it with an auto-incrementing number as the clustered index when needed (I won't go into the arguments for or against here as I think KT puts it much better than I can -  http://www.sqlskills.com/blogs/kimberly/2007/02/02/TheClusteredIndexDebateContinues.aspx). I may use the same technique if I has a candidate for a clustered index but cannot guarantee uniqueness. And if for a clustered index candidate does not stand out for me...then I will use a monotonically increasing value as the key. Then I will make tactical use of non-clustered indexes (incl. covering) as required.

    I'm also of the opinion that query hints should be used with great care because of known issues that may arise, such as duplicates possibly returned when using ...WITH (NOLOCK)  (can't find the link, but if you search sqlservercentral, there are some great discussions on this).

    When it comes to DSS systems, then the use of indexes, and especially clustered indexes, are almost always a must. A well-chosen clustered index, along with carefully chosen non-clustered indexs should generate good performance benefits, esp. on large tables. The skill is getting the windows to rebuild/update indexes, which may or may not include partitioning. With the use of auto-incrementing values there is no reason not to be able to create a unique value.

    Just my thoughts.

     

  • NOLOCK is primarily used for (semi) static tables. You are just eliminating unecessary overhead. Pretty much agree with everything you say though.

  • Can you post an example of your join query?

    Even in small notebook it's faster to find a record if it has A-Z index (which is actually clustered index).

    _____________
    Code for TallyGenerator

  • [Hi Kevin] If you mean by semi-static that, say, in a DSS DB where the data is imported during the night and during the day when reports/scripts are run, that the data doen't change, then yes I agree. But this is why I said you should use locking hints with great care. You need to unedrstand your environment very well before you start playing with locking hints, and esp. with NOLOCK.

    The reasons for having care when using NOLOCK this have been discussed in sqlservercentral before, and I will quote Colin Leversuch-Roberts '...Isolation levels are critical to data intergrity and as mentioned above scans with dirty reads are capable of returning duplicated data where duplicates cannot exist - Itzik Ben-Gan demonstrated this to the UK SQL Server User Group last year, if you check out Tony Rogerson's blog he has code which demonstrates this. http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx...' (link to discussion is http://qa.sqlservercentral.com/forums/shwmessage.aspx?forumid=192&messageid=330889).  Also, look at http://blogs.msdn.com/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx (MS Dev Customer Advisory Team). The Query Optimizer in SQL Server is pretty efficient at choosing the right locking strategy too and may ignore the hint you use - I know there is some argument on this, too, as some believe that whatever hint you use will be used by the Optimizer, but I don't believe it to be the case (see http://www.sql-server-performance.com/rd_table_hints.asp).

    BTW, for the guy who mentioned using the Index Tuning Wizard (SQL2000), it is generally agreed that this is not a very good tool for recommending indexes and was only useful in helping track those that weren't used, and so could be removed. SQL2005's Database Engine Tuning Advisor is a vast improvement.

    Just some observations.

    [edit]  Sorry Jeff, going off at a bit of a tangent from you're orig question.

Viewing 15 posts - 46 through 60 (of 71 total)

You must be logged in to reply to this topic. Login to reply