Insert Performance Issue with GUIDS

  • I know using random Guids can cause heavy fragmentation. Even though the Guid is random (i am referring to making use of newid()). Will adding an identity int column as the clustered index and the guid column as the nonclustered index (and guid being primary key for the purposes of referential integrity) speed up inserts? And does insert/update/delete look same in the following 2 cases

    1) Identity column being clustered Index as well as primary key and use the GUID column as a non clustered index

    2) Identity column being clustered Index only and use the guid colum as non clustered index with primary key

    Like what is the difference here from performance perspective?

    Experts need your valuable thoughts

    “If your actions inspire others to dream more, learn more, do more and become more, you are a leader.” -- John Quincy Adams

  • Not the answer to your question, but in case worth discussing / debating.

    We have both IDENTITY and GUID in quite a lot of tables. The GUID is there so that we can SYNC the data with equivalent data in other databases / servers / organisations. We don't use it to associate rows within the APP/Database itself.

    All the (local) Foreign Keys use IDENTITY (except where we use a Natural key) and not GUID. We do have a non-clustered index on the GUID - not sure we need it given that SYNCing data with other databases is not a time critical task (for us), but it does guard against the infinitesimally small chance of a DUP.

    When we SYNC data to a remote database we resolve the ID based on matching the GUIDs at each end. Bit of a PITA to code though ...

    I want a clustered index that:

    almost never changes,

    is unique,

    is as narrow as possible - an IDENTITY INT complies.

    That apart, I want the one that performs best for the table / queries.

  • Assuming your databases are stored on a SAN like most of the people out there, I wouldn't worry that much about fragmentation. Fragmentation really only matters if your disk can do sequential reading, and that isn't happening in a SAN.

    Where you can have an issue is on the insert side, because you will have more frequent page splits than if the clustered index were something that always increases.

    If you don't have inserts happening frequently, then you're not going to notice the performance hit that much. For example, using the GUID as an identifier for your customer number, you probably aren't adding thousands or millions of customers at one time. If you are doing bulk inserts, then GUIDs will not be your friend. The update/delete isn't going to be any different because that's an index seek operation.

    Basically, it comes down to how your data patterns are going to be.

    If you are going to create an INT identity, then is it really necessary to have the GUID? What problem is having the GUID trying to solve? And are you really going to be looking up records by the GUID if you have it?

    Just for a different perspective, I am of the school of thought of using natural keys whenever possible. That makes it easier to catch problems with duplicate data because you won't get 2 John Doe records in the system because they were created with different GUIDs. Same issue applies with using the INT as a record identity. But that is more of a step back from what you're asking, and is more of the data architecture than the implementation. It also requires a lot of thought so that you don't erroneously limit having multiple people with the same name if you really do get more than one person in the system with the same name.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply