Any Other Problem with NOLOCK other than Dirty Reads?

Question

Post reply

Any Other Problem with NOLOCK other than Dirty Reads?

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150457 More actions · Answer 1

Rob-350472 (1/31/2012)

Just wondering what you'd make of using nolock in this situation...

Running that query against a large set of customers appears to prevent others from adding a communication until the query has completed, which if there's a bunch of other fields can take a while.

If the shared locks are escalated to a table lock, this can indeed happen. There are many options here, including disabling table-locks for the problematic table. RCSI would also work well, of course.

In this instance the only problems I can see (aside from that odd one about taking out an entire database!) are that we'd potentially get an incorrect count of communications or the most recent date would be inaccurate, however, if you were doing this against a set of say 50,000 customers and the query took 1 min to run the chances of this happening are fairly slim, if it does happen it's not a huge issue as most of the data will be fine.

As noted previously, the query could also fail with an error, but in general yes: if you only require an approximate view of the data, there are times where READ UNCOMMITTED is sufficient. I still prefer RCSI in 99% of cases, not least because you get much higher consistency guarantees even than provided by the default isolation level.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

jcb SSCertifiable Points: 6574 More actions · Answer 2

jcb

SSCertifiable

Points: 6574

January 31, 2012 at 3:55 am

#1440484

GSquared (1/30/2012)

...

If I remember correctly, putting static tables on a read-only fileset will give the same benefit...

Thanks GSquare that is worth to know!

Snapshot isolation was introduced in SQL 2005, most system are writed before it and stills rely on nolock (for bad most of times).

GSquared SSC Guru Points: 260824 More actions · Answer 3

SQL Kiwi (1/30/2012)

GSquared (1/30/2012)

It will potentially save the CPU overhead of establishing a shared lock. On a seriously overloaded system, that could be beneficial.

The Read Committed isolation level guarantees two things: you will not experience dirty writes or dirty reads. It does not say anything about how those guarantees should be implemented. SQL Server happens to use shared locks as part of its implementation of the default isolation level – but you cannot rely on those locks always being present. The storage engine is smart enough to avoid taking many shared row locks when there is no danger of reading uncommitted data, see The Case of the Missing Shared Locks

Yeah. That's why I said "potentially" and "could be". It's unlikely to be helpful, but on a "fully loaded camel", where one more straw will break its back, it might, maybe, possibly, save a few CPU cycles and RAM blocks.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 4

Rob-350472 (1/31/2012)

Just wondering what you'd make of using nolock in this situation...

You want to obtain the most recent communication and a count of communications for a particular customer.

Running that query against a large set of customers appears to prevent others from adding a communication until the query has completed, which if there's a bunch of other fields can take a while.

Adding a nolock gets around this issue, people can continue to add communications.

In this instance the only problems I can see (aside from that odd one about taking out an entire database!) are that we'd potentially get an incorrect count of communications or the most recent date would be inaccurate, however, if you were doing this against a set of say 50,000 customers and the query took 1 min to run the chances of this happening are fairly slim, if it does happen it's not a huge issue as most of the data will be fine.

Clearly it's best to get all of the data 100% accurate all of the time, however, if the only way of (currently) doing this is locking up the table preventing others from writing to it then that's not an ideal solution.

It does seem that READ_COMMITTED_SNAPSHOT is worth looking into more, however, I just wonder what people's take is on the above usage?

That sounds like a good candiate for RCSI to me.

If nolock/read-uncommitted is needed there, it's needed. Those aren't something I'd take out of the product, they're just something a lot of people use without understanding all the ramifications of what they're doing. They do have their place.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 5

So if you just turn RCSI on, what are the implications for running a query against say a table who's data changes virtually constantly (think a bit like weblogs or something), and is very large (several GB in size).

You query against that data, RCSI prevents that table from being locked but how? Does it take a 'temporary' dump of a few GB size table? If so won't you find your database size grows by several orders of magnitude?

As I say, I don't fully understand it, but as someone who's taken to using NO LOCK quite often, (usually because 100% accuracy is not mission critical, and the sort of errors would not be large enough to force incorrect or poor decisions) I'm keen to see how this would work in real life rather than on paper.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

To be honest, it works very well. I have 2 clients with all their databases on read committed snapshot and another 2 that I plan to convert over (one after I finish some index consolidation, one after I get their TempDB usage under control)

Rather than me write a couple paragraphs on how it works, take a read through this: http://msdn2.microsoft.com/en-us/library/ms345124.aspx

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

GSquared SSC Guru Points: 260824 More actions · Answer 7

To summarize, snapshot isolation, either flavor, takes rows that are being worked on (updated or deleted) and creates a snapshot of them in tempdb. It uses a row-version number to manage this. If there's a concurrent query of those rows, it gets the last version that was committed before the query started, from the snapshot, while the other query finishes its work. Once committed, they come back out of the snapshot if there are no queries referencing them.

It's an "instead of a lock" type thing. Instead of locking a row, it takes a point-in-time snapshot of it, assigns a version number to it, and keeps it in tempdb till it's no longer needed.

So, the size of the table doesn't much matter. It doesn't snapshot the whole table, just the rows that are currently in-progress on having their data changed. If you have a lot of concurrent transactions on a lot of rows, that can be a lot of tempdb space used. But normalization to at least 3NF usually makes that a non-issue or at least mitigates it into something minor. (That's a bit simplified, of course. Full details are on MSDN. This is a summary.)

For example, if you have a table that's 8kb wide (max normal table width in prior versions of SQL Server), and you're running thousands of transactions per second, you could easily end up with a few Mb of tempdb space being used at 8kb X 1,000 transactions in-progress = 8Mb. It's not as precise as that, since there's overhead for the row-version number and other factors, but that gives you an idea. It doesn't matter if the table in question has millions of rows unless all of them are being constantly updated, which would be very, very unusual.

I have an 18Gb database with 15-million rows in 3 tables (name and address type data). It's in constant flux, with thousands of names being added every day through a wide variety of websites servicing time zones all over the planet, et al. It's one of over a dozen databases I admin that's using RCSI on that server. Tempdb has grown to 162Mb on that server, largely because of RCSI. There are also several databases on that server where whole tables are emptied and reloaded every hour, also using RCSI, but still it's only 162 meg of tempdb, despite every row in every one of those tables having to be versioned in there every hour. Those are smaller databases, but that should still give you an idea of the scope on this.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 8

Does that help?

Yes, a lot! That's really great, thank you for that brief summary. Perhaps you could see where I was going with my fears? - I had visions of 10s of GBs of tempdb within a day or two as a result - proving that I didn't really get how it worked!

I guess in a way it's quite similar to how shadow copies would function within Windows Server to allow backup of files. Okay so that's a simplified explanation, but just because you take snapshots every 2 hours it doesn't mean the Shadow Copy storage needs to be 12x the size of the data it's snapshotting for daily snapshots.

I'll mention this to a colleague of mine.

@Gail, thanks for the link, I opened it in a new tab, alas not had a chance to even click on that tab yet!

SqlServerPass SSCommitted Points: 1709 More actions · Answer 9

GilaMonster (1/31/2012)

DBA Rafi (1/31/2012)

After the sales process, i am seeing some factors which does not change, but more SELECTs on them, like

Scenario #2: Static / Hardly-changeable Information (Master tables)

If the data in question doesn't change much or at all, then why do you need nolock? Unless your system is completely loaded, the small CPU saving from not taking locks will be minuscule.

The columns which i am SELECTing doesnt change, but other columns could change.

Nadrek SSC-Insane Points: 20039 More actions · Answer 10

Michael Valentine Jones (1/30/2012)

SQL Kiwi (1/30/2012)

...

Code that has NOLOCK on every single table in every single query is an infallible indicator of poor coding, in my experience.

I mostly agree with this, except that it is also often an indicator of poor physical database design leading to massive blocking and/or deadlocking.

I second Michael's comment; in an environment such as a poorly designed, highly active, transactional third party database, when one is required to run "large" aggregate reporting (joining multi-million row tables which are being updated continuously), "nolock" (READ UNCOMMITTED) allows "close to accurate" results without blocking other activity.

In some cases, the errors from the READ UNCOMMITTED isolation level are, statistically, do not exceed the difference in results that would come from running the reporting query a few minutes/hours later in the day. Know your environment, in others it would be a problem.

See MSDN SET TRANSACTION ISOLATION LEVEL (Transact-SQL) for a discussion, including the warning about SNAPSHOT isolation: "When the snapshot isolation level is enabled, if you are deleting a row on a heap (a table without clustered index) and the transaction log fills before the log record for ghost row is persisted, the database is taken offline. If this occurs, the database will automatically restart, undergo a full recovery, and come online."

Also see Isolation Levels in the Database Engine, which discusses which isolation levels are subject to dirty, nonrepeatable, and phantom reads.

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 11

Rob-350472

SSCertifiable

Points: 7197

February 2, 2012 at 2:27 am

#1441620

That sounds like quite a scary outcome, compared to the possibility of dirty reads!

Paul White SSC Guru Points: 150457 More actions · Answer 12

Paul White

SSC Guru

Points: 150457

February 2, 2012 at 4:06 am

#1441651

Rob-350472 (2/2/2012)

That sounds like quite a scary outcome, compared to the possibility of dirty reads!

Just note that refers to snapshot isolation (SI), not read committed snapshot isolation (RCSI).

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 13

And bear in mind that it requires that the log becomes full, which causes problems anyway.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass