I/O Counters

  • We have some 'AFTER INSERT' triggers that modify the clustered index, causing a lot of fragmentation. I'm replacing these with 'INSTEAD OF INSERT' triggers, and am trying to assess the performance improvement in terms of the reduction in I/O.

I'm running the same batch of inserts many times, wrapped in a BEGIN TRAN ... ROLLBACK TRAN, so I can repeat the test using the old and new triggers.

The figures produced by Profiler for Reads and Writes are very inconsistent. For exactly the same batch of work, using the same set of triggers, the number of Writes varies between 1500 and 8000. Even if I force a CHECKPOINT and do a DBCC DROPCLEANBUFFERS before executing, I still get the same variation.

Using STATISTICS IO gives consistent results between runs, but doesn't show Writes. Even then, I can't reconcile the number of Reads shown by Profiler with those shown by STATISTICS IO, e.g. one shows 320 reads and the other shows 820 reads.
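For anyone wanting to reproduce this, a minimal sketch of the kind of repeatable test batch described above (dbo.TestTable and dbo.StagingTable are illustrative names; the real tables and triggers will differ):

```sql
SET STATISTICS IO ON;      -- report logical/physical reads per statement
CHECKPOINT;                -- flush dirty pages so writes start from a known state
DBCC DROPCLEANBUFFERS;     -- empty the buffer cache so reads start cold

BEGIN TRAN;
    -- The insert under test; this fires the AFTER / INSTEAD OF trigger
    INSERT INTO dbo.TestTable (Col1, Col2)
    SELECT Col1, Col2 FROM dbo.StagingTable;
ROLLBACK TRAN;             -- undo the work so the same batch can be re-run

SET STATISTICS IO OFF;
```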

    Does anybody know why the figures should vary so much?

    How can you get accurate I/O figures?

    Forgot to mention, SQL 2K Enterprise Edition SP3, Win 2K Advanced Server SP4.

    Edited by - ianscarlett on 09/16/2003 01:31:53 AM

  • What is your clustered index on? If this is a composite clustered index you will always get unpredictable logical IO (page reads).

If your main goal is to reduce IO on inserts/updates, having the clustered index on an identity column would be your best bet, as all inserts fall to the bottom of the last page of the leaf level.
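As a sketch of that suggestion (table and column names are illustrative, not from the original poster's schema):

```sql
-- Clustered index on an identity column: every new row gets the next Id,
-- so inserts always land on the last leaf page and avoid mid-index splits.
CREATE TABLE dbo.TestTable (
    Id   int IDENTITY(1,1) NOT NULL,
    Col1 varchar(20),
    Col2 varchar(20),
    CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (Id)
);
```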

    HTH

    Ray Higdon MCSE, MCDBA, CCNA


  • Ray,

Thanks for the input. Can you explain why you get unpredictable logical reads because it is a composite index?

    For most of the tables affected, the clustered index is composite, and has been specifically chosen to support the large number of queries that are performed against the data.

    My main goal is to reduce the I/O on inserts. Some of the columns in the clustered index are being created with blanks, and the triggers replace them with the 'correct' data. The change from AFTER INSERT to INSTEAD OF INSERT seems to have improved things significantly, but it's difficult to quantify the improvement without accurate statistics.

  • A clustered index sits at the leaf level of the data in an ordered fashion. So let's say I have a one column clustered index and the data looks like this:

    a

    a

    b

    c

    d

    e

    f

If I insert a row into that table and the clustered index column has a value of 'b', it will be inserted into the page where my existing 'b' values are. If that page is full, SQL Server splits it, moving half the rows to a new page and relinking the new and old pages. This is not good unless you play with your fill factor.
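A sketch of the fill factor tuning just mentioned (table and index names are illustrative):

```sql
-- Leave ~20% free space on each leaf page so mid-index inserts can be
-- absorbed without an immediate page split.
CREATE CLUSTERED INDEX CIX_TestTable ON dbo.TestTable (Col1)
WITH FILLFACTOR = 80;
```

The trade-off is that a lower fill factor means more pages overall, so scans read more pages; it only defers splits until the free space fills up.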

    If I have a composite index of three columns and the data looks like this:

    a 20 z

    a 30 b

    b 10 a

    b 20 c

Now when I insert new rows, it has more calculations to do, and you really can't accurately tell where the page splitting will occur. Plus, as non-clustered indexes use the clustered key as the row locator, any non-clustered indexes will be bloated. If I put a non-clustered index on a fourth column, the leaf level of the NCI pages would look like this:

    a 20 z cat

    a 30 b dog

    b 10 a rat

    Now instead of having one column in my NCI, making it nice and compact, I have to have my full clustered key in there.

If you play with the stats and place the clustered index on an identity column, all new inserts will fall to the bottom page at the leaf level, and I bet you will have less logical IO. How that affects your queries you will have to test, but I would much rather use composite non-clustered indexes than composite clustered indexes.
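The key-bloat point above can be sketched like this (names are illustrative):

```sql
CREATE TABLE dbo.Wide (
    c1 char(1), c2 int, c3 char(1), c4 varchar(10)
);
-- Three-column clustered key: (c1, c2, c3) becomes the row locator.
CREATE CLUSTERED INDEX CIX_Wide ON dbo.Wide (c1, c2, c3);
-- Each leaf row of this NCI carries c4 PLUS the full (c1, c2, c3) key,
-- instead of just c4 and a compact row locator.
CREATE NONCLUSTERED INDEX NCI_Wide_c4 ON dbo.Wide (c4);
```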

    HTH

    Ray Higdon MCSE, MCDBA, CCNA


  • Ray

I understand your argument about having a small clustered index, but surely it makes no difference whether I have a single column 15 bytes wide, as opposed to a composite key made up of 3 columns, each 5 bytes wide.

Size is surely a factor, but not the factor I was concentrating on. All I meant was that it's harder to predict where your inserts will go unless all three columns are somehow related (all sequential, or something). With one column it is easier to predict where the inserts will occur (alphabetically or numerically). If I have one column of numerics, I can easily get a rough idea of where the row will be inserted, whereas with three numeric columns it's much harder.

Bottom line is: don't listen to me or anyone else, test it. Maybe you are somehow saving IO by having the composite index; I doubt it, but I have seen stranger things happen.

    HTH

    Ray Higdon MCSE, MCDBA, CCNA


I think ::fn_virtualfilestats will give you the most accurate numbers (if you can isolate your test in a test database, with no other users using the same files).

Also, the physical_io column of sysprocesses is pretty accurate.
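A sketch of both approaches (SQL 2000 syntax; the database name and file id are illustrative — sample the counters before and after the test batch and take the difference):

```sql
-- File-level I/O counters for file 1 of the test database; repeat the
-- call per file id you care about.
SELECT  DbId, FileId, NumberReads, NumberWrites, BytesRead, BytesWritten
FROM    ::fn_virtualfilestats(DB_ID('TestDB'), 1);

-- Cumulative physical I/O for the current connection:
SELECT  spid, physical_io
FROM    master..sysprocesses
WHERE   spid = @@SPID;
```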

STATISTICS IO is pretty useless for physical I/O when read-ahead is happening (which is the default).

For physical IO I agree, but with STATISTICS IO, logical IO is only incremented when a page is actually used by the query, so logical IO (the number of pages read) is accurate.

    Ray Higdon MCSE, MCDBA, CCNA


I agree with Ray that the logical I/O is a correct count of the DB pages visited. It will indeed give you a 'logical' cost you can compare.

  • I've always used the logical reads from statistics IO in the past, and found it very consistent. Unfortunately, it doesn't show writes.

I had also tried fn_virtualfilestats on my development workstation, which nobody else had access to. I did a DBCC DROPCLEANBUFFERS and a CHECKPOINT before each iteration.

Although it was a bit more consistent, I still had wild fluctuations on occasion (4 out of 5 iterations would be consistent, but the inconsistent one would show almost twice as much IO).

In order to trace physical I/Os, there's a 'DBCC IOTRACE' command around that might give you the statistics you want:

    iotrace ( { 'dbname' | dbid | 0 | -1 }, { fileid | 0 }, bufsize, [ { numIOs | -1 } [, { timeout (sec) | -1 } [, printopt={ 0 | 1 }]]] )

Run a separate SQL session to start the trace, do your stuff, and then stop the iotrace. The I/O trace will then be displayed.

    I had troubles getting results when running very small queries (in terms of I/O's).

I also found '1's in the 'Iswrite' column when doing non-read-ahead reads, which I think is quite strange.

It will show: dbid/fileid/pageid/iosize/Islog/Iswrite, and printopt=1 will also show pagetype/objectid/indexid.
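Following the syntax given above, an invocation might look like this. DBCC IOTRACE is undocumented, so this is a hedged sketch only; the database name and parameter values are illustrative:

```sql
-- dbname='TestDB', all files (0), 1000-entry buffer, unlimited I/Os (-1),
-- 60-second timeout, printopt=1 (also show pagetype/objectid/indexid)
DBCC IOTRACE ('TestDB', 0, 1000, -1, 60, 1);
```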

  • Thanks mbroodbakker, I hadn't come across dbcc iotrace before, so I'll give that a go.

I assume this is what mbrood meant, but logical IO is not the number of DB pages visited; it is the number of times a page was hit (one page can be hit multiple times in the logical IO numbers).

    Ray Higdon MCSE, MCDBA, CCNA


Yes, by 'visited' I meant 'hit'. In the database world I come from (Oracle), high numbers of hits on the same DB page in memory are in most cases a primary cause of high CPU usage (some call it a good buffer cache hit ratio). I guess the same is true in the SQL Server world.

    Mario (Broodbakker)
