What's important for monitorring?

  • Hello guys!

    For a project for school we need to implement some stored procedures in SQLServer 2005, at a later time the stored procedures will be implemented in Nagios, the Nagios part is no problem, but the problem is we don't know what is important to monitor by a DBA?

    What we have:

    - check Uptime

    - Check failed logins

    - check log size

    - check #connections

    - check failed jobs last x days

    - check logon settings

    - check for exists of sample databases

    We need more check, but we don't know which, at the moment I'm installing MoM to see what they do.

    Anyone some ideas?

    Kind regards!

  • This is what comes to my mind. This is not complete list, I am sure other members will add some more.

    - blockings

    - tempdb size

    - empty space left on hard disk

    - buffer cache hit ratio

    - object cahce hit ratio

    - processes consuming the most of resources

    - CPU usage

    - memory usage

  • Availability, consistency, and recovery seem to be a good things in the database world. For some reason -- the users have come to expect such silly things. 🙂

    So, can't have recovery without backups. Can't have availability without the server running and can't guarantee consistency without running verification.

    For all of my systems, I make certain that I always have a full or differential backup < 1 day old for EVERY database. This applies to dev, test, and production.

    For some production databases, my data loss interval is 15 minutes, so if I don't have a t-log backup every 15 minutes, I want a message. In general, however, if I don't have a t-log backup every hour (on production databases), I want a message.

    You can verify the consistency by checking the last known good checkdb state. If I haven't noticed the checkdb jobs failing, I want a message if integrity has failed for two iterations (some weekly, some daily -- depending on db size and requirements)

    You already mentioned availability, so I figure you got that covered.

    Kyle

  • I like Kyle's thoughts. Backups are the most important. Be sure you know that every database has been backed up or you get a notification and can do something immediately.

    I've never really worried about database size, but I do track backup size. If the backup grows by more than 20%, someone should look at it. Tracking this over time helps to figure out trending as well.

    Disk space is good, and cpu/memory are good for tracking what's normal. I'd probably stop with buffer cache ratio, users, trans/sec as gross metrics, but if you add a few more, then it shouldn't be an issue.

    I've rarely had tempdb problems, but that's not a bad one to track, especially as code changes can really cause issues in tempdb, so knowing what's normal is good.

    Uptime, not sure. If it's a metric you can sell, then fine. Otherwise, who cares. It works when it's needed or your phone rings. That one's easy to know. If you have perceived issues, perhaps this helps to show things aren't bad.

  • Thank for al this answers, I'll implement a few of them tomorrow, the cpu, memory monitorring is easy as hell in Nagios. For us it's more the part of database monitorring..

    If someone else have other ideas, let them know !

  • you can also change your sa passwords on regular basis to keep the security in palce.

  • Also you can Track changes of your Production database like Schema changes, credential changed etc. by using DDL AND DML Triggers

    ---------------------------------------------------
    "Thare are only 10 types of people in the world:
    Those who understand binary, and those who don't."

  • Adding along to the list..

    Check the health of your db by running dbcc check db. Or you can schedule the same by creating a maintenance plan.

    "Keep Trying"

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply