All Things Monitoring

Question

Post reply

All Things Monitoring

Chris Kitchen

Hall of Fame

Points: 3090
More actions
March 14, 2011 at 4:41 am

#233837

Hi,

I'm looking to put together a custom SQL Server monitoring solution using a combination of SQL Server scripts which would run as part of SQL Server scheduled jobs and Performance Monitor counters which would store the perfmon data directly in SQL Server.

A "Master" server (Windows 2008 R2) hosting 1 SQL Server 2008 R2 instance would be used to pull and store all the information from the servers in the Production domain.

The items listed under the headings "MEMORY", "CPU", "DISK" and "MISC" will be collected via Performance Monitor. The items listed under the heading "ADDITIONAL OS/SQL MONITORING" would be developed using custom SQL scripts.

My questions are:

1. Are there any additional items you think I should be collecting?

2. Logging the perform data directly in SQL Server requires a "System DSN", therefore the "Master" server would have 1 "System DSN" entry for each server it was collecting information from. To begin with there are 50 servers to monitor, could so many "System DSN" entries cause issues?

3. If you don't agree with the use of Performance Monitor/"System DSN", is there a better way?

We want the monitoring to be as light weight as possible so the counters are limited. Here is the information which will be collected...

######

MEMORY

######

Available Mbytes

Pages/sec

Paging File - % Usage

SQLServer:Memory Manager - Memory Grants Pending

SQLServer:Memory Manager: Total Server Memory (KB)

SQLServer:Memory Manager: Target Server Memory (KB)

Server Buffer Manager Object: Cache Size (pages)

SQL Server Buffer Mgr: Page Life Expectancy

SQL Server Buffer Mgr: Lazy Writes/Sec

SQL Server Buffer Mgr: Checkpoint Pages/Sec

########

DISK

########

Avg.Disk sec/Read

Avg.Disk sec/Write

Disk Reads/sec

Disk Writes/sec

Avg.Disk Queue Length

###

CPU

###

Processor - % Processor Time

System - Processor Queue Length

####

MISC

####

SQLServer:Batch Statistics - Compilations/sec and Recompilations/sec

SQLServer:General Statistics - User Connections

Page Splits (Access Methods object: Page Splits/sec)

SQLServer: SQL Statistics: Batch Requests/Sec

SQLServer: Databases: Log Flushes/sec counter

SQL Batch/sec

System Context Switches/sec

SQL Server Locks Object: Number of Deadlocks/sec

SQL Server Locks Object: Average Wait Time (ms)

SQL Server Access Methods Object: Full Scans/sec

SQL Server Backup Device Object: Device Throughput Bytes/sec

Latch Waits/sec

############################################################

ADDITIONAL OS/SQL MONITORING

############################################################

Database backup information (Success/Failed)

Block processes

Lock processes

Cluster information (where required)

Host server configuration information

SQL Server instance configuration information

Database configuration information

Table information

Index information

Host drive information

Host drive fragmentation

SQL Server security information

Scheduled job information

Success/Failed login information

Warning/Critial Windows Event Viewer information raised in the last 1/24hrs

SQL Server Mirroring monitoring

Purchasing a 3rd party product is not an option.

Thanks in advance.

www.sqlAssociates.co.uk

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 397721 More actions · Answer 1

It looks like you've got the basics well covered. One thing I would add for sure is the ability to capture wait states on the servers. You're going to want to know when and how things are slowing down.

Another thing to consider is capturing trace data so that you can see how individual queries are performing and so you can aggregate performance over time. That's a lot tougher than capturing PerfMon & DMV stats. You'll have to plan for the amount of data you'll be collecting and plan for a mechanism for aggregating that data AND for what you want to do with it long term. It's a lot of work, but it also has a lot of payoff.

----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 2

Hi Grant,

Thanks for your comments, much appreciated.

With regards to looking at what code the users are executing and SQL Server query plans I was planning on using something like:

select qs.execution_count

, qs.total_elapsed_time, qs.last_elapsed_time

, qs.min_elapsed_time, qs.max_elapsed_time

, substring(st.text, (qs.statement_start_offset/2)+1

, ((case qs.statement_end_offset

when -1 then datalength(st.text)

else qs.statement_end_offset

end - qs.statement_start_offset)/2) + 1) as statement_text

, qp.query_plan

from sys.dm_exec_query_stats as qs

cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp

Given the complexities you mentioned around the trace flag configurations, do you think this is a valid alternative?

Also, do you have any thought/experiences around the use of a "Master" server with many "System DSN" connections to the servers it monitors? I've ran several tests and cannot see any issues from a network or host perspective.

Many thanks,

Chris

www.sqlAssociates.co.uk

Grant Fritchey SSC Guru Points: 397721 More actions · Answer 3

Using the DMVs in that manner is fine. You just need to know, and be aware, that they only reflect what is in cache when you run them. If stuff has left the cache, it won't be in the query.

Most monitoring processes work from a central server that connects to multiple other servers, so it must be a good model. I've had a lot of experience using the Central Management Server in SQL Server to manage Agent jobs and Policy Based Management. It works well. I don' see why it wouldn't work with monitoring too.

BTW, have you looked at Microsofts Data Collector and Management Data Warehouse? They have some limitations (2008 and above being one of the biggest), but it might get you where you need to go faster.

----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 4

Chris Kitchen

Hall of Fame

Points: 3090

March 14, 2011 at 7:36 am

#1298310

Thanks again for your help, it's greatly appreciated. I'm just having a play around with the "Data Collection" tool now and from what I have seen I will definitely be including it in my documentation as an option to consider.

www.sqlAssociates.co.uk

Lowell SSC Guru Points: 323482 More actions · Answer 5

I've recently been spending a lot of time studying and scripting repsonses to Brent Ozars SQL Blitz! script.

It identifies a great set of things to look for, and has been a lot of fun for me to create my own responses to any items identified in his script.

This is the way i interpret his script 's testing points:

--http://www.BrentOzar.com/go/blitz

1.Identifies any database with no backups at all.

2.Identifies any backups that are on the same disk, maybe should be copied to another server.

3.Identifies any databases that are in FULL recovery, but no Transaction Log backups.

4.Identifies the oldest MSDB entry; if it's old, the MSDB backup history might need to be cleaned up.

5.Identifies the Last Time DBCC CHECKDB (EachDatabaseName) was executed(if ever).

6.Identifies any and all jobs on the server and their statuses.

7.Identifies any other sysadmins so you can decide if they should have their privileges or not.

8.Identifies the servers advanced options for reference.

9.identifies any procedures which are set to start when the server starts

A.Identifies if SQL 2008+ auditing_status info.

B.Identifies various endpoints, resource governor settings,server triggers and linked servers.

C.Tests id database mail is setup (assuming a default profile exists)

D.sets up a default set of notifications for problems.

E.Identifies any user tables that should exist like spt_values,

F.Identifies all databases for reference.

G.Identifies any encrypted databases for reference, so you can better plan disaster recovery.

H.Identifies any databases that are using features that are Enterprise Edition only

I.Identifies location of Data Files, so you can decide if they need to be moved off of the OS's C:\ drive or not.

J. identifies database triggers for troubleshooting purposes.

K. Identifies a baseline for wait time performance.

L. Identifies duplicate indexes

M. Identifies Index fragmentation.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 6

Chris Kitchen

Hall of Fame

Points: 3090

March 14, 2011 at 10:07 am

#1298427

Thanks Lowell, I found the list extremely useful.

www.sqlAssociates.co.uk