monitoring multiple servers

  • Whats teh best way to monitor failing jobs on muliple servers specially if you have hundreds of them. Does anyone have any script that could b used or is there any other way to get a report of thsi. Any help will be greatly appreciated.

    TIA

  • I suggest you take a look at Books Online -> multiserver administration. 

  • I created some logic from the master and msdb databases that shows the not correctly finished jobs and below that shows all the jobs.

    The report is a html file created with the sp_makewebtask sp. It runs every day minutes before I get in.

    You can also use linked servers to import all the jobs from all the servers to 1 server that will show everything in 1 page.

  • You can use DTS to extract the running status from other servers. For each server that you want to extract, use the code below:

    select

    server = convert(varchar(20), @@servername),

    s.job_id,

    jh.step_id,

    jh.step_name,

    run_date = case when jh.run_date > 0 then convert(datetime, convert(char(8), jh.run_date),112) else NULL end,

    jh.run_duration,

    run_status = case

     when jh.run_status = 0 then 'Failed'

     when jh.run_status = 1 then 'Succeeded'

     when jh.run_status = 2 then 'Retry'

     when jh.run_status = 3 then 'Canceled'

     when jh.run_status = 4 then 'In Progress'

     else 'Unknown'

     end,

    scheduled = case when s.enabled = 1 then 'Y' else 'N' end,

    schedule_name = case when s.[name] is null then '' else s.[name] end,

    Time_Completed = case

     when len(convert(varchar(10), jh.run_time)) = 3 then  '00:0' + left(convert(varchar(10), jh.run_time), 1)

     when len(convert(varchar(10), jh.run_time)) = 4 then  '00:' + left(convert(varchar(10), jh.run_time), 2)

     when len(convert(varchar(10), jh.run_time)) = 5 then  '0' + left(convert(varchar(10), jh.run_time), 1) + ':' + substring(convert(varchar(10), jh.run_time), 2, 2)

     when len(convert(varchar(10), jh.run_time)) = 6 then  left(convert(varchar(10), jh.run_time), 2) + ':' + substring(convert(varchar(10), jh.run_time), 3, 2)

     when jh.run_time = 0 then '00:00'

     else ''

     end,

    Scheduled_Time = case

     when len(convert(varchar(10), s.active_start_time)) = 3 then  '00:0' + left(convert(varchar(10), s.active_start_time), 1)

     when len(convert(varchar(10), s.active_start_time)) = 4 then  '00:' + left(convert(varchar(10), s.active_start_time), 2)

     when len(convert(varchar(10), s.active_start_time)) = 5 then  '0' + left(convert(varchar(10), s.active_start_time), 1) + ':' + substring(convert(varchar(10), s.active_start_time), 2, 2)

     when len(convert(varchar(10), s.active_start_time)) = 6 then  left(convert(varchar(10), s.active_start_time), 2) + ':' + substring(convert(varchar(10), s.active_start_time), 3, 2)

     when s.active_start_time = 0 then '00:00'

     else ''

     end

    From sysjobhistory jh right outer join sysjobschedules s on jh.job_id = s.job_id

    The result should be stored in a table so that you can query against it.

  • what you can do ( that we do in my company with few hundreds servers):

    Create one central server, where you will be keeping all essential info about your servers. Link all your servers to the central server, create a job(s) to monitor : job failures, disk space, suspect db, name it. So that way it will be only one job running to provide you all info in each category. Unfortunately it is not just one script. We have pretty big database designated only to that...

     

  • You're right. It will not be just one script. I only showed where you can get the information about failing jobs.

    Since you mentioned your company have hundreds of servers, do you mean SQL Server ? If that's the case, it will be too much for one single DBA to handle all the SQL Server(s). As a DBA, you should not allow users to create jobs on production servers. This way you have control as to where the jobs should be created.

  • Yes, we have about 200 hundred SQL Servers(6.5-2000; up to DataCenter). We have 3 companywise prod DBA on top of the other DBAs who work closely in each of the groups. Some Servers are also covered by group DBA (who are both Dev/Prod at the same time) , some servers has only Dev DBA/SQL Server Developer. So if the job fails on any of the servers (and I am getting a notification on that), we first look at the nature of the job (there some jobs which are ok to fail) and then, in most of the cases I re-run the job, or let group prod dba know, so he/she would take care of the failed job. If Prod Dba for this group doesnt exists, then I take care of the situation... Hope, I didnt confuse you...

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply