SCOM 2012: Monitor Reset – Cleaning Up The Environment

Many SCOM installations have monitors sitting in unhealthy states. That is a good thing if the monitored items really are unhealthy, someone is acting on them, and the monitors reset automatically once the issue is resolved. Unfortunately, management packs (MPs) are often imported before anyone has looked at the traffic they generate: some monitors go unhealthy, the alerts get cleaned out for whatever reason, and the result is unhealthy monitors that can only be found through state views and Health Explorer. Manual reset monitors are a beast unto themselves.

Let’s look at resetting the state of every monitor that is currently unhealthy. Whether that is reasonable depends on your business needs. You may want to target specific classes or monitors instead, or add logic so that a monitor is reset only after it has been unhealthy for long enough. The attached code can serve as the basis for either approach.
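The time-based logic mentioned above could be expressed as a filter over monitoring-state objects. This is a minimal sketch, not the attached script: `Select-StaleUnhealthyState` is a hypothetical function, and the `LastTimeModified` property name is an assumption about the SCOM `MonitoringState` objects returned by `GetMonitoringStates()`, so confirm it with `Get-Member` first. The timestamp property is a parameter so it can be swapped if the name differs.

```powershell
# Hypothetical helper (not part of the OperationsManager module).
# Keeps states that are unhealthy AND have not changed for at least -MinimumAge.
# Assumption: each state exposes HealthState plus a timestamp property; the
# default name LastTimeModified is a guess for SCOM MonitoringState objects,
# so confirm it with Get-Member in your management group.
function Select-StaleUnhealthyState {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $State,
        [Parameter(Mandatory)] [TimeSpan] $MinimumAge,
        [string] $TimeProperty = 'LastTimeModified',
        # Compare in UTC; the timestamp is assumed to be UTC as well.
        [DateTime] $Now = [DateTime]::UtcNow
    )
    process {
        # [string] on a HealthState enum yields its name, e.g. 'Success'.
        $health = [string]$State.HealthState
        if ($health -eq 'Success' -or $health -eq 'Uninitialized') { return }

        # Emit the state only if it has been in place long enough.
        $changed = [DateTime]$State.$TimeProperty
        if (($Now - $changed) -ge $MinimumAge) { $State }
    }
}
```

For example, `$MonitoringObject.GetMonitoringStates($colMonitors) | Select-StaleUnhealthyState -MinimumAge (New-TimeSpan -Days 7)` would keep only states that have been unhealthy and unchanged for at least a week.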

Let’s take a look at what is unhealthy in my lab environment…

Windows Computers (BPA Monitors and others):

IIS (IISAdmin Service is stopped):

AD (BPA Monitors):

Custom App – Local Application with Dependency Monitor (event-triggered manual reset):

Health Service Watchers (the critical ones are offline; the warning is a Run As account monitor):

Last, a class derived from an abstract class, where the monitor targets the abstract class and is inherited:

So, we have plenty of unhealthy objects in the lab.  Let’s see if we can turn them all green!

Windows Computers:

IIS:

AD:

Custom App – Local Application with Dependency Monitor:

Health Service Watchers:

Last, a class derived from an abstract class, where the monitor targets the abstract class and is inherited:

So, all of the monitors turned healthy except the Health Service Watcher monitors for machines that are actually offline. This is a good thing! The reason for this is as follows:

None of these are unit monitors: the top two are aggregate monitors and the last one is a dependency monitor. The watchers use different workflows to force the state of these monitors. Resetting them would be very atypical, but it is possible for all of them except the last one.

The HS Availability monitor is a dependency monitor attached to the general availability monitor hosted by the machine that is currently offline, so it will simply recalculate the gray agent as unhealthy.
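To see which monitors on an instance are unit, aggregate, or dependency monitors, you can list each monitor's kind. This is a minimal sketch: `Get-MonitorKind` is a hypothetical helper (not part of the OperationsManager module) that relies only on the `Name` and `XmlTag` properties of the monitors returned by `Get-SCOMMonitor`. The Health Service Watcher class name in the commented example is an assumption; verify it with `Get-SCOMClass`.

```powershell
# Hypothetical helper (not a SCOM cmdlet): tag each monitor with its kind.
# XmlTag holds the MP element name: UnitMonitor, AggregateMonitor or DependencyMonitor.
function Get-MonitorKind {
    [CmdletBinding()]
    param([Parameter(Mandatory, ValueFromPipeline)] $Monitor)
    process {
        $kind = $Monitor.XmlTag.ToString()
        [pscustomobject]@{
            Name          = $Monitor.Name
            Kind          = $kind
            # Only unit monitors are touched by the reset approach described here.
            IsUnitMonitor = ($kind -eq 'UnitMonitor')
        }
    }
}

# Example against a live management group (assumes the OperationsManager
# module is loaded; the class name is an assumption):
# $class   = Get-SCOMClass -Name 'Microsoft.SystemCenter.HealthServiceWatcher'
# $watcher = Get-SCOMClassInstance -Class $class | Select-Object -First 1
# Get-SCOMMonitor -Instance $watcher -Recurse | Get-MonitorKind | Format-Table
```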

For the most part, you will only want to reset the health of unit monitors and let health roll up as it normally would. For that reason, the script used here resets only unit monitors.

Download the script used here: Custom.Example.MonitorReset.PS1

Basically, the approach is as follows:

  1. Get all of the non-group (non-singleton) classes in the management group
    1. $classes = Get-SCOMClass | Where-Object { $_.Singleton -eq $false }
  2. Iterate through the classes and, for each class, get the available instances that are unhealthy
    1. $MonitoringObjects = Get-SCOMClassInstance -Class $class | Where-Object { ($_.HealthState -ne 'Success') -and ($_.HealthState -ne 'Uninitialized') -and ($_.IsAvailable -eq $true) }
  3. For each instance, get the unit monitors targeted at the class and keep the unhealthy ones
    1. $Monitors = Get-SCOMMonitor -Instance $MonitoringObject -Recurse | Where-Object { ($_.XmlTag.ToString() -eq 'UnitMonitor') -and ($_.Target.Id.ToString() -eq $class.Id.ToString()) }
    2. $colMonitors = New-Object 'System.Collections.Generic.List[Microsoft.EnterpriseManagement.Configuration.ManagementPackMonitor]'; $Monitors | ForEach-Object { $colMonitors.Add($_) }
    3. $ToReset = $MonitoringObject.GetMonitoringStates($colMonitors) | Where-Object { ($_.HealthState -ne 'Success') -and ($_.HealthState -ne 'Uninitialized') }
  4. Reset the health of each unhealthy unit monitor
    1. foreach ($State in $ToReset) { $Monitor = $Monitors | Where-Object { $_.Id -eq $State.MonitorId }; $MonitoringObject.ResetMonitoringState($Monitor) }
  5. Let aggregate and dependency monitors roll up health and clean themselves up
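Put together, the steps above might look like the following sketch. It assumes the OperationsManager module is loaded and connected to the management group. `Reset-UnhealthyUnitMonitor` is a hypothetical name, and matching each unhealthy state back to its monitor through `MonitorId` is an assumption about the `MonitoringState` object. This is not the downloadable script, just one way to wire the steps together, with `-WhatIf` support so you can preview what would be reset.

```powershell
# Sketch only: assumes the OperationsManager module is loaded and connected
# to the management group. Reset-UnhealthyUnitMonitor is a hypothetical name.
function Reset-UnhealthyUnitMonitor {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Step 1: default to every non-singleton (non-group) class.
        $Class = (Get-SCOMClass | Where-Object { -not $_.Singleton })
    )
    $listType = 'System.Collections.Generic.List[Microsoft.EnterpriseManagement.Configuration.ManagementPackMonitor]'

    foreach ($c in $Class) {
        # Step 2: available instances of this class that are not healthy.
        $instances = Get-SCOMClassInstance -Class $c | Where-Object {
            ($_.HealthState -ne 'Success') -and
            ($_.HealthState -ne 'Uninitialized') -and
            $_.IsAvailable
        }

        foreach ($instance in $instances) {
            # Step 3: unit monitors targeted at this class, collected in the
            # typed list that GetMonitoringStates() expects.
            $monitors = New-Object -TypeName $listType
            Get-SCOMMonitor -Instance $instance -Recurse |
                Where-Object { ($_.XmlTag.ToString() -eq 'UnitMonitor') -and ($_.Target.Id -eq $c.Id) } |
                ForEach-Object { $monitors.Add($_) }
            if ($monitors.Count -eq 0) { continue }

            $unhealthy = $instance.GetMonitoringStates($monitors) | Where-Object {
                ($_.HealthState -ne 'Success') -and ($_.HealthState -ne 'Uninitialized')
            }

            # Step 4: reset each unhealthy unit monitor individually.
            foreach ($state in $unhealthy) {
                $monitor = $monitors | Where-Object { $_.Id -eq $state.MonitorId } | Select-Object -First 1
                if ($monitor -and $PSCmdlet.ShouldProcess($instance.DisplayName, "Reset $($monitor.Name)")) {
                    [void]$instance.ResetMonitoringState($monitor)
                }
            }
        }
    }
    # Step 5: aggregate and dependency monitors roll up on their own.
}
```

Running `Reset-UnhealthyUnitMonitor -WhatIf` reports what would be reset without changing anything, and `Reset-UnhealthyUnitMonitor -Class (Get-SCOMClass -DisplayName 'Windows Computer')` limits the run to one class. Because the target filter compares against the class being processed, monitors inherited from a base class are only picked up when that base class is also in `-Class`.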

Download the attached code for more in-depth comments on how to approach resetting the health of your entire environment or, more likely, just a subset of it.
