Many SCOM installations have monitors sitting in unhealthy states. That is fine if the monitored items really are unhealthy, someone is looking to take action, and the monitors reset automatically once the issue has been resolved. Unfortunately, management packs (MPs) are often imported before anyone has looked at the traffic they generate: some monitors go unhealthy, the alerts get cleaned out for whatever reason, and the result is unhealthy monitors that can only be found using state views and Health Explorer. Additionally, there are manual reset monitors, which are a beast unto themselves.
Let’s look at resetting the state of every monitor that is currently unhealthy. Depending on your business needs, this may or may not be a reasonable thing to do. You may want to target specific classes or monitors instead, or add logic that only resets a monitor to healthy if it has been unhealthy for a long enough period of time. The attached code can serve as the basis for either approach.
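As one hedged illustration of the age-based idea, the hypothetical helper below keeps only monitoring states that are unhealthy and have not changed for at least a minimum age. It assumes each state object exposes `HealthState` and a `LastTimeModified` timestamp in UTC; that property name is an assumption about the SCOM SDK's `MonitoringState` objects, so verify it against your SDK version before relying on it.

```powershell
# Hypothetical helper (not part of the attached script).
# Assumes each piped object has HealthState and LastTimeModified (UTC) properties.
function Select-StaleUnhealthyState {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $State,
        # How long a state must have been unchanged before it is returned
        [TimeSpan] $MinimumAge = [TimeSpan]::FromDays(7),
        # Reference time; overridable so the filter can be tested deterministically
        [DateTime] $Now = [DateTime]::UtcNow
    )
    process {
        # 'Success' and 'Uninitialized' are not treated as unhealthy
        $unhealthy = ($State.HealthState.ToString() -ne 'Success') -and
                     ($State.HealthState.ToString() -ne 'Uninitialized')

        # Emit the state only if it is unhealthy and old enough
        if ($unhealthy -and (($Now - $State.LastTimeModified) -ge $MinimumAge)) {
            $State
        }
    }
}
```

In the step that builds `$ToReset`, the plain health-state filter could be swapped for something like `$MonitoringObject.GetMonitoringStates($colMonitors) | Select-StaleUnhealthyState -MinimumAge (New-TimeSpan -Days 3)`.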
Let’s take a look at what is unhealthy in my lab environment…
Windows Computers (BPA Monitors and others):
IIS (IISAdmin Service is stopped):
AD (BPA Monitors):
Custom App – Local Application with Dependency Monitor (event-triggered manual reset):
Health Service Watchers (the critical ones are offline; the warning is a Run As account monitor):
Finally, a class derived from an abstract class, where the monitor targets the abstract class and is inherited:
So, we have plenty of unhealthy objects in the lab. Let’s see if we can turn them all green!
Windows Computers:
IIS:
AD:
Custom App – Local Application with Dependency Monitor:
Health Service Watcher:
Finally, a class derived from an abstract class, where the monitor targets the abstract class and is inherited:
All of the monitors turned healthy except those for the Health Service Watchers whose machines are actually offline. This is a good thing! The reason is as follows:
None of these are unit monitors: the top two are aggregate monitors and the last one is a dependency monitor. The watchers use different workflows to force the state of these monitors. It would be very atypical, but it is possible to reset the health of these if wanted, except for the last one.
The HS (Health Service) Availability monitor is a dependency monitor attached to the general availability monitor hosted by the machine that is currently offline. Resetting it accomplishes nothing: the monitor simply recalculates the gray agent as unhealthy.
For the most part, you will only want to reset the health of unit monitors and let health roll up as it normally would. As such, the script used to reset the health of the monitors only touches unit monitors.
Download the script used here: Custom.Example.MonitorReset.PS1
The approach is as follows:
- Get all of the non-group (non-singleton) classes that reside in the management group
  - $classes = Get-SCOMClass | Where-Object { $_.Singleton -eq $false }
- Iterate through the classes and get all of the instances of each class that are unhealthy and available
  - $MonitoringObjects = Get-SCOMClassInstance -Class $class | Where-Object { ($_.HealthState -ne 'Success') -and ($_.HealthState -ne 'Uninitialized') -and ($_.IsAvailable -eq $true) }
- For each unhealthy instance, get the unit monitors targeted at the class and keep only the unhealthy ones
  - $Monitors = Get-SCOMMonitor -Instance $MonitoringObject -Recurse | Where-Object { ($_.XmlTag.ToString() -eq 'UnitMonitor') -and ($_.Target.Id.ToString() -eq $class.Id.ToString()) }
  - $colMonitors = New-Object 'System.Collections.Generic.List[Microsoft.EnterpriseManagement.Configuration.ManagementPackMonitor]'; $Monitors | ForEach-Object { $colMonitors.Add($_) }
  - $ToReset = $MonitoringObject.GetMonitoringStates($colMonitors) | Where-Object { ($_.HealthState -ne 'Success') -and ($_.HealthState -ne 'Uninitialized') }
- Reset the health of each unhealthy unit monitor
  - foreach ($State in $ToReset) { $MonitoringObject.ResetMonitoringState(($Monitors | Where-Object { $_.Id -eq $State.MonitorId })) }
- Let aggregate and dependency monitors roll up health and clean themselves up
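Put together, the steps above might look roughly like the following sketch. It is not the downloadable script: the function name `Reset-UnhealthyUnitMonitor` and the `-DryRun` switch are hypothetical additions, and it assumes the OperationsManager module is installed, a management group connection is already open, `ResetMonitoringState` accepts a single `ManagementPackMonitor`, and the states returned by `GetMonitoringStates` expose a `MonitorId` property.

```powershell
function Reset-UnhealthyUnitMonitor {
    [CmdletBinding()]
    param(
        # Hypothetical switch: report what would be reset without resetting anything
        [switch] $DryRun
    )

    # Requires the SCOM OperationsManager module and an open management group connection
    Import-Module OperationsManager -ErrorAction Stop

    # States that are not treated as unhealthy
    $ignoredStates = @('Success', 'Uninitialized')

    # Non-group (non-singleton) classes in the management group
    $classes = Get-SCOMClass | Where-Object { $_.Singleton -eq $false }

    foreach ($class in $classes) {
        # Unhealthy instances of this class that are available (not gray)
        $instances = Get-SCOMClassInstance -Class $class | Where-Object {
            ($ignoredStates -notcontains $_.HealthState.ToString()) -and $_.IsAvailable
        }

        foreach ($instance in $instances) {
            # Unit monitors only, and only those targeted at this exact class
            $monitors = @(Get-SCOMMonitor -Instance $instance -Recurse | Where-Object {
                ($_.XmlTag.ToString() -eq 'UnitMonitor') -and
                ($_.Target.Id.ToString() -eq $class.Id.ToString())
            })
            if ($monitors.Count -eq 0) { continue }

            # Typed list of ManagementPackMonitor for GetMonitoringStates
            $colMonitors = New-Object 'System.Collections.Generic.List[Microsoft.EnterpriseManagement.Configuration.ManagementPackMonitor]'
            foreach ($m in $monitors) { $colMonitors.Add($m) }

            # Current state of each unit monitor; keep only the unhealthy ones
            $toReset = $instance.GetMonitoringStates($colMonitors) | Where-Object {
                $ignoredStates -notcontains $_.HealthState.ToString()
            }

            foreach ($state in $toReset) {
                # Assumes MonitoringState exposes MonitorId (the monitor's Guid)
                $monitor = $monitors | Where-Object { $_.Id -eq $state.MonitorId } | Select-Object -First 1
                if ($null -eq $monitor) { continue }

                if ($DryRun) {
                    Write-Output "Would reset '$($monitor.DisplayName)' on '$($instance.DisplayName)'"
                }
                else {
                    # Assumes the single-monitor ResetMonitoringState overload;
                    # aggregate and dependency monitors then roll up on their own
                    [void] $instance.ResetMonitoringState($monitor)
                }
            }
        }
    }
}
```

Running `Reset-UnhealthyUnitMonitor -DryRun` first lists what would be reset without changing any state, which is a cheap sanity check before touching a production management group.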
Download the attached code for more in-depth comments on how to approach resetting the health of your entire environment or, more likely, just a subset of it.
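To work on a subset rather than the whole management group, one option is to narrow the class list before the reset loop runs. The hypothetical helper below (not part of the attached script) filters class objects by their `Name` using PowerShell `-like` wildcards; the patterns in the usage line are examples only.

```powershell
# Hypothetical helper: keep only classes whose Name matches an Include pattern
# and no Exclude pattern. Patterns use PowerShell -like wildcard syntax.
function Select-TargetClass {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Class,
        [string[]] $Include = @('*'),
        [string[]] $Exclude = @()
    )
    process {
        $name = $Class.Name

        # True if at least one pattern in each list matches the class name
        $included = @($Include | Where-Object { $name -like $_ }).Count -gt 0
        $excluded = @($Exclude | Where-Object { $name -like $_ }).Count -gt 0

        if ($included -and -not $excluded) { $Class }
    }
}
```

For example, `$classes = Get-SCOMClass | Where-Object { $_.Singleton -eq $false } | Select-TargetClass -Include 'Microsoft.Windows.*' -Exclude 'Microsoft.SystemCenter.HealthServiceWatcher'` would restrict the run to classes whose names start with `Microsoft.Windows.`.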