When passing alerts from SCOM to a downstream system for notification, such as ticketing or email, there tend to be two approaches:
- Override and disable all alert traffic that isn’t immediately actionable, or
- Put a process/tool in place to automatically flag the alerts that are meaningful for platform owners and support teams.
Disabling all of the alert traffic that isn’t immediately actionable tends not to be the best approach. First, identifying all of the alert-generating rules and monitors can be painstaking. Second, creating the overrides would be quite an undertaking. Lastly, this potentially suppresses meaningful traffic from ever entering SCOM. This is especially problematic when some sort of environmental change happens within the platform(s) the SCOM environment is responsible for monitoring. It’s better to at least have the alerts within the SCOM console (even if there is no notification), both to show which conditions are monitored and in the hope that someone sees the traffic anyway. Flagging the meaningful alerts, the second approach, is definitely the recommended one.
With the retirement of the bulk of the product connectors that were compatible with SCOM 2007 R2, System Center Orchestrator is the route companies are taking to manage their alert traffic. However, when taking a look at how to put all of this together, the task can initially seem pretty daunting. Looking at the SCOrch IP for SCOM 2012, the best way I can think of to describe it is as a bucket of Legos.
There are also really no canned runbooks from MS to help integrate SCOM with other downstream systems. There are blogs and vendor solutions out there if you want something to start with rather than just a blank runbook. While all of the functionality is there within the IPs (assuming there is an IP or scriptable interface for your downstream system), the Legos still need to be plugged together. Here is one potential solution to get the ball rolling. It is just a starting point that demonstrates the potential of using SCOrch to flag actionable traffic for notification.
Goal: flag specific alert traffic within SCOM to pass onto a downstream system.
- Monitor SCOM for new alert traffic
- Grab the alert traffic and flag it within SCOM so that it won’t be picked up the next time we look
- See if we care about the alerts
- Flag the alerts to move forward or flag them as being done with processing
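To make those four steps a bit more concrete before wiring up any activities, here is a minimal sketch of one polling pass in plain Python. It is not Orchestrator code: the `Alert` class, the `process_new_alerts` function and the resolution state names are all hypothetical stand-ins, and the "do we care" check is reduced to a simple set of rule/monitor GUIDs.

```python
from dataclasses import dataclass

# Hypothetical resolution state names; use whatever states exist in your SCOM environment.
NEW = "New"
PROCESSING = "Processing"
NOTIFY = "Notify"
NO_TICKET = "No Ticket"


@dataclass
class Alert:
    alert_id: str
    rule_id: str  # GUID of the rule or monitor that raised the alert
    resolution_state: str = NEW


def process_new_alerts(alerts, wanted_rule_ids):
    """Run one polling pass and return the alerts that were picked up."""
    # Step 1: only alerts still in the New state count as new traffic.
    picked_up = [a for a in alerts if a.resolution_state == NEW]
    for alert in picked_up:
        # Step 2: flag the alert so the next pass skips it.
        alert.resolution_state = PROCESSING
        # Steps 3 and 4: check whether we care, then set the final flag.
        if alert.rule_id in wanted_rule_ids:
            alert.resolution_state = NOTIFY
        else:
            alert.resolution_state = NO_TICKET
    return picked_up


alerts = [Alert("a1", "rule-1"), Alert("a2", "rule-2")]
process_new_alerts(alerts, wanted_rule_ids={"rule-1"})
```

Running the pass a second time over the same list picks up nothing, which is the whole point of flagging alerts as soon as they are grabbed.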
For monitoring SCOM for new alert traffic, see the prior post on how to scoop traffic up from SCOM and why you may want to avoid the ‘Monitor Alert’ activity.
Let’s rename those activities to make the runbook more readable:
Ok, nice. So, we are picking the alerts up at a regular interval (according to business needs/load) and flagging them so that they are out of the way the next time the runbook checks. There is one more bit of housekeeping here. The link between Get New Alerts and Flag Alert: Processing should be set like this so that the flag alert activity doesn’t raise a warning every 15 seconds when no alerts are passed on that iteration:
The next step is to see if the alert(s) we have picked up are meaningful to the support teams. Assuming the teams have been working with their respective MPs and are aware of what they want to be notified about, we know the rules/monitors that raise the alerts we need to pass downstream. What’s the best way to document these alerts so that we can grab them in real time? One approach would be to create a simple database (you can make this HA by putting it within a SQL environment that has failover, mirroring, Always On, etc). Here’s the way the DB could look.
Obviously, some tuning can be done here with the data types and sizing. However, the necessary first step is just getting a functional DB in place that represents all of the fields that might be needed for moving the alert forward.
Now that we have a DB, we need to be able to get the data out of the DB and use it to flag alerts.
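For anyone who wants something concrete to poke at, here is a hypothetical sketch of such a table. The real layout is in the create script attached at the end of the post; the table name, the column names and the use of Python's built-in `sqlite3` (as a stand-in for a real SQL instance) are all assumptions made for illustration. The only fields taken from this post are the rule/monitor GUID and a ticket flag where 0 means no ticket.

```python
import sqlite3

# Hypothetical table layout; sqlite3 is only a stand-in for the real SQL instance.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS enrichment (
    rule_id       TEXT PRIMARY KEY,            -- GUID of the rule or monitor
    rule_name     TEXT NOT NULL,               -- display name, for humans
    ticket_flag   INTEGER NOT NULL DEFAULT 0,  -- 0 = no ticket
    support_group TEXT                         -- hypothetical routing field
)
"""


def create_enrichment_db(path=":memory:"):
    """Create the enrichment table and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute(CREATE_TABLE_SQL)
    conn.commit()
    return conn
```

Keying the table on the rule/monitor GUID keeps the later lookup to a single indexed match per alert.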
Something like this will do the trick. Looking at the four steps above, these activities should cover what we are trying to accomplish. Keeping the Query Enrichment Database activity as simple as possible, let’s just hardcode what we need into the activity for now. The activity can always be revisited to have variables used and to be further optimized. The query itself is not too wild and crazy.
In this case, the approach is to query the DB to get the enrichment info back for each of the alerts that have been raised. An alternative would be to flatten the ‘Flag Alert: Processing’ activity, return all of the results and let the data bus branch out from the query activity. However, this would mean all of the enrichment data from the DB would have to be brought back and then filtered through for each of the new alerts within the runbook. To keep it simple, and assuming our runbook servers have a tiny bit of horsepower, branching at the ‘Get New Alerts’ activity will be the easiest. In the event of an alert storm and a large amount of enrichment data, this approach may also prove to be the most lightweight. Testing would need to be done in each environment to determine that for sure.
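The trade-off between the two approaches can be sketched outside of Orchestrator. The snippet below is an illustration only: it uses Python's `sqlite3` against a tiny in-memory table, and the table name, column names and function names are hypothetical. `lookup_per_alert` mirrors the per-alert query used in this runbook, while `load_all_enrichment` mirrors the pull-everything-then-filter alternative.

```python
import sqlite3


def lookup_per_alert(conn, rule_id):
    """One small parameterized query per alert; returns a row tuple or None."""
    cur = conn.execute(
        "SELECT rule_id, ticket_flag FROM enrichment WHERE rule_id = ?",
        (rule_id,),
    )
    return cur.fetchone()


def load_all_enrichment(conn):
    """The alternative: pull every row back once, then filter in memory."""
    cur = conn.execute("SELECT rule_id, ticket_flag FROM enrichment")
    return {rule_id: (rule_id, flag) for rule_id, flag in cur}


# Tiny in-memory table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrichment (rule_id TEXT PRIMARY KEY, ticket_flag INTEGER)")
conn.execute("INSERT INTO enrichment VALUES ('rule-guid-1', 1)")
```

The per-alert version returns nothing at all for a rule that isn't in the table, which maps nicely onto the "no results" link out of the query activity.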
Ok, so we have the DB results that have been brought back. There are two scenarios (thus two lines) – either we got results or we did not. If we did not, let’s just flag the alert for no ticket.
And if we get a result:
Ok, the ‘Process Query Results’ activity – why do we need it? The data returned by the ‘Query Enrichment Database’ activity doesn’t really provide for a way to parse out the different data that gets returned as a result of the query. In order to split the results up, we need something like a PowerShell script to do the work.
(NOTE: Script is contained within the runbook linked at the bottom of the post)
Very simplistic but it does the trick. Next, we need to publish the data to the data bus from the PS script:
Good stuff. Now we have all of the data from the enrichment DB for this particular rule or monitor available on the data bus.
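The actual script is PowerShell and ships inside the runbook, so the following is only an illustration of the kind of splitting it has to do. It assumes the query activity hands each row back as a single semicolon-delimited string; the field order and names in `FIELDS` are hypothetical and would need to match the column order of your own query.

```python
# Hypothetical field order; it must match the column order of the query.
FIELDS = ("rule_id", "rule_name", "ticket_flag", "support_group")


def parse_enrichment_row(row, fields=FIELDS):
    """Split one delimited result row into a dict of named values."""
    values = [v.strip() for v in row.split(";")]
    if len(values) != len(fields):
        # A stray delimiter inside a value, or a changed query, lands here.
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


parsed = parse_enrichment_row("abc-123;Blog Test Rule;1;NOC")
```

Note that every value comes back as a string, and that a value containing a semicolon would break this simple split, so keep the enrichment data free of the delimiter.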
Now, we have to deal with the ticket flag. Let’s say we are going to have the following notification scenarios when all is said and done:
- No Ticket – Used for maintenance
- Ticket NOC (forces the ticket to the Network Operations Center)
- Email Only
In this case, we’re just marking the alert so we know it needs to move forward in the notification process.
Similarly, if the rule/monitor happens to have the ticket flag set to 0, we don’t want a ticket on this alert for now so:
We still need to deal with the link from the query activity that leads to no ticket:
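Taken together, the three links boil down to one small decision. Here is a minimal sketch of it in Python, assuming the parsed enrichment row is a dict with a `ticket_flag` value (a hypothetical name) or `None` when the query returned nothing. Treating any non-zero flag as "notify" is my assumption; the only value this post pins down is 0 for no ticket. The returned strings stand in for whatever resolution state names you create in SCOM.

```python
def resolution_state_for(enrichment_row):
    """Pick the resolution state to set on an alert."""
    if enrichment_row is None:
        # The rule/monitor is not in the enrichment DB at all.
        return "No Ticket"
    if int(enrichment_row["ticket_flag"]) == 0:
        # Known rule/monitor, but ticketing is switched off for now.
        return "No Ticket"
    # Assumption: any non-zero flag means the alert moves forward.
    return "Notify"
```

Converting the flag with `int()` matters because values split out of a delimited string arrive as text, so `"0"` would otherwise never compare equal to `0`.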
Lastly, we need to set our ‘Flag Alert: Notify’ and ‘Flag Alert: No Ticket’ configs. However, before that is done the necessary resolution states need to be created within SCOM:
Any ID numbers can be selected. The SCOM IP actually uses the display name for the resolution state both in front and behind the scenes. Now, we can set the Flag Alert activities:
That should about do it. Now, all we need to do is put some data in our DB and see what happens. In order to get the GUID for the rule or monitor, you need to use something like the SCOM command shell. This can be a repetitive, tedious task, and so can entering/maintaining the data in the DB. Attached at the end of this post are the script to create the table in an enrichment DB (so that the table layout is documented) and a script that connects to a SCOM environment/enrichment DB and presents a simple GUI for entering the data. Here’s what it looks like:
We will pick the Blog MP for testing purposes. It has a single rule in it that generates an alert every two minutes.
Simply hit Edit, make the necessary changes, and hit Save. Once all changes have been made to all of the rules/monitors, hit the Commit button and the PS script constructs the SQL statements and commits the changes to the enrichment DB.
Here’s what we are going to set for this example:
Once the Commit button has been hit, the changes have been made to the DB. Now, fire up the runbook and let’s go take a look at SCOM after a few minutes:
Looks pretty good. Let’s take a peek at the Alert History on one of these to see if it makes sense:
That looks right. Let’s check Orchestrator to see if that looks good:
Summary: We have a runbook now that takes alert traffic from SCOM 2012, sets the alert traffic to a processing state within SCOM using the resolution state field, queries an enrichment DB to see if the alert was raised from a rule or monitor we care about and then either flags the alert for additional processing or flags it as complete and moves on. A simplistic start to a SCOrch solution for managing SCOM 2012 alerts, but a start!
Enrichment Script – if used in a specific environment, edit the top of the script to include your SCOM MS server and your enrichment DB, as well as whether you are dealing with SCOM 2007 or 2012. The area to edit within the script is flagged. Feel free to modify as needed.
UPDATED: Now works with PS version 3
Enrichment Table Create Script – .SQL script. You will need a SQL instance somewhere to contain the table. Make sure you have a username and password that can connect.
Enrichment Runbook (sanitized) – the runbook from above with my environment’s specifics removed.