Jonathan Almquist

Alert Grooming - is it working or not?

Tue, 2008-11-25 04:32

I've run across a couple instances where it appeared that alerts were not being groomed from the Operations Manager database.  Part of the confusion was due to querying alerts raised per day by resolution state.  A query such as this will give you a rundown of alerts generated per day, that have not yet been groomed from the OpsDB...along with their current resolution state.  The purpose of this query is not to evaluate whether or not alert grooming is working.

There is one important fact to keep in mind about alert grooming.  Retention is counted from the time an alert was resolved, not the time it was raised.  Alerts that were raised before your resolved alerts retention period began, but resolved within it, may mislead one into thinking some alerts should have been groomed.  In fact, they are just waiting for their day to come.

image 

Here's a one-liner to check if alert grooming is working as designed.  Just copy and paste into the Operations Manager Command Shell.

$Threshold = (Get-Date).AddDays(-(get-defaultsetting)[42].Value).Date;Get-Alert | Where {$_.TimeResolved -and ($_.TimeResolved).Date -lt $Threshold} | Measure-Object

If this command returns a count of 0, this is good.  This means no resolved alerts older than the retention period were found...hence, grooming has done its job.  If it returns a count greater than 0, this would indicate alert grooming is not working properly.

Example: Results returned one alert not groomed on its last grooming interval.

image

The result in the above image indicates a problem with alert grooming.  So, how do we figure out why grooming didn't do its job?  Well, the alert grooming job is not complicated.  It simply selects alerts that have a resolution state of 255 and a TimeResolved value that is not NULL.  It then checks whether TimeResolved + resolved alert retention days has reached today's date and, if so, grooms the alert.  So, if there is a problem, I would first check the InternalJobHistory table.

SELECT * FROM InternalJobHistory Order By InternalJobHistoryId DESC
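The grooming rule described above can be sketched in PowerShell against made-up data.  This is only an illustration of the date arithmetic as this post describes it, not the actual grooming job.  The function name, the 7-day retention value and the sample alerts are all assumptions.

```powershell
# Sketch of the grooming rule as described in this post; not the real job.
# The retention value, the dates and the sample alerts are made up.
$retentionDays = 7
$today = [datetime]'2008-11-25'

function Test-GroomingDue {
    param($Alert, [int]$RetentionDays, [datetime]$Today)
    # Only closed alerts (resolution state 255) with a resolved time qualify.
    if ($Alert.ResolutionState -ne 255) { return $false }
    if (-not $Alert.TimeResolved) { return $false }
    # Retention is counted from TimeResolved, not from TimeRaised.
    return ($Alert.TimeResolved.Date.AddDays($RetentionDays) -le $Today.Date)
}

$alerts = @(
    (New-Object PSObject -Property @{ Name = 'Raised long ago, resolved yesterday'; TimeRaised = [datetime]'2008-10-01'; ResolutionState = 255; TimeResolved = [datetime]'2008-11-24' }),
    (New-Object PSObject -Property @{ Name = 'Resolved past retention'; TimeRaised = [datetime]'2008-11-10'; ResolutionState = 255; TimeResolved = [datetime]'2008-11-12' }),
    (New-Object PSObject -Property @{ Name = 'Still new'; TimeRaised = [datetime]'2008-11-01'; ResolutionState = 0; TimeResolved = $null })
)

# Alerts that should already be gone if grooming is healthy.
$due = @($alerts | Where-Object { Test-GroomingDue $_ $retentionDays $today })
$due | ForEach-Object { $_.Name }
```

Only the second sample alert is due.  The first one was raised weeks ago, but it was resolved yesterday, so it is still inside the retention period.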

Read more on this here.

MOM 2005 - Remove Reporting Component

Sat, 2008-11-22 16:33

Did you uninstall MOM Reporting?  Did OnePoint grooming cease after uninstalling reporting?  This seems like old news.  But, considering this is about the time when many organizations are decommissioning MOM and migrating to Operations Manager 2007 (and reporting is often the first to go), I think it's worth a refresher.

The problem is, after uninstalling MOM Reporting the MOMX Grooming and Partitioning job does not run and your OnePoint database continues to grow.  This is because MOM thinks reporting is still installed.  Hence, MOM also thinks the DTS jobs should be running and are failing.  If DTS jobs aren't running, and MOM thinks reporting is installed, data will not groom until MOM knows it has first been transferred to the DW.

This all comes down to a row value in the OnePoint database, in the ReportingSettings table, for the TimeDTSLastRan column.  When MOM Reporting is uninstalled, this row value should have been deleted.  Sometimes this doesn't happen.

Normally, when reporting is installed and working properly, this value should equal today’s date in order for the MOMX Grooming and Partitioning job to run.  The field is updated each time the DTS job runs and successfully transfers data to the data warehouse.

However, since we have removed the reporting component, this row will cease to get updated.  If the row is blank, MOM will know that Reporting is not installed and continue to groom.  If this row still has a value after uninstalling MOM Reporting, simply delete the value and your MOMX Grooming and Partitioning jobs should continue to run as scheduled.

Operations Manager 2007 Command Shell

Mon, 2008-11-17 21:56

Have you started using the Operations Manager Command Shell?  Using PowerShell for Operations Manager tasks can save a lot of time, and helps automate some of those mundane chores we sometimes find ourselves faced with.  Like closing out alerts generated from an alert storm perhaps?  Or pushing agents during a deployment?

I'm starting a list of commands that I've found useful at one time or another.  Most of these are one-liners and can be pasted directly into Command Shell.  Replace any placeholder text (shown here in angle brackets) with your input.

Check back now and then, as I'll be adding more examples periodically.

Agent

Approve Manual Installation for single agent

Get-AgentPendingAction | where {$_.AgentName -match "<netbiosname>"} | Approve-AgentPendingAction

Approve Manual Installation for all pending agents

Get-AgentPendingAction | where {$_.AgentPendingActionType -match "ManualApproval"} | Approve-AgentPendingAction

Approve Manual Installation for n number of agents

$i = 1; foreach ($agent in Get-AgentPendingAction | where {$_.AgentPendingActionType -eq "ManualApproval"}) {if ($i -le <n>) {$agent | Approve-AgentPendingAction;$i++}}

**To approve updates to agents, you can use the same commands here. Just replace “ManualApproval” with “UpdateFailed”. This is the Pending Action Type string to match for updates.
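As a possible alternative to the counter loop for approving n agents, Select-Object -First can cap the pipeline.  Here's a minimal sketch using placeholder objects in place of real Get-AgentPendingAction output, so it runs outside the Command Shell.  The agent names and the batch size are made up.

```powershell
# Placeholder objects standing in for Get-AgentPendingAction output.
$pending = 1..5 | ForEach-Object {
    New-Object PSObject -Property @{ AgentName = "agent$_"; AgentPendingActionType = 'ManualApproval' }
}

$n = 3  # hypothetical batch size

# Filter first, then keep only the first $n matches.
$batch = @($pending |
    Where-Object { $_.AgentPendingActionType -eq 'ManualApproval' } |
    Select-Object -First $n)

$batch | ForEach-Object { $_.AgentName }

# In the Command Shell the same shape would presumably be (untested here):
#   Get-AgentPendingAction | where {$_.AgentPendingActionType -eq "ManualApproval"} | select-object -first <n> | Approve-AgentPendingAction
```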

Discover and Install agent (not a one-liner)

**This example will configure the agent to report to the Management Server you are currently connected to while performing this action.

$query = New-LdapQueryDiscoveryCriteria -domain <domain> -ldapquery "(cn=<target_netbios_name>)"

$discoverycfg = New-WindowsDiscoveryConfiguration -ldapquery $query

$discoveryResults = Start-Discovery -managementServer (get-managementServer) -windowsDiscoveryConfiguration $discoverycfg

Install-Agent -managementServer (get-ManagementServer) -agentManagedComputer $discoveryResults.CustomMonitoringObjects

Get agent state (Windows Computer Instance)

get-agent | where {$_.computername -eq "<netbios_name>"} | ft name,HealthState

Get group members and contained instance state, by group name

foreach ($group in get-monitoringobjectGroup) {if($group.DisplayName -eq "<group_name>") {$group.GetRelatedMonitoringObjects() | ft DisplayName,HealthState}}

Rule

Which Management Pack contains this Rule?

(get-rule | where {$_.displayname -eq "<rule>"}).getManagementPack() | ft DisplayName,Name -auto

Monitor

Which Management Pack contains this Monitor?

(get-monitor | where {$_.displayname -eq "<monitor>"}).getManagementPack() | ft DisplayName,Name -auto

Alert

New alert count

get-alert | where {$_.resolutionState -eq 0} | measure-object

Open alert count (all resolution states, except new and closed)

$states = 2..254;get-alert | where {$states -contains $_.resolutionState} | measure-object

Closed alert count

get-alert | where {$_.resolutionState -eq 255} | measure-object

Alerts raised on specific date

get-alert | where {$_.TimeRaised.date -eq "<m/d/y>"}

Alerts raised in date range

get-alert | where {$_.TimeRaised.date -ge "<m/d/y>" -and $_.TimeRaised.date -le "<m/d/y>"}

Alert count on specific date

get-alert | where {$_.TimeRaised.date -eq "<m/d/y>"} | Measure-Object

Alert count by date

$array = @();foreach ($date in Get-Alert | foreach-object {$_.get_TimeRaised().toShortDateString()}) {$array += $date};$array | Group-Object | select-object count,name | sort-object name -desc
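One thing to watch with the count-by-date one-liner is that short date strings sort as text.  Depending on regional settings, a date such as 11/9/2008 may land on the wrong side of 11/10/2008.  Here's a sketch of one way around that, grouping on a yyyy-MM-dd key instead.  The sample TimeRaised values are made up and stand in for Get-Alert output.

```powershell
# Made-up TimeRaised values standing in for Get-Alert output.
$sample = @('2008-11-09 08:00', '2008-11-10 09:30', '2008-11-10 17:45', '2008-12-01 00:15') |
    ForEach-Object { New-Object PSObject -Property @{ TimeRaised = [datetime]$_ } }

# Group on a sortable key so that text order matches date order.
$byDate = @($sample |
    Group-Object { $_.TimeRaised.ToString('yyyy-MM-dd') } |
    Sort-Object Name -Descending |
    Select-Object Count, Name)

$byDate
```

The same grouping key should work for the event count example, using TimeGenerated in place of TimeRaised.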

Top 10 alerts

get-alert | Group-Object Name | Sort-object Count -desc | select-Object -first 10 Count, Name | ft -auto

Last 10 critical alerts (not closed)

get-alert | where {$_.severity -eq "error" -and $_.resolutionstate -ne 255} | sort-object TimeRaised -desc | select-object -first 10 name,timeraised

Top 10 Repeat Count alerts (not closed)

get-alert | where {$_.RepeatCount -gt 0 -AND $_.resolutionState -ne 255} | sort-object RepeatCount -desc | select-object -first 10 repeatcount,name | ft -auto

Resolve all open alerts in date range

get-alert | where {$_.TimeRaised.date -ge "<m/d/y>" -and $_.TimeRaised.date -le "<m/d/y>" -and $_.resolutionState -ne 255} | resolve-alert

Resolve all open alerts, by Alert Name

get-alert | where {$_.Name -eq "<alert_name>" -AND $_.resolutionState -ne 255} | resolve-alert

Resolve all open alerts for specific Agent

get-alert | where {$_.netbiosComputerName -eq "<netbios_name>" -AND $_.resolutionState -ne 255} | Resolve-Alert

Resolve all alerts, by Resolution State

get-alert | where {$_.resolutionState -ne 255} | Resolve-Alert

Class

Get class properties, by class name

get-monitoringclass | where {$_.name -eq "<class_name>"} | foreach-object {$_.getMonitoringProperties()} | select-object name

Get HOST class, by class name

get-monitoringclass | where {$_.name -eq "<class_name>"} | foreach-object {$_.findHostClass()} | select-object DisplayName

Get HOST class properties, by class name (if any)

get-monitoringclass | where {$_.name -eq "<class_name>"} | foreach-object {$_.findHostClass().PropertyCollection} | ft name

Get BASE class, by class name

foreach ($base in Get-MonitoringClass | where {$_.name -eq "<class_name>"}) {get-monitoringclass | where {$_.id -eq $base.base.id} | select-object name}

Get BASE class properties, by class name (if any)

foreach ($base in Get-MonitoringClass | where {$_.name -eq "<class_name>"}) {get-monitoringclass | where {$_.id -eq $base.base.id} | foreach-object {$_.getMonitoringProperties()} | ft -auto parentElement,name}

**Check out my GetClassChain script to view the entire BASE and HOST class chain (to System.Entity), and their properties, for a particular class.

Event

**Querying events tends to be quite resource intensive, given the sheer number of events OpsMgr collects. Even more so if performing foreach loops, sorting and grouping (like my first example).

Event count, by date

$array = @();foreach ($date in Get-Event | foreach-object {$_.get_TimeGenerated().toShortDateString()}) {$array += $date};$array | Group-Object | select-object count,name | sort-object name -desc

Top 10 Events, by Event Number

get-event | Group-Object number | Sort-object Count -desc | select-Object -first 10 Count, Name | ft -auto

Override

All monitors overridden from MP

foreach ($monitor in Get-ManagementPack | where {$_.DisplayName -match "<mp_display_name>"} | get-override | where {$_.monitor}) {get-monitor | where {$_.Id -eq $monitor.monitor.id} | select-object DisplayName}

All rules overridden from MP

foreach ($rule in Get-ManagementPack | where {$_.DisplayName -match "<mp_display_name>"} | get-override | where {$_.rule}) {get-rule | where {$_.Id -eq $rule.rule.id} | select-object DisplayName}

Overrides created in date range

Get-ManagementPack | where {$_.sealed -eq $false} | get-override | where {$_.TimeAdded.date -ge "<m/d/y>" -and $_.TimeAdded.date -le "<m/d/y>"} | fl name,TimeAdded

Overrides that have been modified

Get-ManagementPack | where {$_.sealed -eq $false} | get-override | where {$_.LastModified -gt $_.TimeAdded} | fl name,LastModified

Overrides modified in date range

Get-ManagementPack | where {$_.sealed -eq $false} | get-override | where {$_.LastModified -gt $_.TimeAdded -and $_.LastModified.date -ge "<m/d/y>" -and $_.LastModified.date -le "<m/d/y>"} | select-object name,LastModified

All overrides

Get-ManagementPack | where {$_.sealed -eq $false} | get-override | select-object name,parameter,module,rule,value

Misc

**There is no way to differentiate between a Console session and a Command Shell session (AFAIK).  These next two examples will show connections to the SDK.

List users connected to SDK

Get-ManagementGroupConnection | foreach-object {$_.ManagementGroup.getConnectedUserNames()}

Number of users connected to SDK

Get-ManagementGroupConnection | foreach-object {$_.ManagementGroup.getConnectedUserNames()} | Measure-Object

Find Agent, by Health Service Id

get-agent | where {$_.hostedHealthService.id -eq "<guid>"} | select-object name

Enable Agent Proxying, by Health Service Id

$a=get-agent | where {$_.hostedHealthService.id -eq "<guid>"};$a.set_proxyingEnabled($true);$a.applyChanges()

Alert grooming (evaluate whether alert grooming is working.  Count should = 0)

$Threshold = (Get-Date).AddDays(-(get-defaultsetting)[42].Value).Date;Get-Alert | Where {$_.TimeResolved -and ($_.TimeResolved).Date -lt $Threshold} | Measure-Object

Agent Proxy Alert

Fri, 2008-11-14 00:37

Surely, by now we are all familiar with this alert.

Agent proxying needs to be enabled for a health service to submit discovery data about other computers.

How do I resolve this Health Service Id to an Agent, so I can go ahead and enable Agent Proxying?  Thanks to Marius for supplying the SQL query, and thanks to Kevin for showing us how to do this with a custom report in the Operations Console.  But, if you don't have access to SQL and you haven't yet set up a custom report...there's a quick way to resolve this GUID to an Agent right now.

Open Operations Manager Command Shell.  Copy the first GUID from the Alert Description field (yellow highlight in image), and paste it into the following command.

 

get-agent | where {$_.hostedHealthService.id -eq "3078acb7-c2b6-1d30-8d9c-cdda26df8dbd"} | select-object name

 

image 

Just to make this post complete...

To enable Agent Proxying, run this with the same GUID.

$a=get-agent | where {$_.hostedHealthService.id -eq "<guid>"};$a.set_proxyingEnabled($true);$a.applyChanges()

Monitor Default Management Pack

Wed, 2008-11-12 06:28

A colleague wrote about best practices concerning the Default Management Pack, and how to clean it...if it happens to get messy.  Unfortunately, it seems almost inevitable that the Default MP will become cluttered in time.  This mostly comes down to educating staff on being careful not to save anything to the Default MP.  Hopefully, this mistake will not be so easily made in future releases of OpsMgr...

In Kevin's post, he had an idea of monitoring the Default MP.  Essentially, send a notification or raise an alert if the number of overrides stored in the Default MP exceeds a threshold.  This threshold is a number that you decide is acceptable.  One way to determine your threshold is to check where you're at now.  Run the following query against the OperationsManager DB:

select count(*) as NumberOfOverrides from AllOverrideView aov
inner join ManagementPackView mpv on aov.managementpackID = mpv.Id
where mpv.name = 'Microsoft.SystemCenter.OperationsManager.DefaultUser'

If you currently have more than 2, you may want to check what is in there and do an initial cleanup.  Given an acceptable number of overrides (let's just say 2), we can set our threshold and generate alerts when this is exceeded.  For brevity, I am attaching a sample MP.  This sample syncs at midnight, and queries the OperationsManager DB for the number of overrides in the Default MP.  It compares this number with the threshold you have set in the rule parameters, and raises a Warning alert in the Operations Console if the threshold is exceeded.  Of course, you can expand on this workflow and add a notification or change the severity if you like.

To import my sample Management Pack

If you want to get this going quickly by just importing, I've attached the sample MP.

The script write action rule is not enabled by default.  After importing, go into the Authoring space of the Operations Console.  Click Rules > Scope to Root Management Server > search for string Default > Right-click "Monitor Default MP Write Event" rule > Overrides > Override the Rule > For all objects of type: Root Management Server.

Check the Enabled parameter, and set to True.

Check the Arguments parameter, and enter your threshold, Operations Manager database server name, and Operations Manager database name.

Note: Arguments are space delimited.

Example

image
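The threshold check and the space-delimited arguments described above can be sketched as follows.  This is not the attached script, just an illustration of the idea.  The function name, the server name SQLSERVER01 and the override count of 5 are all made up.

```powershell
# Hypothetical argument string: threshold, DB server name, DB name (space delimited).
$arguments = '2 SQLSERVER01 OperationsManager'
$threshold, $dbServer, $dbName = $arguments.Split(' ')

function Test-OverrideThreshold {
    param([int]$OverrideCount, [int]$Threshold)
    # Alert only when the count exceeds (not merely equals) the threshold.
    return ($OverrideCount -gt $Threshold)
}

# Made-up value; in practice this would come from the count(*) query above.
$overrideCount = 5

$message = $null
if (Test-OverrideThreshold $overrideCount $threshold) {
    $message = "Warning: $overrideCount overrides in the Default MP ($dbServer, $dbName) exceed the threshold of $threshold"
}
$message
```

Because the arguments are split on spaces, none of the three values can contain a space.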

To create the rules in your own custom MP using the attached script

If you want to create these rules in your custom MP, follow these instructions and use the attached script for your script write action rule.

First we'll create a rule that executes the query on the OpsDB, and creates an event in the Operations Manager event log for us to match and generate alerts.  Create a new rule > Timed Commands > Execute a Script.  Save into your custom Management Pack.  Perhaps you already have an MP in which you've been creating some custom monitoring or tasks.

01

Enter a rule name and select Root Management Server for your Rule Target.

02

Configure the schedule to run once a day.  Synchronize the rule, so that it will always run at the same time.  If a synchronize time is not specified, the rule will automatically sync to the time when the workflow became active on the agent.  This means the rule will first sync to the time it was initially delivered to the agent...and will subsequently re-sync to whatever time the agent Health Service is restarted.

03
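To see why the synchronize time matters, here's a rough sketch of the arithmetic only.  It is not how the OpsMgr scheduler is implemented.  The function name and the sample times are assumptions, and the load time of 06:28 stands in for whenever the workflow became active.

```powershell
# Illustration only: next run of an interval schedule anchored to a time of day.
function Get-NextRunTime {
    param([datetime]$Now, [timespan]$SyncTime, [timespan]$Interval)
    # Start from today's anchor, then step forward until we pass the current time.
    $next = $Now.Date.Add($SyncTime)
    while ($next -le $Now) { $next = $next.Add($Interval) }
    return $next
}

$now = [datetime]'2008-11-12 06:28'   # made-up "workflow became active" time
$daily = [timespan]::FromDays(1)

# Synchronized to midnight: always runs at 00:00.
$synced = Get-NextRunTime $now ([timespan]'00:00') $daily

# No sync time: anchored to whenever the workflow happened to load.
$unsynced = Get-NextRunTime $now $now.TimeOfDay $daily

$synced
$unsynced
```

With the sync time set, the next run lands on midnight no matter when the Health Service was restarted.  Without it, the run time follows the load time.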

Paste the attached MonitorDefaultMp script into the script pane.  Keep the same file name for consistency.  The filename here doesn't affect functionality of the rule, but the filename within the script is used in event data and needs to match NT Event alert expressions later in this walk-through.  Click on Parameters.

04

Type in your determined threshold, the Operations Manager DB Server Name and the Operations Manager DB Name.  The parameters are space delimited.

05

You're finished with this rule.  Click Create, and let's move on to the alert generating rule.

This rule will match the event in the Operations Manager event log and generate an alert.  Create a new rule > Alert Generating Rules > Event Based > NT Event Log (Alert).  Save to the same MP.

06

Type in a rule name, select the Alert category and target Root Management Server.

07

Specify the Operations Manager log.

08

Build the expression as follows.  This is where we need to match parameter 1 to our script name.  This script name is actually specified in the log event method from within the script.  If you take a look at the script, you'll see the script name specified.  This is what's being matched.  This is why it's important we keep these script names consistent throughout.  Otherwise, it's more difficult to build an efficient expression.

09 

Lastly, we'll specify the alert information.  I recommend Event Description...that'll give you all the info you'll need.  I specified Warning for severity, as I thought it was better suited for this type of alert.

10

Finally, this is the result.

14

This is what you'll see in the Operations Manager event log.

12

If you want to test this and see results quickly, specify a threshold value of 1 in the parameters, uncheck the synchronization time and configure the schedule to run every 1 minute.

If you cannot see the attachments to this post, click on the header of this post and the attachment link should appear at the bottom.